
Linux Kernel Performance

Linux development evolves rapidly. The performance and scalability of the OS kernel have been a key part of its success. However, discussions on LKML (the Linux Kernel Mailing List) have reported large performance regressions between kernel versions. These discussions underscore the need for a systematic and disciplined way to characterize, improve, and test Linux kernel performance. Our goal is to work with the Linux community to keep improving kernel performance consistently across releases while avoiding degradations. This site gives community members better information about what 0-Day and LKP (Linux Kernel Performance) are doing to preserve the performance integrity of the kernel.

0-DAY CI LINUX KERNEL PERFORMANCE REPORT (V4.19)

BY Philip Li ON Nov 08, 2018
  1. Introduction

0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel. It covers kernel build, static analysis, boot, functional, performance and power tests. This report shows recent observations of kernel performance status on IA platforms, based on test results from the 0-Day CI service. It is structured as follows:

  • Section 2, merged regressions and improvements in v4.19 release candidates

  • Section 3, regressions and improvements captured by shift-left testing of developers’ and maintainers’ trees during the v4.19 release cycle

  • Section 4, test machine list

 

  2. Linux Kernel v4.19 Release

The v4.19 release of the Linux kernel was on October 22, 2018. Headline features in this release include the new AIO-based polling interface, L1TF vulnerability mitigations, the block I/O latency controller, time-based packet transmission, the CAKE queuing discipline, and much more. 0-Day CI monitored the release closely to track performance status on IA platforms. 0-Day observed 1 regression and 3 improvements during the feature development phase for v4.19. The subsections below give more detail, together with the correlated patches that led to the results. Note that the assessment is limited by the test coverage 0-Day currently has. The full list is given in the Observation Summary section.

  2.1. fsmark.files_per_sec

fsmark is a file system benchmark that tests synchronous write workloads, such as mail server workloads.

  2.1.1. scenario: fsyncBeforeClose on btrfs

 


Commit 172b06c32b was reported to have a -2.0% regression of fsmark.files_per_sec when compared to v4.19-rc4. It was merged to mainline at v4.19-rc5.

 

Correlated commits

  • commit: 172b06c32b, “mm: slowly shrink slabs with a relatively small number of objects”

  • branch: linus/master

  • report: [LKP] [mm] 172b06c32b: fsmark.files_per_sec -2.0% regression

  • status: merged at v4.19-rc5. The author has not yet responded to the regression report.
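
The percentages quoted in these reports are relative changes of a test indicator against a base kernel. A minimal sketch of that arithmetic follows, assuming the change is computed as (patched - base) / base. The function name and the sample readings are hypothetical and are not taken from 0-Day data; the sign-based label also assumes a higher-is-better indicator such as files_per_sec.

```python
def percent_change(base: float, patched: float) -> float:
    """Relative change of a metric versus its base value, in percent."""
    if base == 0:
        raise ValueError("base value must be non-zero")
    return (patched - base) / base * 100.0


# Hypothetical fsmark.files_per_sec readings (illustrative, not from the report).
base_kernel = 500.0     # measured on the base kernel, e.g. v4.19-rc4
patched_kernel = 490.0  # measured with the commit under test applied

change = percent_change(base_kernel, patched_kernel)
# Assumes higher is better; a latency-style metric would invert the label.
label = "regression" if change < 0 else "improvement"
print(f"{change:+.1f}% {label}")
```

With these made-up readings the script prints a -2.0% regression, which has the same form as the figure reported for commit 172b06c32b.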

 

  2.2. perf-bench-numa-mem.GB_per_thread

perf began as a tool for using the performance counters subsystem in Linux, and has had various enhancements to add tracing capabilities.

  2.2.1. scenario: memory test on NUMA

 


Commit efaffc5e40 was reported to have a 38.7% improvement of perf-bench-numa-mem.GB_per_thread when compared to v4.19-rc5. It was merged to mainline at v4.19-rc7. The patch removes rate-limiting of automatic NUMA balancing migration.

 

Correlated commits

  • commit: efaffc5e40, “mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration”

  • branch: linus/master

  • report: [LKP] [mm, sched/numa] efaffc5e40: perf-bench-numa-mem.GB_per_thread 38.7% improvement

  • status: merged at v4.19-rc7

 

  2.3. vm-scalability.median

vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. We tested it on multiple machines, such as an HSW EP server, and reported an improvement in one test scenario.

  2.3.1. scenario: anon-cow-seq test

 


Commit c9f4cd7138 was reported to have a 6.1% improvement of vm-scalability.median when compared to v4.18. It was merged to mainline at v4.19-rc1.

 

Correlated commits

  • commit: c9f4cd7138, “mm, huge page: copy target sub-page last when copy huge page”

  • branch: linus/master

  • report: [LKP] [mm, huge page] c9f4cd7138: vm-scalability.median 6.1% improvement

  • status: merged at v4.19-rc1

 

  2.4. Observation Summary

0-Day CI observed 1 regression and 3 improvements during the feature development phase for v4.19, that is, in the time frame from v4.19-rc1 to the v4.19 release.

| Test Indicator | Report | Test Scenario | Development Base | Status |
| --- | --- | --- | --- | --- |
| fsmark.files_per_sec | [LKP] [mm] 172b06c32b: -2.0% regression | iterations: 1x; nr_threads: 64t; disk: 1BRD_48G; fs: btrfs; fs2: nfsv4; filesize: 4M; test_size: 40G; sync_method: fsyncBeforeClose; ucode: 0x42d; cpufreq_governor: performance | v4.19-rc4 | Merged at v4.19-rc5, no response from author yet |
| perf-bench-numa-mem.GB_per_thread | [LKP] [mm, sched/numa] efaffc5e40: 38.7% improvement | nr_threads: 2t; mem_proc: 300M; cpufreq_governor: performance; ucode: 0x42d | v4.19-rc5 | Merged at v4.19-rc7 |
| vm-scalability.median | [LKP] [mm, huge page] c9f4cd7138: 6.1% improvement | runtime: 300s; size: 8T; test: anon-cow-seq; cpufreq_governor: performance | v4.18 | Merged at v4.19-rc1 |
| will-it-scale.per_process_ops | [kprobes/x86] 80006dbee6: +2.6% improvement | test: sched_yield; cpufreq_governor: performance | v4.18-rc1 | Merged at v4.19-rc1 |
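
The report subjects in this summary follow a regular pattern: an [LKP] tag, a subsystem tag, an abbreviated commit hash, the test indicator, and a signed percentage with a regression or improvement label. A minimal sketch of parsing such subjects is shown below. The regular expression and the `Report` type are assumptions inferred from the subjects quoted in this report, not an official 0-Day format specification.

```python
import re
from typing import NamedTuple, Optional


class Report(NamedTuple):
    subsystem: str
    commit: str
    indicator: str
    change_pct: float
    kind: str


# Assumed shape, inferred from subjects such as
# "[LKP] [mm] 172b06c32b: fsmark.files_per_sec -2.0% regression".
SUBJECT_RE = re.compile(
    r"^\[LKP\]\s+\[(?P<subsystem>[^\]]+)\]\s+"
    r"(?P<commit>[0-9a-f]{7,40}):\s+"
    r"(?P<indicator>\S+)\s+"
    r"(?P<change>[+-]?\d+(?:\.\d+)?)%\s+"
    r"(?P<kind>regression|improvement)$"
)


def parse_subject(subject: str) -> Optional[Report]:
    """Return the parsed fields, or None if the subject does not match."""
    m = SUBJECT_RE.match(subject.strip())
    if m is None:
        return None
    return Report(m["subsystem"], m["commit"], m["indicator"],
                  float(m["change"]), m["kind"])


# Subjects quoted in this report.
subjects = [
    "[LKP] [mm] 172b06c32b: fsmark.files_per_sec -2.0% regression",
    "[LKP] [mm, sched/numa] efaffc5e40: perf-bench-numa-mem.GB_per_thread 38.7% improvement",
    "[LKP] [mm, huge page] c9f4cd7138: vm-scalability.median 6.1% improvement",
]
for s in subjects:
    print(parse_subject(s))
```

Returning None for non-matching input, rather than raising, keeps the sketch usable on a mixed stream of mail subjects.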

 

  3. Shift-Left Testing

Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which can catch issues earlier and reduce wider impact. We call it “shift-left” testing. During the v4.19 release cycle, 0-Day CI reported 8 major performance regressions and 10 major improvements through shift-left testing. For some of these, we share more detailed information below, together with the code changes that may have led to the result, though the assessment is limited by the test coverage we currently have. The whole list is summarized in the Report Summary section.

  3.1. aim7.jobs-per-min

aim7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of multiuser systems.

  3.1.1. scenario: disk_rw test on f2fs

 


Commit 4d8253119c was reported to have a -11.0% regression of aim7.jobs-per-min when compared to v4.19-rc2.

 

Correlated commits

  • commit: 4d8253119c, “f2fs: checkpoint disabling”

  • branch: f2fs/dev

  • report: [LKP] [f2fs] 4d8253119c: aim7.jobs-per-min -11.0% regression

  • status: Not merged yet

  3.2. blogbench.write_score

Blogbench is a portable filesystem benchmark that tries to reproduce the load of a real-world busy file server. It stresses the filesystem with multiple threads performing random reads, writes and rewrites in order to get a realistic idea of the scalability and the concurrency a system can handle.

  3.2.1. scenario: stress write test on btrfs

 


Commit e329073364 was reported to have an 84.7% improvement of blogbench.write_score when compared to v4.19-rc7.

 

Correlated commits

  • commit: e329073364, “Btrfs: kill btrfs_clear_path_blocking”

  • branch: linux-next/master

  • report: [LKP] [Btrfs] e329073364: blogbench.write_score 84.7% improvement

  • status: Not merged yet

 

  3.3. netperf.Throughput_Mbps

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput and end-to-end latency.

  3.3.1. scenario: IPv4 TCP_STREAM test on localhost


Commit a337531b94 was reported to have a -6.1% regression of netperf.Throughput_Mbps when compared to v4.19-rc5.

 

Correlated commits

  • commit: a337531b94, “tcp: up initial rmem to 128KB and SYN rwin to around 64KB”

  • branch: net-next/master

  • report: [LKP] [tcp] a337531b94: netperf.Throughput_Mbps -6.1% regression

  • status: Not merged yet

 

  3.4. will-it-scale.per_thread_ops

Will-it-scale takes a test case and runs it from 1 through n parallel copies to see whether the test case scales. It builds both process-based and thread-based tests in order to expose any differences between the two.
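
The sweep described above, running one test case at 1 through n parallel copies and recording the operation count at each step, can be sketched as follows. This is not will-it-scale's own code: the function names, the trivial test case, and the short duration are all hypothetical, and because CPython threads share one interpreter lock, the numbers only illustrate the shape of the sweep rather than real kernel scalability.

```python
import os
import threading
import time
from typing import Callable, Dict


def run_copies(testcase: Callable[[], object], copies: int, duration: float) -> int:
    """Loop `testcase` in `copies` threads for `duration` seconds.

    Returns the total number of completed iterations across all threads.
    """
    counts = [0] * copies          # one slot per thread, so no locking is needed
    stop = threading.Event()

    def worker(idx: int) -> None:
        while not stop.is_set():
            testcase()
            counts[idx] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(copies)]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return sum(counts)


def scaling_sweep(testcase: Callable[[], object], max_copies: int,
                  duration: float = 0.05) -> Dict[int, int]:
    """Run the test case at 1..max_copies parallel copies."""
    return {n: run_copies(testcase, n, duration) for n in range(1, max_copies + 1)}


# Hypothetical test case: a cheap system call.
results = scaling_sweep(os.getpid, max_copies=4)
for n, total in results.items():
    print(f"{n} copies: {total} ops total, {total / n:.0f} ops per copy")
```

A test case that scales well keeps its per-copy rate roughly flat as copies are added; a falling per-copy rate points at contention.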

  3.4.1. scenario: thread unlink2


Commit 60f7ed8c7c was reported to have a -5.9% regression of will-it-scale.per_thread_ops when compared to v4.19-rc2.

 

Correlated commits

  • commit: 60f7ed8c7c, “fsnotify: send path type events to group with super block marks”

  • branch: linux-next/master

  • report: [LKP] [fsnotify] 60f7ed8c7c: will-it-scale.per_thread_ops -5.9% regression

  • status: Not merged yet

 

  3.5. Report Summary

0-Day CI reported 8 performance regressions and 10 improvements through shift-left testing on developer and maintainer repos.

 

| Test Indicator | Report | Test Scenario | Test Machine | Status |
| --- | --- | --- | --- | --- |
| aim7.jobs-per-min | [LKP] [f2fs] 4d8253119c: -11.0% regression | disk: 1BRD_48G; fs: f2fs; test: disk_rw; load: 3000; cpufreq_governor: performance | lkp-ivb-ep01 | Not merged yet, no response from author yet |
| blogbench.read_score | [Btrfs] 5239834016: -7.2% regression | disk: 1SSD; fs: btrfs; ucode: 0xb00002e; cpufreq_governor: performance | lkp-bdw-ep3b | Not merged yet, but the author thinks the patch is acceptable |
| blogbench.write_score | [LKP] [Btrfs] e329073364: 84.7% improvement | disk: 1SSD; fs: btrfs; ucode: 0xb00002e; cpufreq_governor: performance | lkp-bdw-ep3b | Not merged yet |
| fio.latency_100ms% | [LKP] [sched/numa] 8d8968f954: 62.9% improvement | disk: 2pmem; fs: ext4; mount_option: dax; runtime: 200s; nr_task: 50%; time_based: tb; rw: rw; bs: 2M; ioengine: libaio; test_size: 200G; ucode: 0x3d; cpufreq_governor: performance | lkp-hsw-ep6 | Not merged yet |
| fsmark.files_per_sec | [LKP] [mm] 172b06c32b: -2.0% regression | iterations: 1x; nr_threads: 64t; disk: 1BRD_48G; fs: btrfs; fs2: nfsv4; filesize: 4M; test_size: 40G; sync_method: fsyncBeforeClose; ucode: 0x42d; cpufreq_governor: performance | ivb44 | Merged at v4.19-rc5, no response from author yet |
| netperf.Throughput_Mbps | [LKP] [tcp] a337531b94: -6.1% regression | ip: ipv4; runtime: 900s; nr_threads: 200%; cluster: cs-localhost; test: TCP_STREAM; ucode: 0x7000013; cpufreq_governor: performance | lkp-bdw-de1 | This regression was fixed by “tcp: start receiver buffer autotuning sooner” |
| netperf.Throughput_Mbps | [LKP] [net/sock] b99259a614: -6.6% regression | ip: ipv4; runtime: 300s; nr_threads: 200%; cluster: cs-localhost; send_size: 5K; test: TCP_SENDFILE; ucode: 0x25; cpufreq_governor: performance | lkp-hsw-d01 | Not merged yet, no response from author yet |
| pbzip2.throughput | [LKP] [sched/numa] ec247fe1e0: 2.0% improvement | nr_threads: 25%; blocksize: 900K; ucode: 0xb00002e; cpufreq_governor: performance | lkp-bdw-ep3b | Not merged yet |
| perf-bench-numa-mem.GB_per_thread | [LKP] [mm, sched/numa] efaffc5e40: 38.7% improvement | nr_threads: 2t; mem_proc: 300M; cpufreq_governor: performance; ucode: 0x42d | ivb44 | Merged at v4.19-rc7 |
| unixbench.score | [LKP] [x86/mm/tlb] 5462bc3a9a: 7.0% improvement | runtime: 300s; nr_task: 1; test: context1; ucode: 0x20; cpufreq_governor: performance | lkp-ivb-d01 | Not merged yet |
| vm-scalability.median | [LKP] [mm, huge page] c9f4cd7138: 6.1% improvement | runtime: 300s; size: 8T; test: anon-cow-seq; cpufreq_governor: performance | lkp-skl-2sp2 | Merged at v4.19-rc1 |
| vm-scalability.throughput | [LKP] [mm/page_alloc.c] bd46e9ba39: -7.6% regression | nr_threads: 100%; blocksize: 128K; ucode: 0x200004d; cpufreq_governor: performance | lkp-skl-2sp3 | Not merged yet, no response from author yet |
| will-it-scale.per_process_ops | [LKP] [cpuidle] 23e8ceb9ce: 8.9% improvement | nr_task: 16; mode: process; test: context_switch1; cpufreq_governor: performance | lkp-bdw-ep3d | Not merged yet |
| will-it-scale.per_process_ops | [LKP] [x86/xen] f030aade91: 5.6% improvement | nr_task: 100%; mode: process; test: poll2; ucode: 0x20; cpufreq_governor: performance | lkp-ivb-d01 | Not merged yet |
| will-it-scale.per_process_ops | [LKP] [mm/swap] d884021f52: -2.4% regression | nr_task: 50%; mode: process; test: brk1; cpufreq_governor: performance | lkp-bdw-ep3d | Not merged yet; the author is working on the fix |
| will-it-scale.per_thread_ops | [LKP] [x86/pti/64] de87ae2926: 3.8% improvement | nr_task: 16; mode: thread; test: futex3; cpufreq_governor: performance | lkp-bdw-ep3d | Not merged yet |
| will-it-scale.per_thread_ops | [LKP] [fsnotify] 60f7ed8c7c: -5.9% regression | nr_task: 16; mode: thread; test: unlink2; cpufreq_governor: performance | lkp-bdw-ep3d | Not merged yet, no response from author yet |
| will-it-scale.per_thread_ops | [LKP] [x86/pti/64] bf904d2762: 1.7% improvement | nr_task: 16; mode: thread; test: pwrite1; cpufreq_governor: performance | lkp-bdw-ep3d | Not merged yet |
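
The totals quoted for this summary (8 regressions and 10 improvements) can be cross-checked against the individual entries. The sketch below does so with the commit hashes and percentages copied from the summary above; the variable names are illustrative, and a negative sign is treated as the regression label, following the convention the report itself uses.

```python
# Percent changes copied from the report summary (negative = regression).
changes = {
    "4d8253119c": -11.0,
    "5239834016": -7.2,
    "e329073364": 84.7,
    "8d8968f954": 62.9,
    "172b06c32b": -2.0,
    "a337531b94": -6.1,
    "b99259a614": -6.6,
    "ec247fe1e0": 2.0,
    "efaffc5e40": 38.7,
    "5462bc3a9a": 7.0,
    "c9f4cd7138": 6.1,
    "bd46e9ba39": -7.6,
    "23e8ceb9ce": 8.9,
    "f030aade91": 5.6,
    "d884021f52": -2.4,
    "de87ae2926": 3.8,
    "60f7ed8c7c": -5.9,
    "bf904d2762": 1.7,
}

regressions = {c: v for c, v in changes.items() if v < 0}
improvements = {c: v for c, v in changes.items() if v > 0}

# Largest movers in each direction.
worst = min(regressions, key=regressions.get)
best = max(improvements, key=improvements.get)

print(f"{len(regressions)} regressions, {len(improvements)} improvements")
print(f"largest regression:  {worst} ({regressions[worst]:+.1f}%)")
print(f"largest improvement: {best} ({improvements[best]:+.1f}%)")
```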

 

  4. Test Machines

  4.1. IVB Desktop

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 8 | 16G |
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 4 | 8G |

 

  4.2. SKL Desktop

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Skylake | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 80 | 64G |

 

  4.3. BDW EP

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Broadwell-EP | Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz | 88 | 128G |

 

  4.4. HSW EP

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Haswell-EP | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz | 72 | 128G |

 

  4.5. IVB EP

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz | 40 | 384G |
| Ivytown Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 48 | 64G |

 

  4.6. HSX EX

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Brickland Haswell-EX | Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz | 144 | 512G |