0-Day CI Linux Kernel Performance Report (v4.17)
-
Introduction
0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel. It covers kernel build, static analysis, boot, functional, performance, and power tests. This report presents recent observations of kernel performance on the IA platform, based on test results from the 0-Day CI service. It is structured as follows:
-
Section 2, merged regressions and improvements in v4.17 release candidates
-
Section 3, regressions and improvements captured by shift-left testing on developers’ and maintainers’ trees during the v4.17 release cycle
-
Section 4, test machine list
-
Linux Kernel v4.17 Release
The v4.17 release of the Linux kernel was on June 3rd, 2018. Headline features in this release include improved load estimation in the CPU scheduler, raw BPF tracepoints, lazytime support in the XFS filesystem, full in-kernel TLS protocol support, histogram triggers for tracing, and the removal of support for eight processor architectures. 0-Day CI monitored the release closely to track performance on the IA platform, and observed 5 regressions and 5 improvements during the feature development phase for v4.17. The subsections below give more detail, together with the correlated patches that led to the results. Note that the assessment is limited by the current test coverage. The full list is in the Observation Summary section.
-
will-it-scale.per_process_ops
will-it-scale runs a test case from 1 to n parallel copies to see whether the test case scales. It builds both a process-based and a thread-based version of each test in order to expose any differences between the two.
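The scaling idea can be illustrated with a minimal sketch. This is not will-it-scale's own code: the `worker` and `run_scaling` names, the busy loop standing in for the test case, and the 50 ms duration are assumptions made for illustration. Only a thread mode is shown, and because Python threads share the interpreter lock, flat or falling per-copy numbers are expected here.

```python
import threading
import time


def worker(deadline, counts, idx):
    """Run the stand-in operation until the deadline and record the op count."""
    ops = 0
    while time.perf_counter() < deadline:
        ops += 1  # hypothetical stand-in for one iteration of the test case
    counts[idx] = ops


def run_scaling(max_copies, duration=0.05):
    """Return {copies: average ops/sec per copy} for 1..max_copies threads."""
    results = {}
    for copies in range(1, max_copies + 1):
        counts = [0] * copies
        deadline = time.perf_counter() + duration
        threads = [
            threading.Thread(target=worker, args=(deadline, counts, i))
            for i in range(copies)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Average throughput of a single copy at this level of parallelism.
        results[copies] = sum(counts) / copies / duration
    return results


if __name__ == "__main__":
    for copies, per_copy in run_scaling(4).items():
        print(f"{copies} copies: {per_copy:,.0f} ops/sec per copy")
```

A test case scales well when the per-copy figure stays roughly constant as the number of copies grows.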
-
scenario: test page fault in process mode
Commit e27be240df was reported to cause a -27.2% regression in will-it-scale.per_process_ops compared with v4.16. It was merged to mainline at v4.17-rc1. Aaron Lu investigated and found that the commit itself looks innocent: it merely changes an event counting mechanism, and this test does not trigger those events at all. The regression is likely due to the changed layout of 'struct mem_cgroup', which either puts stat_cpu into a constantly modified cache line or stops some hot fields from sharing a cache line. Aaron has submitted a fix to the community that restores the performance.
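How a layout change can move hot fields between cache lines can be sketched as follows. The `Before` and `After` structures are hypothetical and do not reproduce the real 'struct mem_cgroup'; the 64-byte cache line size is likewise an assumption (typical for x86).

```python
import ctypes

CACHE_LINE = 64  # bytes; assumed cache line size


class Before(ctypes.Structure):
    # Hypothetical layout: two hot fields sit next to each other.
    _fields_ = [
        ("cold", ctypes.c_uint64 * 8),  # 64 bytes of rarely touched data
        ("hot_a", ctypes.c_uint64),
        ("hot_b", ctypes.c_uint64),
    ]


class After(ctypes.Structure):
    # Same fields, plus a new 56-byte member inserted ahead of the hot ones.
    _fields_ = [
        ("cold", ctypes.c_uint64 * 8),
        ("new_events", ctypes.c_uint64 * 7),
        ("hot_a", ctypes.c_uint64),
        ("hot_b", ctypes.c_uint64),
    ]


def cache_line(struct, field):
    """Index of the cache line holding the first byte of `field`,
    assuming the structure itself starts on a cache line boundary."""
    return getattr(struct, field).offset // CACHE_LINE


for struct in (Before, After):
    lines = {name: cache_line(struct, name) for name in ("hot_a", "hot_b")}
    print(struct.__name__, lines)
```

In `Before`, both hot fields land on line 1. In `After`, `hot_b` is pushed to line 2, so code touching both fields now pulls in two cache lines even though neither field was modified by the change.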
Correlated commits
e27be240df | mm: memcg: make sure memory.events is up-to-date when waking pollers
branch | linux/master
report | [LKP] [lkp-robot] [mm] e27be240df: will-it-scale.per_process_ops -27.2% regression
status | The bad patch was merged at v4.17-rc1. The fix has landed in Linus’ tree and is expected to appear in v4.18-rc1.
-
netperf.Throughput_Mbps
Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.
-
scenario: ipv4 SCTP STREAM test in localhost
Commit a37b969a61 was reported to cause a -25.3% regression in netperf.Throughput_Mbps compared with v4.16-rc4. It was merged to mainline at v4.17-rc1, and performance recovered after v4.17-rc2.
Correlated commits
a37b969a61 | cpuidle: poll_state: Add time limit to poll_idle()
branch | pm/idle-poll
report | [LKP] [lkp-robot] [cpuidle] a37b969a61: netperf.Throughput_Mbps -25.3% regression
status | The bad patch was merged at v4.17-rc1 and fixed in v4.17-rc2.
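The observation summary attributes this drop to the overhead of local_clock in the time-limited polling loop. One general way to reduce that kind of overhead is to read the clock only once every N iterations instead of on every pass. The sketch below illustrates the technique in user space; it is not the kernel's poll_idle() code, and `poll_until`, `check_every`, and the returned clock-read count are hypothetical names chosen for this example.

```python
import time


def poll_until(predicate, limit_s, check_every=200):
    """Spin until predicate() is true or the time limit expires.

    The clock is read only once per `check_every` spins, trading a slightly
    late timeout for far fewer clock reads in the hot loop.
    Returns (satisfied, clock_reads).
    """
    start = time.perf_counter()
    spins = 0
    clock_reads = 0
    while not predicate():
        spins += 1
        if spins % check_every == 0:
            clock_reads += 1
            if time.perf_counter() - start > limit_s:
                return False, clock_reads
    return True, clock_reads


if __name__ == "__main__":
    satisfied, reads = poll_until(lambda: False, limit_s=0.01)
    print(f"satisfied={satisfied}, clock reads={reads}")
```

A larger `check_every` lowers the per-iteration cost but lets the loop overshoot the limit by up to that many spins.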
-
aim7.jobs-per-min
aim7 is a traditional UNIX system level benchmark suite that is used to test and measure the performance of a multiuser system.
-
scenario: f2fs-sync_disk_rw
Commit 84b89e5d94 was reported to bring a 91.4% improvement in aim7.jobs-per-min compared with v4.16-rc2. It was merged to mainline at v4.17-rc1.
Correlated commits
84b89e5d94 | f2fs: add auto tuning for small devices
branch | f2fs/dev-test
report | [LKP] [lkp-robot] [f2fs] 84b89e5d94: aim7.jobs-per-min 91.4% improvement
status | merged at v4.17-rc1
-
unixbench.score
UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.
-
scenario: execl test
Commit d519329f72 was reported to cause a -9.9% regression in unixbench.score compared with v4.16-rc6. It was merged to mainline at v4.17-rc1. The author is addressing the regression.
Correlated commits
d519329f72 | sched/fair: Update util_est only on util_avg updates
branch | linux-next/master
report | [LKP] [lkp-robot] [sched/fair] d519329f72: unixbench.score -9.9% regression
status | The bad patch was merged at v4.17-rc1 and the author is working on a fix.
-
Observation Summary
0-Day CI observed 5 regressions and 5 improvements during the feature development phase for v4.17, that is, the time frame from v4.17-rc1 to the v4.17 release.
Test Indicator | Report | Test Scenario | Development Base | Status
aim7.jobs-per-min | | disk: 4BRD_12G md: RAID1 fs: f2fs test: sync_disk_rw load: 600 cpufreq_governor: performance | v4.16-rc5 | merged at v4.17-rc1
aim9.disk_rr.ops_per_sec | [tracing/x86] 1c758a2202: aim9.disk_rr.ops_per_sec -12.0% regression | testtime: 300s test: disk_rr cpufreq_governor: performance | v4.17-rc6 | The patch was merged at v4.17-rc3. It is not a real issue: the overhead is introduced by the syscall trace events monitor.
blogbench.write_score | | disk: 1SSD fs: xfs cpufreq_governor: performance | v4.16-rc5 | merged at v4.17-rc1
netperf.Throughput_Mbps | [cpuidle] a37b969a61: netperf.Throughput_Mbps -25.3% regression | ip: ipv4 runtime: 300s nr_threads: 1 cluster: cs-localhost send_size: 10K test: SCTP_STREAM_MANY cpufreq_governor: performance | v4.16-rc7 | The bad patch was merged at v4.17-rc1. The performance drop was due to the overhead of local_clock and was fixed in v4.17-rc2.
pxz.throughput | | nr_threads: 25% cpufreq_governor: performance | v4.17-rc5 | The bad patch was merged at v4.17-rc5 and the author is working on a fix.
pxz.throughput | | nr_threads: 25% cpufreq_governor: performance | v4.16-rc4 | merged at v4.17-rc1
stress-ng.open.ops | | nr_threads: 100% testtime: 1s class: filesystem cpufreq_governor: performance | v4.16 | merged at v4.17-rc1
unixbench.score | | runtime: 300s nr_task: 100% test: execl | v4.16-rc7 | The bad patch was merged at v4.17-rc1 and the author is working on a fix.
vm-scalability.throughput | [sched/numa] 7347fc87df: vm-scalability.throughput +45.6% improvement | runtime: 300s size: 8T test: anon-cow-seq cpufreq_governor: performance | v4.16-rc5 | merged at v4.17-rc1
will-it-scale.per_process_ops | [mm] e27be240df: will-it-scale.per_process_ops -27.2% regression | nr_task: 100% mode: process test: page_fault3 cpufreq_governor: performance | v4.17-rc1 | The bad patch was merged at v4.17-rc1 and the fix has landed in Linus’ tree.
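The percentages in the Report column can be read as relative changes of the test indicator against a baseline kernel. A minimal sketch of that arithmetic follows, assuming a higher-is-better indicator; the function names are hypothetical and the sample numbers are made up for illustration, not measurements from 0-Day CI.

```python
def percent_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    if base == 0:
        raise ValueError("base value must be non-zero")
    return (new - base) / base * 100.0


def classify(change_pct):
    """Label a change, assuming a higher-is-better metric."""
    if change_pct < 0:
        return "regression"
    if change_pct > 0:
        return "improvement"
    return "no change"


# Hypothetical values: 1000 ops on the baseline kernel, 728 ops with the commit.
change = percent_change(1000.0, 728.0)
print(f"{change:+.1f}% {classify(change)}")  # prints "-27.2% regression"
```

For lower-is-better indicators such as latency, the sign convention in `classify` would need to be inverted.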
-
Shift-Left Testing
Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which catches issues earlier and reduces their wider impact. We call this “shift-left” testing. During the v4.17 release cycle, 0-Day CI reported 10 major performance regressions and 5 major improvements through shift-left testing. For some of these, we share more detailed information below, together with the code changes that possibly led to the result, though the assessment is limited by the current test coverage. The full list is in the Report Summary section.
-
aim7.jobs-per-min
aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system.
-
scenario: creat-clo test on btrfs
Commit 7d3c8e53f5 was reported to bring a 57.9% improvement in aim7.jobs-per-min compared with v4.17-rc4.
Correlated commits
7d3c8e53f5 | Btrfs: stop creating orphan items for truncate
branch | linux-review/Omar-Sandoval/Btrfs-update-stale-comments-referencing-vmtruncate/20180512-153554
report | [LKP] [lkp-robot] [Btrfs] 7d3c8e53f5: aim7.jobs-per-min 57.9% improvement
status | Not merged yet
-
fsmark.files_per_sec
fsmark is a file system benchmark that tests synchronous write workloads, for example a mail server workload.
-
scenario: no synchronous write on xfs
Commit 499c0a9aec was reported to bring a +6.6% improvement in fsmark.files_per_sec compared with v4.17-rc4.
Correlated commits
499c0a9aec | xfs: allow writeback on pages without buffer heads
branch | hch-xfs/xfs-remove-bufferheads.3
report | [LKP] [lkp-robot] [xfs] 499c0a9aec: fsmark.files_per_sec +6.6% improvement
status | Not merged yet
-
stress-ng.copy-file.ops_per_sec
stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.
-
scenario: 100%-1s-filesystem-performance
Commit 9cb0698bda was reported to cause a -97.9% regression in stress-ng.copy-file.ops_per_sec compared with v4.17-rc3.
Correlated commits
9cb0698bda | copy_file_range: splice with holes
branch | linux-review/Goldwyn-Rodrigues/Holey-splice-copy_file_range-with-holes/20180504-104538
report | [LKP] [lkp-robot] [copy_file_range] 9cb0698bda: stress-ng.copy-file.ops_per_sec -97.9% regression
status | Not merged.
-
unixbench.score
UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.
-
scenario: execl test
Commit f6f47c66af was reported to cause a -14.5% regression in unixbench.score compared with v4.17-rc2.
Correlated commits
f6f47c66af | ACPI: ensure acpi_parse_entries_array() does not access non-existent table data
branch | linux-review/Al-Stone/mailbox-ACPI-Remove-incorrect-error-message-about-parsing-PCCT/20180426-020343
report | [LKP] [lkp-robot] [ACPI] f6f47c66af: unixbench.score -14.5% regression
status | Not merged.
-
Report Summary
0-Day CI reported 8 performance regressions and 6 improvements through shift-left testing on developer and maintainer repos.
Test Indicator | Report | Test Scenario | Test Machine
aim7.jobs-per-min | | disk: 4BRD_12G | lkp-ivb-ep01
aim7.jobs-per-min | | disk: 4BRD_12G | lkp-ivb-ep01
aim7.jobs-per-min | | disk: 1BRD_48G | lkp-ivb-ep01
fsmark.files_per_sec | | iterations: 8 | lkp-hsw-ep4
fsmark.files_per_sec | | iterations: 1x | ivb44
pxz.throughput | | nr_threads: 25% | lkp-bdw-ep3
stress-ng.chdir.ops_per_sec | [Linux messages full of `random] 125bac9e15: stress-ng.chdir.ops_per_sec 38.8% improvement | nr_threads: 100% | lkp-bdw-ep6
stress-ng.clone.ops_per_sec | [netns] a3498436b3: stress-ng.clone.ops_per_sec 77.5% improvement | nr_threads: 100% | lkp-bdw-ep6
stress-ng.copy-file.ops_per_sec | [copy_file_range] 9cb0698bda: stress-ng.copy-file.ops_per_sec -97.9% regression | nr_threads: 100% | lkp-bdw-ep6
unixbench.score | | runtime: 300s | lkp-ivb-d01
vm-scalability.throughput | [mm/cma] a57a290bd3: vm-scalability.throughput -15.5% regression | runtime: 300s | lkp-bdw-ep2
vm-scalability.throughput | [mm/can_skip_merge()] 0314b5363f: vm-scalability.throughput -3.9% regression | runtime: 300s | lkp-bdw-ep2
vm-scalability.throughput | [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement | runtime: 300s | lkp-hsx04
vm-scalability.throughput | [LSM] acde387a6d: vm-scalability.throughput -58.0% regression | runtime: 300s | lkp-ivb-d02
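The report titles listed in this document follow a regular shape: a bracketed subsystem, an abbreviated commit ID, the test indicator, a percentage, and a verdict. A hedged sketch of splitting such a title into fields is shown below. The pattern is inferred only from the titles quoted in this report, not from any 0-Day CI format specification, and `parse_report` and its field names are hypothetical.

```python
import re

REPORT_RE = re.compile(
    r"^(?:\[LKP\] \[lkp-robot\] )?"      # optional mailing-list prefix
    r"\[(?P<subsystem>[^\]]+)\] "        # e.g. [mm] or [sched/fair]
    r"(?P<commit>[0-9a-f]{10}): "        # 10-character abbreviated commit ID
    r"(?P<indicator>\S+) "               # e.g. unixbench.score
    r"(?P<change>[+-]?\d+(?:\.\d+)?)% "  # signed or unsigned percentage
    r"(?P<kind>regression|improvement)$"
)


def parse_report(title):
    """Return the title's fields as a dict, or None if it does not match."""
    match = REPORT_RE.match(title.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["change"] = float(fields["change"])
    return fields


if __name__ == "__main__":
    sample = "[LKP] [lkp-robot] [mm] e27be240df: will-it-scale.per_process_ops -27.2% regression"
    print(parse_report(sample))
```

Returning `None` for non-matching titles keeps the caller in charge of deciding whether an unexpected format is an error.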
-
Test Machines
-
IVB Desktop
model | Ivy Bridge
brand | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
cpu number | 8
memory | 16G

model | Ivy Bridge
brand | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
cpu number | 4
memory | 8G
-
BDW EP
model | Broadwell-EP
brand | Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
cpu number | 88
memory | 128G
-
HSW EP
model | Haswell-EP
brand | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
cpu number | 72
memory | 128G
-
IVB EP
model | Ivy Bridge-EP
brand | Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
cpu number | 40
memory | 384G

model | Ivytown Ivy Bridge-EP
brand | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
cpu number | 48
memory | 64G
-
HSX EX
model | Brickland Haswell-EX
brand | Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz
cpu number | 144
memory | 512G