Linux Kernel Performance

Linux development evolves rapidly. The performance and scalability of the OS kernel have been a key part of its success. However, discussions on LKML (the Linux Kernel Mailing List) have raised concerns about large performance regressions between kernel versions. These discussions underscore the need for a systematic and disciplined way to characterize, improve, and test Linux kernel performance. Our goal is to work with the Linux community to keep kernel performance improving consistently across releases and to avoid degradations. This site gives community members better information about what 0-Day and LKP (Linux Kernel Performance) are doing to preserve the performance integrity of the kernel.

0-Day CI Linux Kernel Performance Report (v4.17)

BY Xiaolong Ye ON Jun 18, 2018

  1. Introduction

0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel. It covers kernel build, static analysis, boot, functional, performance, and power tests. This report presents recent observations of kernel performance on IA platforms, based on test results from the 0-Day CI service. It is structured as follows:

  • Section 2, merged regressions and improvements in v4.17 release candidates

  • Section 3, regressions and improvements captured by shift-left testing on developers’ and maintainers’ trees during the v4.17 release cycle

  • Section 4, test machine list

 

  2. Linux Kernel v4.17 Release

The v4.17 release of the Linux kernel was on June 3rd. Headline features in this release include improved load estimation in the CPU scheduler, raw BPF tracepoints, lazytime support in the XFS filesystem, full in-kernel TLS protocol support, histogram triggers for tracing, and the removal of support for eight processor architectures. 0-Day CI monitored the release closely to track performance on IA platforms, and observed 5 regressions and 5 improvements during the feature development phase for v4.17. This section shares more detailed information, together with the correlated patches that led to these results. Note that the assessment is limited by the test coverage we have now. The full list is given in the observation summary section.

  2.1. will-it-scale.per_process_ops

will-it-scale runs a test case from 1 to n parallel copies to see if the test case will scale. It builds both a process-based and a thread-based version of each test in order to expose any differences between the two.
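As a rough illustration of how such a scaling run can be summarized (this is not will-it-scale's own code), the sketch below takes total operation counts measured at several copy counts and derives per-copy throughput and scaling efficiency relative to a single copy. The function names and the sample numbers are hypothetical, and the sketch assumes the per-process metric is total operations divided by the number of copies.

```python
def per_copy_ops(total_ops, nr_copies):
    """Average operations completed by each parallel copy."""
    return total_ops / nr_copies


def scaling_efficiency(samples):
    """Map each copy count to per-copy throughput relative to one copy.

    samples: dict of {nr_copies: total_ops}; must contain an entry for 1.
    A value of 1.0 means perfect scaling; lower values suggest contention.
    """
    baseline = per_copy_ops(samples[1], 1)
    return {
        n: per_copy_ops(total, n) / baseline
        for n, total in sorted(samples.items())
    }


# Hypothetical measurements, not taken from any real run.
samples = {1: 1000, 2: 1900, 4: 3200}
for copies, eff in scaling_efficiency(samples).items():
    print(f"{copies} copies: {eff:.2f}x per-copy throughput")
```

A test case that scales well keeps the ratio close to 1.0 as the copy count grows; a falling ratio is the kind of signal a per-process metric is meant to surface.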

  2.1.1. scenario: test page fault in process mode

Commit e27be240df was reported to cause a -27.2% regression of will-it-scale.per_process_ops compared with v4.16. It was merged into mainline at v4.17-rc1. Aaron Lu investigated and found that the commit itself looks innocent enough: it merely changed an event counting mechanism, and this test does not trigger those events at all. The regression is most likely due to the changed layout of 'struct mem_cgroup', which either makes stat_cpu fall into a constantly modified cache line or stops some hot fields from sharing a cache line. Aaron has submitted a fix to the community that restores the performance.
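To make the layout effect concrete, here is a minimal sketch using Python's ctypes to compute which cache line each field of a structure falls into. The two structures are hypothetical and are not the real 'struct mem_cgroup'; the 64-byte line size and the assumption that the structure starts on a cache line boundary are also illustrative.

```python
import ctypes

CACHE_LINE_BYTES = 64  # assumed line size; common on x86


class LayoutBefore(ctypes.Structure):
    # Hypothetical layout: both hot fields sit in the first cache line.
    _fields_ = [
        ("config", ctypes.c_char * 48),
        ("hot_a", ctypes.c_uint64),
        ("hot_b", ctypes.c_uint64),
    ]


class LayoutAfter(ctypes.Structure):
    # Same fields plus one new 8-byte member inserted ahead of the hot pair.
    _fields_ = [
        ("config", ctypes.c_char * 48),
        ("new_counter", ctypes.c_uint64),
        ("hot_a", ctypes.c_uint64),
        ("hot_b", ctypes.c_uint64),
    ]


def cache_line_of_fields(struct_type):
    """Return {field name: index of the cache line holding its first byte}.

    Assumes the structure itself starts on a cache line boundary.
    """
    return {
        name: getattr(struct_type, name).offset // CACHE_LINE_BYTES
        for name, *_ in struct_type._fields_
    }


print(cache_line_of_fields(LayoutBefore))
print(cache_line_of_fields(LayoutAfter))
```

In this toy layout, adding a single 8-byte member pushes hot_b into the next cache line, so two fields that used to be fetched together now cost two cache line accesses. A change that looks unrelated to the hot path can therefore still move performance.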

 

Correlated commits: e27be240df (mm: memcg: make sure memory.events is up-to-date when waking pollers)
branch: linux/master
report: [LKP] [lkp-robot] [mm] e27be240df: will-it-scale.per_process_ops -27.2% regression
status: The bad patch was merged at v4.17-rc1. The fix has now landed in Linus’ tree and is expected to appear in v4.18-rc1.
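The report subject lines quoted in this document follow a regular pattern: optional mailing list tags, an optional subsystem tag, an abbreviated commit ID, the metric name, the percentage change, and the word "regression" or "improvement". The sketch below parses that pattern with a regular expression. The pattern is inferred from the subjects shown in this report rather than from any LKP format specification, and the names `SUBJECT_RE` and `parse_subject` are hypothetical.

```python
import re

# Pattern inferred from the subjects quoted in this report (an assumption).
SUBJECT_RE = re.compile(
    r"^(?:\[LKP\]\s*)?(?:\[lkp-robot\]\s*)?"
    r"(?:\[(?P<subsystem>[^\]]+)\]\s*)?"
    r"(?P<commit>[0-9a-f]{7,40}):\s+"
    r"(?P<metric>\S+)\s+"
    r"(?P<change>[+-]?\d+(?:\.\d+)?)%\s+"
    r"(?P<kind>regression|improvement)\s*$"
)


def parse_subject(subject):
    """Split a report subject into its parts, or return None if it does not match."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the percentage text (for example "-27.2") into a float.
    fields["change_pct"] = float(fields.pop("change"))
    return fields


print(parse_subject(
    "[LKP] [lkp-robot] [mm] e27be240df: "
    "will-it-scale.per_process_ops -27.2% regression"
))
```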

 

  2.2. netperf.Throughput_Mbps

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput and end-to-end latency.


 

  2.2.1. scenario: ipv4 SCTP STREAM test in localhost

Commit a37b969a61 was reported to cause a -25.3% regression of netperf.Throughput_Mbps compared with v4.16-rc4. It was merged into mainline at v4.17-rc1, and performance recovered after v4.17-rc2.

 

Correlated commits: a37b969a61 (cpuidle: poll_state: Add time limit to poll_idle())
branch: pm/idle-poll
report: [LKP] [lkp-robot] [cpuidle] a37b969a61: netperf.Throughput_Mbps -25.3% regression
status: The bad patch was merged at v4.17-rc1 and fixed in v4.17-rc2.
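The observation summary in this report attributes the netperf drop to the overhead of local_clock in the time-limited polling loop. The sketch below illustrates the general trade-off in a polling loop with a time limit: reading the clock on every iteration versus only every N iterations. It is not kernel code and does not claim to reproduce the actual fix; `FakeClock`, `poll_with_limit`, and `check_every` are hypothetical names, and the fake clock simply advances by one tick per read so that the reads can be counted deterministically.

```python
class FakeClock:
    """Deterministic clock that advances one tick per read and counts reads."""

    def __init__(self):
        self.now = 0
        self.reads = 0

    def read(self):
        self.reads += 1
        self.now += 1
        return self.now


def poll_with_limit(work_ready, clock, time_limit, check_every=1):
    """Spin until work_ready(iteration) is true or the time limit expires.

    The clock is consulted only once every `check_every` iterations.
    Returns (iterations, timed_out).
    """
    start = clock.read()
    iterations = 0
    while not work_ready(iterations):
        iterations += 1
        if iterations % check_every == 0:
            if clock.read() - start > time_limit:
                return iterations, True
    return iterations, False


# Work becomes available after 1000 spins in both runs.
for every in (1, 200):
    clock = FakeClock()
    poll_with_limit(lambda i: i >= 1000, clock, time_limit=10**6, check_every=every)
    print(f"check_every={every}: {clock.reads} clock reads")
```

When each clock read is expensive relative to one spin of the loop, checking less often trades a slightly less precise time limit for far fewer reads on the hot path.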

 

  2.3. aim7.jobs-per-min

aim7 is a traditional UNIX system-level benchmark suite that is used to test and measure the performance of multiuser systems.

  2.3.1. scenario: f2fs-sync_disk_rw

Commit 84b89e5d94 was reported to bring a 91.4% improvement of aim7.jobs-per-min compared with v4.16-rc2. It was merged into mainline at v4.17-rc1.
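The percentages in this report express the relative change of a metric against its development base. A minimal sketch of that arithmetic follows; the function names and the sample values are hypothetical, and the classification assumes a throughput-style metric where higher is better.

```python
def percent_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def classify(change_pct, higher_is_better=True):
    """Label a change as an improvement or a regression.

    Assumes a throughput-style metric by default (higher is better).
    """
    if change_pct == 0:
        return "no change"
    gain = change_pct if higher_is_better else -change_pct
    return "improvement" if gain > 0 else "regression"


# Hypothetical base and new values chosen to give round percentages.
for base, new in ((1000, 1914), (1000, 747)):
    change = percent_change(base, new)
    print(f"{base} -> {new}: {change:+.1f}% {classify(change)}")
```

Read this way, a 91.4% improvement means the metric nearly doubled relative to the base, while a -25.3% regression means roughly a quarter of the base throughput was lost.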

 

Correlated commits: 84b89e5d94 (f2fs: add auto tuning for small devices)
branch: f2fs/dev-test
report: [LKP] [lkp-robot] [f2fs] 84b89e5d94: aim7.jobs-per-min 91.4% improvement
status: merged at v4.17-rc1

 

  2.4. unixbench.score

UnixBench is a system benchmark that provides a basic indicator of the performance of a Unix-like system.

  2.4.1. scenario: execl test

Commit d519329f72 was reported to cause a -9.9% regression of unixbench.score compared with v4.16-rc6. It was merged into mainline at v4.17-rc1. The author is addressing the regression.

 

Correlated commits: d519329f72 (sched/fair: Update util_est only on util_avg updates)
branch: linux-next/master
report: [LKP] [lkp-robot] [sched/fair] d519329f72: unixbench.score -9.9% regression
status: The bad patch was merged at v4.17-rc1, and the author is working on a fix.


 

  2.5. Observation Summary

0-Day CI observed 5 regressions and 5 improvements during the feature development phase for v4.17, that is, in the time frame from v4.17-rc1 to the v4.17 release.

| Test Indicator | Report | Test Scenario | Development Base | Status |
| --- | --- | --- | --- | --- |
| aim7.jobs-per-min | [f2fs] 84b89e5d94: aim7.jobs-per-min 91.4% improvement | disk: 4BRD_12G; md: RAID1; fs: f2fs; test: sync_disk_rw; load: 600; cpufreq_governor: performance | v4.16-rc5 | merged at v4.17-rc1 |
| aim9.disk_rr.ops_per_sec | [tracing/x86] 1c758a2202: aim9.disk_rr.ops_per_sec -12.0% regression | testtime: 300s; test: disk_rr; cpufreq_governor: performance | v4.17-rc6 | The patch was merged at v4.17-rc3. It is not a real issue: the overhead is introduced by the syscall trace events monitor. |
| blogbench.write_score | [xfs] 19957a1816: blogbench.write_score 5.2% improvement | disk: 1SSD; fs: xfs; cpufreq_governor: performance | v4.16-rc5 | merged at v4.17-rc1 |
| netperf.Throughput_Mbps | [cpuidle] a37b969a61: netperf.Throughput_Mbps -25.3% regression | ip: ipv4; runtime: 300s; nr_threads: 1; cluster: cs-localhost; send_size: 10K; test: SCTP_STREAM_MANY; cpufreq_governor: performance | v4.16-rc7 | The bad patch was merged at v4.17-rc1. The performance drop was due to the overhead of local_clock and was fixed in v4.17-rc2. |
| pxz.throughput | [sched/numa] 789ba28013: pxz.throughput -5.8% regression | nr_threads: 25%; cpufreq_governor: performance | v4.17-rc5 | The bad patch was merged at v4.17-rc5, and the author is working on a fix. |
| pxz.throughput | [sched/fair] 082f764a2f: pxz.throughput 7.6% improvement | nr_threads: 25%; cpufreq_governor: performance | v4.16-rc4 | merged at v4.17-rc1 |
| stress-ng.open.ops | f657a666fd: stress-ng.open.ops 160.4% improvement | nr_threads: 100%; testtime: 1s; class: filesystem; cpufreq_governor: performance | v4.16 | merged at v4.17-rc1 |
| unixbench.score | [sched/fair] d519329f72: unixbench.score -9.9% regression | runtime: 300s; nr_task: 100%; test: execl | v4.16-rc7 | The bad patch was merged at v4.17-rc1, and the author is working on a fix. |
| vm-scalability.throughput | [sched/numa] 7347fc87df: vm-scalability.throughput +45.6% improvement | runtime: 300s; size: 8T; test: anon-cow-seq; cpufreq_governor: performance | v4.16-rc5 | merged at v4.17-rc1 |
| will-it-scale.per_process_ops | [mm] e27be240df: will-it-scale.per_process_ops -27.2% regression | nr_task: 100%; mode: process; test: page_fault3; cpufreq_governor: performance | v4.17-rc1 | The bad patch was merged at v4.17-rc1, and the fix has landed in Linus’ tree. |
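The stated totals of 5 regressions and 5 improvements can be cross-checked against the report subjects listed in this summary. The short sketch below does so by counting the final word of each subject; the list is copied from the entries above, and the helper name `tally` is hypothetical.

```python
from collections import Counter

# Report subjects copied from the observation summary in this section.
REPORTS = [
    "[f2fs] 84b89e5d94: aim7.jobs-per-min 91.4% improvement",
    "[tracing/x86] 1c758a2202: aim9.disk_rr.ops_per_sec -12.0% regression",
    "[xfs] 19957a1816: blogbench.write_score 5.2% improvement",
    "[cpuidle] a37b969a61: netperf.Throughput_Mbps -25.3% regression",
    "[sched/numa] 789ba28013: pxz.throughput -5.8% regression",
    "[sched/fair] 082f764a2f: pxz.throughput 7.6% improvement",
    "f657a666fd: stress-ng.open.ops 160.4% improvement",
    "[sched/fair] d519329f72: unixbench.score -9.9% regression",
    "[sched/numa] 7347fc87df: vm-scalability.throughput +45.6% improvement",
    "[mm] e27be240df: will-it-scale.per_process_ops -27.2% regression",
]


def tally(reports):
    """Count subjects by their last word ('regression' or 'improvement')."""
    return Counter(line.split()[-1] for line in reports)


counts = tally(REPORTS)
print(counts["regression"], "regressions,", counts["improvement"], "improvements")
```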

 

  3. Shift-Left Testing

Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which can catch issues earlier and reduce their wider impact. We call this “shift-left” testing. During the v4.17 release cycle, 0-Day CI reported 8 major performance regressions and 6 major improvements through shift-left testing, matching the entries listed in the report summary. For some of these, this section shares more detailed information together with the code changes that may have led to the result, though the assessment is limited by the test coverage we have now. The whole list is given in the report summary section.

  3.1. aim7.jobs-per-min

aim7 is a traditional UNIX system-level benchmark suite that is used to test and measure the performance of multiuser systems.

 

  3.1.1. scenario: creat-clo test on btrfs

Commit 7d3c8e53f5 was reported to bring a 57.9% improvement of aim7.jobs-per-min compared with v4.17-rc4.

 

Correlated commits: 7d3c8e53f5 (Btrfs: stop creating orphan items for truncate)
branch: linux-review/Omar-Sandoval/Btrfs-update-stale-comments-referencing-vmtruncate/20180512-153554
report: [LKP] [lkp-robot] [Btrfs] 7d3c8e53f5: aim7.jobs-per-min 57.9% improvement
status: Not merged yet

 

  3.2. fsmark.files_per_sec

fsmark is a file system benchmark for testing synchronous write workloads, such as mail server workloads.

  3.2.1. scenario: no synchronous write on xfs

Commit 499c0a9aec was reported to bring a +6.6% improvement of fsmark.files_per_sec compared with v4.17-rc4.

 

Correlated commits: 499c0a9aec (xfs: allow writeback on pages without buffer heads)
branch: hch-xfs/xfs-remove-bufferheads.3
report: [LKP] [lkp-robot] [xfs] 499c0a9aec: fsmark.files_per_sec +6.6% improvement
status: Not merged yet

 

  3.3. stress-ng.copy-file.ops_per_sec

stress-ng stress tests a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

  3.3.1. scenario: 100%-1s-filesystem-performance

Commit 9cb0698bda was reported to cause a -97.9% regression of stress-ng.copy-file.ops_per_sec compared with v4.17-rc3.

 

Correlated commits: 9cb0698bda (copy_file_range: splice with holes)
branch: linux-review/Goldwyn-Rodrigues/Holey-splice-copy_file_range-with-holes/20180504-104538
report: [LKP] [lkp-robot] [copy_file_range] 9cb0698bda: stress-ng.copy-file.ops_per_sec -97.9% regression
status: Not merged.

 

  3.4. unixbench.score

UnixBench is a system benchmark that provides a basic indicator of the performance of a Unix-like system.

  3.4.1. scenario: execl test

Commit f6f47c66af was reported to cause a -14.5% regression of unixbench.score compared with v4.17-rc2.

 

Correlated commits: f6f47c66af (ACPI: ensure acpi_parse_entries_array() does not access non-existent table data)
branch: linux-review/Al-Stone/mailbox-ACPI-Remove-incorrect-error-message-about-parsing-PCCT/20180426-020343
report: [LKP] [lkp-robot] [ACPI] f6f47c66af: unixbench.score -14.5% regression
status: Not merged.

 

  3.5. Report Summary

0-Day CI reported 8 performance regressions and 6 improvements through shift-left testing on developer and maintainer repositories.

 

| Test Indicator | Mail | Test Scenario | Test Machine |
| --- | --- | --- | --- |
| aim7.jobs-per-min | [MD] 0ffbb1adf8: aim7.jobs-per-min -10.6% regression | disk: 4BRD_12G; md: RAID1; fs: xfs; test: sync_disk_rw; load: 600; cpufreq_governor: performance | lkp-ivb-ep01 |
| aim7.jobs-per-min | [MD] 683362afd2: aim7.jobs-per-min -13.3% regression | disk: 4BRD_12G; md: RAID0; fs: xfs; test: sync_disk_rw; load: 600; cpufreq_governor: performance | lkp-ivb-ep01 |
| aim7.jobs-per-min | [Btrfs] 7d3c8e53f5: aim7.jobs-per-min 57.9% improvement | disk: 1BRD_48G; fs: btrfs; test: creat-clo; load: 4; cpufreq_governor: performance | lkp-ivb-ep01 |
| fsmark.files_per_sec | [f2fs] c20dcf3ce2: fsmark.files_per_sec 4.8% improvement | iterations: 8; disk: 1SSD; nr_threads: 4; fs: f2fs; filesize: 8K; test_size: 72G; sync_method: fsyncBeforeClose; nr_directories: 16d; nr_files_per_directory: 256fpd; cpufreq_governor: performance | lkp-hsw-ep4 |
| fsmark.files_per_sec | [xfs] 499c0a9aec: fsmark.files_per_sec +6.6% improvement | iterations: 1x; nr_threads: 1t; disk: 1BRD_48G; fs: xfs; filesize: 4M; test_size: 40G; sync_method: NoSync; cpufreq_governor: performance | ivb44 |
| pxz.throughput | [sched/numa] 789ba28013: pxz.throughput -5.8% regression | nr_threads: 25%; cpufreq_governor: performance | lkp-bdw-ep3 |
| stress-ng.chdir.ops_per_sec | [Linux messages full of `random] 125bac9e15: stress-ng.chdir.ops_per_sec 38.8% improvement | nr_threads: 100%; testtime: 1s; class: os; cpufreq_governor: performance | lkp-bdw-ep6 |
| stress-ng.clone.ops_per_sec | [netns] a3498436b3: stress-ng.clone.ops_per_sec 77.5% improvement | nr_threads: 100%; testtime: 1s; class: scheduler; cpufreq_governor: performance | lkp-bdw-ep6 |
| stress-ng.copy-file.ops_per_sec | [copy_file_range] 9cb0698bda: stress-ng.copy-file.ops_per_sec -97.9% regression | nr_threads: 100%; testtime: 1s; class: filesystem; cpufreq_governor: performance | lkp-bdw-ep6 |
| unixbench.score | [ACPI] f6f47c66af: unixbench.score -14.5% regression | runtime: 300s; nr_task: 100%; test: execl; cpufreq_governor: performance | lkp-ivb-d01 |
| vm-scalability.throughput | [mm/cma] a57a290bd3: vm-scalability.throughput -15.5% regression | runtime: 300s; test: lru-file-readonce; cpufreq_governor: performance | lkp-bdw-ep2 |
| vm-scalability.throughput | [mm/can_skip_merge()] 0314b5363f: vm-scalability.throughput -3.9% regression | runtime: 300s; test: lru-file-readonce; cpufreq_governor: performance | lkp-bdw-ep2 |
| vm-scalability.throughput | [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement | runtime: 300s; size: 1T; test: lru-shm; cpufreq_governor: performance | lkp-hsx04 |
| vm-scalability.throughput | [LSM] acde387a6d: vm-scalability.throughput -58.0% regression | runtime: 300s; size: 128G; test: truncate; cpufreq_governor: performance | lkp-ivb-d02 |

 

  4. Test Machines

  4.1. IVB Desktop

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 8 | 16G |
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 4 | 8G |

 

  4.2. BDW EP

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Broadwell-EP | Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz | 88 | 128G |

 

  4.3. HSW EP

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Haswell-EP | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz | 72 | 128G |

 

  4.4. IVB EP

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz | 40 | 384G |
| Ivytown Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 48 | 64G |

 

  4.5. HSX EX

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Brickland Haswell-EX | Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz | 144 | 512G |