
Linux Kernel Performance

Linux development evolves rapidly. The performance and scalability of the OS kernel have been a key part of its success. However, discussions have appeared on LKML (Linux Kernel Mailing List) regarding large performance regressions between kernel versions. These discussions underscore the need for a systematic and disciplined way to characterize, improve, and test Linux kernel performance. Our goal is to work with the Linux community to further enhance the Linux kernel with consistent performance increases (avoiding degradations) across releases. The information on this site tells community members what 0-Day and LKP (Linux Kernel Performance) are doing to preserve the performance integrity of the kernel.

0-Day CI Linux Kernel Performance Report (v4.14)

By Xiaolong Ye on Nov 29, 2017
  1. Introduction

0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel. It covers kernel build, static analysis, boot, functional, performance and power tests. This report presents recent observations of kernel performance on IA (Intel Architecture) platforms, based on test results from the 0-Day CI service. It is structured as follows:

  • Section 2: regressions and improvements in the upstream tree during the v4.14 release cycle

  • Section 3: regressions and improvements found by shift-left testing of developers’ and maintainers’ trees during the v4.14 release cycle

  • Section 4: test machine list


  2. Linux Kernel v4.14 Release

Linux kernel v4.14 was released on Nov 12, 2017. Some of the most prominent features in this release include the ORC unwinder for more reliable tracebacks and live patching, the long-awaited thread mode for control groups, support for AMD's secure memory encryption, five-level page table support, a new zero-copy networking feature, the heterogeneous memory management subsystem, and so on. 0-Day CI observed 5 major regressions and 4 major improvements in the upstream tree during the feature development phase for v4.14.


Below we share more detailed information on a few of these reports, together with the patches that led to each result. The whole list is summarized in the Observation Summary section.

  2.1. aim7.jobs-per-min

aim7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of multiuser systems.

  2.1.1. scenario: disk read (-26.6% regression)


Commit 91f9943e1c was reported to cause a -26.6% regression of aim7.jobs-per-min compared to v4.13-rc7. It was merged into mainline in v4.14-rc1. The author responded to the regression report and immediately provided a fix, which was merged as commit 942491c9 (xfs: fix AIM7 regression) in v4.14-rc7.


Related commits

91f9943e1c

fs: support RWF_NOWAIT for buffered reads

branch

linux/master

report

[LKP] [lkp-robot] [fs] 91f9943e1c: aim7.jobs-per-min -26.6% regression

status

merged at v4.14-rc1 and fixed by commit 942491c9 (xfs: fix AIM7 regression) in v4.14-rc7


  2.2. fio.read_bw_MBps

Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
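As a rough illustration of what a random-read bandwidth figure such as fio.read_bw_MBps measures, the Python sketch below reads 4 KiB blocks at random, block-aligned offsets of a file with `os.pread` and reports MB/s. It is a simplified stand-in under our own assumptions, not fio: it issues synchronous, buffered reads from a single thread (so page-cache hits inflate the result), whereas the runs in this report use fio's libaio engine, many tasks and a DAX-mounted pmem device. The function name, scratch file and read count are hypothetical, and `os.pread` is Unix-only.

```python
import os
import random
import tempfile
import time


def rand_read_bw_mbps(path: str, block_size: int = 4096,
                      nr_reads: int = 10_000, seed: int = 0) -> float:
    """Read `nr_reads` random block-aligned blocks of `path`; return MB/s."""
    nr_blocks = os.path.getsize(path) // block_size
    if nr_blocks == 0:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)  # fixed seed: same offsets on every run
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        for _ in range(nr_reads):
            offset = rng.randrange(nr_blocks) * block_size
            total += len(os.pread(fd, block_size, offset))  # synchronous read
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Bytes per second converted to MB/s, using 10**6 bytes per MB (our choice).
    return total / elapsed / 1e6


if __name__ == "__main__":
    # Usage: a 64 MiB scratch file stands in for the pmem-backed test file.
    with tempfile.NamedTemporaryFile() as f:
        f.write(os.urandom(64 * 1024 * 1024))
        f.flush()
        print(f"{rand_read_bw_mbps(f.name):.1f} MB/s")
```

Keeping the seed fixed makes two runs read the same offsets, which is the minimum needed before comparing numbers from two kernels.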

  2.2.1. scenario: rand read (+443.9% improvement)



Commit 486aff5e04 was reported to bring a +443.9% improvement of fio.read_bw_MBps compared to v4.13-rc6. It was merged into mainline in v4.14-rc1.


Related commit

486aff5e04

xfs: perform dax_device lookup at mount

branch

linux/master

report

[LKP] [lkp-robot] [xfs] 486aff5e04: fio.read_bw_MBps +443.9% improvement

status

merged at v4.14-rc1


  2.3. stress-ng.lockofd.ops_per_sec

stress-ng is a tool that loads and stresses a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces. stress-ng also has a wide range of CPU-specific stress tests that exercise floating point, integer, bit manipulation and control flow.

  2.3.1. scenario: lockofd (-11.0% regression)

Commit 52306e882f was reported to cause a -11% regression of stress-ng.lockofd.ops_per_sec compared to v4.13-rc1. It was merged into mainline in v4.14-rc1. As of this writing (v4.15 merge window), we have not found an upstream patch that fixes it.


Related commit

52306e882f

fs/locks: Use allocation rather than the stack in fcntl_getlk()

branch

linux/master

report

[LKP] [lkp-robot] [fs/locks] 52306e882f: stress-ng.lockofd.ops_per_sec -11% regression

status

merged at v4.14-rc1


  2.4. will-it-scale.per_thread_ops

Will-it-scale takes a test case and runs it in 1 through n parallel copies to see whether the test case scales. It builds both process-based and thread-based tests in order to expose any differences between the two.
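To make the "1 through n parallel copies" idea concrete, here is a minimal Python sketch of the same measurement pattern using threads: it runs a small allocate-and-free loop (loosely modeled on the malloc1 case) in k threads for a fixed time and reports operations per thread for each k. It is an illustration under our own assumptions, not will-it-scale itself (which is written in C); the workload, buffer size, duration and function names are hypothetical, and because of CPython's global interpreter lock the per-thread numbers will not scale the way native threads do.

```python
import threading
import time
from typing import Callable, Dict


def malloc_free_op() -> None:
    """Hypothetical workload: allocate and drop a 128 KiB buffer."""
    buf = bytearray(128 * 1024)
    del buf


def run_copies(op: Callable[[], None], copies: int, duration: float) -> float:
    """Run `op` in `copies` threads for `duration` seconds; return ops per thread."""
    counts = [0] * copies
    stop = threading.Event()

    def worker(idx: int) -> None:
        n = 0
        while not stop.is_set():
            op()
            n += 1
        counts[idx] = n  # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(copies)]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return sum(counts) / copies


def scaling_curve(op: Callable[[], None], max_copies: int,
                  duration: float = 0.2) -> Dict[int, float]:
    """Measure per-thread ops for 1..max_copies parallel copies."""
    return {k: run_copies(op, k, duration) for k in range(1, max_copies + 1)}


if __name__ == "__main__":
    for k, ops in scaling_curve(malloc_free_op, 4).items():
        print(f"{k} thread(s): {ops:.0f} ops/thread")
```

A flat per-thread curve means the test case scales; a per-thread figure that drops as k grows points at contention, which is the kind of shift a per_thread_ops regression reports.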


  2.4.1. scenario: malloc1-performance (-16% regression)

Commit 9e52fc2b50 was reported to cause a -16% regression of will-it-scale.per_thread_ops compared to v4.13-rc6. It was merged into mainline in v4.14-rc1. As of this writing (v4.15 merge window), we have not found an upstream patch that fixes it.


Related commit

9e52fc2b50

x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)

branch

linux/master

report

[LKP] [lkp-robot] [x86/mm] 9e52fc2b50: will-it-scale.per_thread_ops -16% regression

status

It was merged in v4.14-rc1. The author has reproduced it and is working on a fix.


  2.5. Observation Summary

0-Day CI observed 5 regressions and 4 improvements in the upstream tree during the feature development phase for v4.14. Testing covered the period from the v4.14-rc1 release to the v4.14 release, although the related patches may have been developed during the previous release cycle. The corresponding developers are still working on the regressions that remain unfixed.
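The percentages in the table below are relative changes of a metric between the development base and the tested commit. Whether a positive change counts as a regression depends on the metric: for throughput-style metrics such as aim7.jobs-per-min higher is better, while for a latency metric such as fio.read_clat_90%_us a +120% change is a regression. The Python sketch below shows that arithmetic under our own assumptions: the function names and the set of lower-is-better metrics are hypothetical, the sample numbers are made up, and a real comparison also has to account for run-to-run noise, which this sketch ignores.

```python
# Hypothetical illustration; not the actual 0-Day/LKP comparison code.

# Metrics where a larger value means worse performance (assumption:
# listed by name here for illustration only).
LOWER_IS_BETTER = {"fio.read_clat_90%_us"}


def percent_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    if base == 0:
        raise ValueError("base value must be non-zero")
    return (new - base) / base * 100.0


def classify(metric: str, base: float, new: float) -> str:
    """Label a change as 'improvement', 'regression' or 'unchanged'."""
    change = percent_change(base, new)
    if change == 0:
        return "unchanged"
    better = change < 0 if metric in LOWER_IS_BETTER else change > 0
    return "improvement" if better else "regression"


if __name__ == "__main__":
    # Made-up values chosen only to reproduce the reported percentages.
    print(f"{percent_change(1000.0, 734.0):+.1f}%")  # -26.6%
    print(classify("aim7.jobs-per-min", 1000.0, 734.0))  # regression
    print(classify("fio.read_clat_90%_us", 100.0, 220.0))  # regression
```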


Test Indicator

Report

Test parameters

Development Base

Status

fio.read_bw_MBps

[xfs] 486aff5e04: fio.read_bw_MBps +443.9% improvement  

disk: 2pmem
fs: xfs
mount_option: dax
runtime: 200s
nr_task: 50%
time_based: tb
rw: randread
bs: 4k
ioengine: libaio
test_size: 200G
cpufreq_governor: performance

v4.13-rc6

It is an improvement and was merged in v4.14-rc1.

vm-scalability.throughput

[mm] c79b57e462: vm-scalability.throughput 25% improvement  

runtime: 300s
size: 8T
test: anon-w-seq-mt
cpufreq_governor: performance

v4.13

It is an improvement and was merged in v4.14-rc1.

will-it-scale.per_thread_ops

[x86/mm] 9e52fc2b50: will-it-scale.per_thread_ops -16% regression

test: malloc1
cpufreq_governor: performance

v4.13-rc6

The bad commit was merged in v4.14-rc1. The author has reproduced it and is working on a fix.

netperf.Throughput_tps

[tcp] e7942d0633: netperf.Throughput_tps 2.5% improvement

ip: ipv4
runtime: 300s
nr_threads: 200%
cluster: cs-localhost
test: TCP_CRR
cpufreq_governor: performance

v4.13-rc1

It is an improvement and was merged in v4.14-rc1.

netperf.Throughput_tps

[sched/fair] 90001d67be: netperf.Throughput_tps -16.2% regression

ip: ipv4
runtime: 300s
nr_threads: 200%
cluster: cs-localhost
test: TCP_RR
cpufreq_governor: performance

v4.13-rc4

The bad commit was merged in v4.14-rc1. It was fixed by commit d153b15344 (sched/core: Fix wake_affine() performance regression), which was merged in v4.14-rc5.

stress-ng.lockofd.ops_per_sec

[fs/locks] 52306e882f: stress-ng.lockofd.ops_per_sec -11% regression

runtime: 300
thp_enabled: always
thp_defrag: always
nr_task: 8
nr_pmem: 1
test: swap-w-seq
cpufreq_governor: performance

v4.13-rc1

The bad commit was merged in v4.14-rc1.

We have asked the author for feedback but have not received a response yet.

aim7.jobs-per-min

[fs] 91f9943e1c: aim7.jobs-per-min -26.6% regression

disk: 4BRD_12G
md: RAID1
fs: xfs
test: disk_rd
load: 9000
cpufreq_governor: performance

v4.13-rc7

The bad commit was merged in v4.14-rc1. It was fixed by commit 942491c9 (xfs: fix AIM7 regression), which was merged in v4.14-rc7.

fio.read_clat_90%_us

[dax] 91d25ba8a6: fio.read_clat_90%_us +120% regression

disk: 2pmem
fs: xfs
mount_option: dax
runtime: 200s
nr_task: 50%
time_based: tb
rw: rw
bs: 4k
ioengine: mmap
test_size: 200G
cpufreq_governor: performance

v4.13

The bad commit was merged in v4.14-rc1.

No response from the author yet.

hackbench.throughput

[sched/core] d153b15344: hackbench.throughput 13.8% improvement  

nr_threads: 50%
mode: threads
ipc: pipe
cpufreq_governor: performance

v4.14-rc4

It is an improvement and was merged in v4.14-rc5.


  3. Shift-Left Testing

Beyond testing the upstream kernel trees, 0-Day CI also tests developers’ and maintainers’ trees, which catches issues earlier and reduces the effort needed to fix them. We call this “shift-left” testing. During the v4.14 release cycle, 0-Day CI reported 11 major performance regressions and 6 major improvements through shift-left testing of developers’ and maintainers’ trees. Below we share more detailed information on several of these reports, together with the code changes that possibly led to each result. The whole list is summarized in the Report Summary section.


  3.1. will-it-scale

Will-it-scale takes a test case and runs it in 1 through n parallel copies to see whether the test case scales. It builds both process-based and thread-based tests in order to expose any differences between the two.

  3.1.1. scenario: context switch

Commit c4c3c3c2d0 was reported to cause a -61.0% regression of will-it-scale.per_process_ops compared to v4.14-rc3.


Related commits

c4c3c3c2d0

x86/mm: Flush more aggressively in lazy TLB mode

branch

linux-review/Borislav-Petkov/x86-mm-Flush-more-aggressively-in-lazy-TLB-mode/20171011-115901

report

[LKP] [lkp-robot] [x86/mm] c4c3c3c2d0: will-it-scale.per_process_ops -61.0% regression

status

A community developer is working on this issue.


  3.1.2. scenario: page fault

Commit 60f2bbf7cf was reported to cause a -16% regression of will-it-scale.per_thread_ops compared to v4.14-rc4.


Related commits

60f2bbf7cf

x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping

branch

acpi/x86

report

[LKP] [lkp-robot] [x86/apic] 60f2bbf7cf: will-it-scale.per_thread_ops -16% regression

status

Not merged into mainline; no response from the author at the time of this writing.


  3.2. netperf.Throughput_Mbps

Netperf is a benchmark that can be used to measure various aspects of networking performance.

We tested it on multiple machines, such as a Broadwell EP server, and reported regressions in two test scenarios.
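Request/response tests such as TCP_RR and TCP_CRR report a transaction rate (netperf.Throughput_tps) rather than bandwidth. The Python sketch below approximates the TCP_RR pattern on localhost: a client sends a 1-byte request over a single TCP connection, a server thread answers with a 1-byte response, and completed round trips per second are reported. It is an illustration under our own assumptions (function names, message size, transaction count), not netperf; TCP_CRR would additionally open a new connection for every transaction.

```python
import socket
import threading
import time


def _echo_server(listener: socket.socket) -> None:
    """Accept one connection and answer every 1-byte request with 1 byte."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(1)
            if not data:  # client closed the connection
                break
            conn.sendall(data)


def tcp_rr_tps(transactions: int = 5000) -> float:
    """Time `transactions` request/response round trips on one TCP connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        listener.listen(1)
        server = threading.Thread(target=_echo_server, args=(listener,),
                                  daemon=True)
        server.start()
        with socket.create_connection(listener.getsockname()) as client:
            # Disable Nagle's algorithm so each 1-byte request is sent at once.
            client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            start = time.perf_counter()
            for _ in range(transactions):
                client.sendall(b"x")
                if client.recv(1) != b"x":
                    raise RuntimeError("unexpected or missing response")
            elapsed = time.perf_counter() - start
        server.join()  # the server exits once the client has closed
    return transactions / elapsed


if __name__ == "__main__":
    print(f"{tcp_rr_tps():.0f} transactions/s")
```

Because each transaction waits for the previous response, this rate is dominated by per-packet latency (wakeups, scheduling, the TCP stack), which is why scheduler changes such as the sched/fair commits in this report can move it.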

  3.2.1. scenario: brk1 test with 16 processes

Commit 7674270022 was reported to cause a -19.3% regression of will-it-scale.per_process_ops compared to v4.14-rc3 on a Broadwell EP server. The patch was tested during LKML review.


Related commits


7674270022

mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem

branch

linux-review/Nadav-Amit/mm-migrate-prevent-racy-access-to-tlb_flush_pending/20170802-205715

report

[mm] 7674270022: will-it-scale.per_process_ops -19.3% regression

status

Fixed by the developer.


  3.3. unixbench.score

UnixBench is a system benchmark that provides a basic indicator of the performance of a Unix-like system. We tested it on multiple machines, such as an Ivybridge desktop and an Atom platform.
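As we understand UnixBench's scoring, each test's raw result is divided by a per-test baseline value shipped with UnixBench and multiplied by 10 to give an index, and the overall score is the geometric mean of those indexes. The Python sketch below shows that arithmetic only; the function names are hypothetical and the baseline and result values in the usage line are made up.

```python
import math
from typing import Dict


def unixbench_index(result: float, baseline: float) -> float:
    """Index of one test: its result relative to the baseline, scaled by 10."""
    return result / baseline * 10.0


def unixbench_score(results: Dict[str, float],
                    baselines: Dict[str, float]) -> float:
    """Overall score: geometric mean of the per-test indexes."""
    if not results:
        raise ValueError("no results")
    logs = [math.log(unixbench_index(results[name], baselines[name]))
            for name in results]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Made-up numbers: indexes 10 and 40 give a geometric mean of 20.
    print(unixbench_score({"a": 100.0, "b": 200.0}, {"a": 100.0, "b": 50.0}))
```

The geometric mean keeps one test with a very large index from dominating the score; with a single test selected, the score reduces to that test's index.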

  3.3.1. scenario: test fsbuffer


Commit 18cadb9070 was reported to cause a -45.0% regression of unixbench.score compared to v4.14-rc2.


Related commits

18cadb9070

x86/paravirt: Convert native patch assembly code strings to macros

branch

jpoimboe/paravirt-alternatives

report

[LKP] [lkp-robot] [x86/paravirt] 18cadb9070: unixbench.score -45.0% regression

status

Not merged into mainline; no response from the author at the time of this writing.


  3.4. fio.write_bw_MBps

Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.

  3.4.1. scenario: btrfs randwrite

Commit 79c55a72de was reported to bring a +120% improvement of fio.write_bw_MBps compared to v4.14-rc1.


Related commits

79c55a72de

Btrfs: kill the btree_inode

branch

josef-btrfs/new-kill-btree-inode

report

[LKP] [lkp-robot] [Btrfs] 79c55a72de: fio.write_bw_MBps +120% improvement

status

N/A (It is an improvement and status is not tracked).


  3.5. Report Summary

0-Day CI reported 11 regressions and 6 improvements through shift-left testing of developers’ and maintainers’ trees during the v4.14 release cycle.


Test Indicator

Report

Test Parameters

Test Machine

Status

aim7.jobs-per-min

[f2fs] cb09c151b6: aim7.jobs-per-min -3.7% regression

disk: 1BRD_48G
fs: f2fs
test: sync_disk_rw
load: 600
cpufreq_governor: performance

lkp-ivb-ep01

The author will drop it.

aim7.jobs-per-min

[btrfs] 159186b40c: aim7.jobs-per-min 36.7% improvement

disk: 4BRD_12G
md: RAID1
fs: btrfs
test: disk_cp
load: 1500
cpufreq_governor: performance

lkp-ivb-ep01

N/A

aim9.shell_rtns_1.ops_per_sec

[sched/fair] f207934fb7: aim9.shell_rtns_1.ops_per_sec -2.3% regression

testtime: 300s
test: shell_rtns_1
cpufreq_governor: performance

lkp-hsw-ep4

N/A (a minor change measured by a micro-benchmark)

ebizzy.throughput.per_thread.max

[sched/fair] 840c5abca4: ebizzy.throughput.per_thread.max 4.2% improvement  

nr_threads: 200%
iterations: 100x
duration: 10s
cpufreq_governor: performance

lkp-sb02

N/A

fio.read_bw_MBps

[NTB] 589b25b589: fio.read_bw_MBps -10.1% regression  

disk: 2pmem
fs: xfs
mount_option: dax
runtime: 200s
nr_task: 50%
time_based: tb
rw: read
bs: 4k
ioengine: libaio
test_size: 200G
cpufreq_governor: performance

lkp-hsw-ep6

Not merged into mainline yet; no response from the author yet.

fio.write_bw_MBps

[Btrfs] 79c55a72de: fio.write_bw_MBps +120% improvement

runtime: 300s
nr_task: 8t
disk: 1SSD
fs: btrfs
rw: randwrite
bs: 4k
ioengine: sync
test_size: 400g
cpufreq_governor: performance

lkp-bdw-ep3b

N/A

hackbench.throughput

[sched/fair] c7b5021681: hackbench.throughput -9.0% regression

nr_threads: 1600%
mode: process
ipc: socket
cpufreq_governor: performance

lkp-avoton3

The bad commit was merged in v4.15-rc1.

hackbench.throughput

[sched/core] d153b15344: hackbench.throughput 13.8% improvement

nr_threads: 50%
mode: threads
ipc: pipe
cpufreq_governor: performance

lkp-denverton3

N/A

hackbench.throughput

[sched] 22523c8a8a: hackbench.throughput -9.6% regression

nr_threads: 50%
mode: process
ipc: pipe
cpufreq_governor: performance

lkp-bdw-ep3

Not merged into mainline yet; no response from the author yet.

netperf.Throughput_Mbps

[sched/fair] 6c362d9657: netperf.Throughput_Mbps -10.1% regression

ip: ipv4
runtime: 300s
nr_threads: 200%
cluster: cs-localhost
test: TCP_SENDFILE
cpufreq_governor: performance

lkp-bdw-de1

Not merged into mainline yet; no response from the author yet.

netperf.Throughput_Mbps

[EXPERIMENTAL] 72536b6862: netperf.Throughput_Mbps -6.0% regression

ip: ipv4
runtime: 900s
nr_threads: 1
cluster: cs-localhost
test: TCP_MAERTS
cpufreq_governor: performance

lkp-u410

Experimental patch; the author will drop it.

netperf.Throughput_Mbps

[sched] 0db9de2378: netperf.Throughput_Mbps 7.3% improvement

ip: ipv4
runtime: 300s
nr_threads: 200%
cluster: cs-localhost
test: TCP_SENDFILE
cpufreq_governor: performance

lkp-bdw-de1

N/A

netperf.Throughput_tps

[usb] 71abe46f55: netperf.Throughput_tps -4.3% regression

ip: ipv4
runtime: 300s
nr_threads: 25%
cluster: cs-localhost
test: TCP_CRR
cpufreq_governor: performance

lkp-ivb-d02

Not merged into mainline yet; no response from the author yet.

unixbench.score

[x86/paravirt] 18cadb9070: unixbench.score -45.0% regression

runtime: 300s
nr_task: 100%
test: fsbuffer
cpufreq_governor: performance

lkp-bdw-de1

Not merged into mainline yet; no response from the author yet.

vm-scalability.throughput

[mm] c79b57e462: vm-scalability.throughput 25% improvement

runtime: 300s
size: 8T
test: anon-w-seq-mt
cpufreq_governor: performance

lkp-bdw-ep2

N/A

will-it-scale.per_process_ops

[x86/mm] c4c3c3c2d0: will-it-scale.per_process_ops -61.0% regression  

nr_task: 50%
mode: process
test: context_switch1
cpufreq_governor: performance

lkp-bdw-ep3d

A community developer is working on this issue.

will-it-scale.per_thread_ops

[x86/apic] 60f2bbf7cf: will-it-scale.per_thread_ops -16% regression

nr_task: 100%
mode: thread
test: page_fault2
cpufreq_governor: performance

lkp-skl-2sp2

Not merged into mainline yet; no response from the author yet.


  4. Test Machines

  4.1. Ivybridge Desktop

brand

Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz

model

Ivy Bridge

cpu

4 - 8

memory

4G - 16G


  4.2. Sandybridge Desktop

model

Sandy Bridge Desktop

brand

Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz

cpu

4

memory

4G

name

lkp-sb02


  4.3. Haswell Desktop

brand

Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

model

Haswell Desktop

cpu

8

memory

8G



  4.4. Broadwell EP

brand

Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

model

Broadwell-EP

cpu

88

memory

64G - 128G


  4.5. Broadwell DE

brand

Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz

model

Broadwell-DE

cpu

16

memory

8G


  4.6. Haswell EP

brand

Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz

Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

model

Haswell-EP

cpu

56

72

memory

256G

128G

name

lkp-hsw-ep5/6

lkp-hsw-ep4


  4.7. Ivybridge EP

model

Ivy Bridge-EP

brand

Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz

cpu

48

40

memory

64G

384G

name

ivb44

lkp-ivb-ep01


  4.8. Atom Platform

model

Atom

brand

Intel(R) Atom(R) CPU 3958 @ 2.00GHz

cpu

12 - 16

memory

64G


  4.9. Avoton Platform

model

Avoton

brand

Intel(R) Atom(TM) CPU C2750 @ 2.40GHz

cpu

8

memory

16G

name

lkp-avoton3


  4.10. Skylake Platform

model

Skylake-4S

Skylake

brand

Purley 4S Platform

 

cpu

192

112

memory

768G

64G

name

lkp-skl-4sp1

lkp-skl-2sp2