Linux Kernel Performance

Linux development evolves rapidly. The performance and scalability of the OS kernel have been a key part of its success. However, discussions on LKML (the Linux Kernel Mailing List) have reported large performance regressions between kernel versions. These discussions underscore the need for a systematic and disciplined way to characterize, improve, and test Linux kernel performance. Our goal is to work with the Linux community to further enhance the Linux kernel with consistent performance increases (avoiding degradations) across releases. This site gives community members better information about what 0-Day and LKP (Linux Kernel Performance) are doing to preserve the performance integrity of the kernel.

0-Day CI Linux Kernel Performance Report (v4.16)

BY Xiaolong Ye ON May 08, 2018

  1. Introduction

0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel. It covers kernel build, static analysis, boot, functional, performance and power tests. This report presents recent observations of kernel performance on the IA platform, based on test results from the 0-Day CI service. It is structured as follows:

  • Section 2, merged regressions and improvements in v4.16 release

  • Section 3, regressions and improvements captured by shift-left testing of developers’ and maintainers’ trees during the v4.16 release cycle

  • Section 4, test machine list

 

  2. Linux Kernel v4.16 Release

The v4.16 release of the Linux kernel was on April 1st. Some of the headline changes in this release include initial support for the Jailhouse hypervisor, the usercopy whitelisting hardening patches, and some improvements to the deadline scheduler. 0-Day CI monitored the release closely to track performance on the IA platform. 0-Day observed 2 regressions and 0 improvements during the feature development phase for v4.16. For a few reports we share more detailed information below, together with the correlated patches that led to the result, though the assessment is limited by our current test coverage. The whole list is summarized in the Observation Summary section.

  2.1 fsmark.files_per_sec

The fsmark benchmark tests synchronous write workloads. It can vary the number of files, directory depth, etc. It has detailed timings for reads, writes, unlinks and fsyncs that make it good for simulating mail servers and other setups.
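To make "synchronous write workload" concrete, here is a minimal Python sketch of the pattern that the sync_method value fsyncBeforeClose (listed in the summary table) suggests: each file is written, fsync-ed, and only then closed. This is not fsmark itself; the function name, the file naming scheme and the sizes are assumptions chosen for illustration.

```python
import os
import time


def write_files_synced(directory, nr_files, file_size):
    """Write nr_files files of file_size bytes, calling fsync before each close.

    Returns the observed rate in files per second. Hypothetical helper,
    not part of fsmark or lkp-tests.
    """
    payload = b"\0" * file_size
    start = time.perf_counter()
    for i in range(nr_files):
        path = os.path.join(directory, f"file{i:06d}")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()             # push Python's userspace buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to commit the data before close
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse clocks.
    return nr_files / max(elapsed, 1e-9)
```

Calling it on a temporary directory, for example `write_files_synced(tempfile.mkdtemp(), 16, 4096)`, returns a files-per-second figure for that directory's filesystem. The value depends heavily on the storage stack underneath, which is why a change in the NFS write path can move it.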

  1. scenario:  synchronous write workloads

 

Commit c4f24df942 was reported to cause a -13.2% regression in fsmark.files_per_sec compared to v4.16-rc4. It was merged into mainline in v4.16-rc6. The patch is an NFS bug fix. According to the author, the performance impact is an expected consequence of the functional fix.

 

Correlated commits

c4f24df942: NFS: Fix unstable write completion
branch: linux/master
report: [LKP] [lkp-robot] [NFS] c4f24df942: fsmark.files_per_sec -13.2% regression
status: merged at v4.16-rc6
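The report does not state how the percentage is derived or give the raw measurements. Assuming the usual relative-change definition (new value minus base value, divided by the base value), a figure such as -13.2% could be computed as in the sketch below. Both function names are hypothetical.

```python
def percent_change(base, new):
    """Relative change of `new` versus `base`, in percent (assumed definition)."""
    if base == 0:
        raise ValueError("base value must be non-zero")
    return (new - base) / base * 100.0


def classify(change_pct, higher_is_better=True):
    """Label a change as 'regression', 'improvement' or 'unchanged'.

    For throughput-style metrics (files per second, scores) a higher value is
    better; pass higher_is_better=False for latency-style metrics.
    """
    if change_pct == 0:
        return "unchanged"
    improved = (change_pct > 0) == higher_is_better
    return "improvement" if improved else "regression"
```

With made-up inputs, a base of 100.0 and a new value of 86.8 give a change of about -13.2, which `classify` labels a regression for a higher-is-better metric.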

 

  2.2 blogbench.write_score

Blogbench is a portable filesystem benchmark that tries to reproduce the load of a real-world busy file server. It stresses the filesystem with multiple threads performing random reads, writes and rewrites in order to get a realistic idea of the scalability and the concurrency a system can handle.

  1. scenario:  stress write test on btrfs

 

Commit 9092c71bb7 was reported to cause a -12.3% regression in blogbench.write_score compared to v4.15. It was merged into mainline in v4.16-rc1. We are working with the community to root-cause this issue.

 

Correlated commits

9092c71bb7: mm: use sc->priority for slab shrink targets
branch: linux/master
report: [lkp-robot] [mm] 9092c71bb7: blogbench.write_score -12.3% regression
status: merged at v4.16-rc1
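The report titles quoted in this document share a regular shape: optional bracketed tags, an abbreviated commit hash, a metric name, a signed percentage, and the word "regression" or "improvement". The sketch below parses that shape with a regular expression. The pattern is inferred only from the titles shown here and is an assumption, not a format guaranteed by 0-Day CI. `Report` and `parse_subject` are hypothetical names.

```python
import re
from typing import List, NamedTuple

# Pattern inferred from the titles quoted in this report (an assumption).
SUBJECT_RE = re.compile(
    r"^(?P<tags>(?:\[[^\]]+\]\s*)*)"           # e.g. "[LKP] [lkp-robot] [mm] "
    r"(?P<commit>[0-9a-f]{7,40}):\s+"          # abbreviated commit hash
    r"(?P<metric>\S+)\s+"                      # e.g. "blogbench.write_score"
    r"(?P<change>[+-]?\d+(?:\.\d+)?)%\s+"      # e.g. "-12.3%" or "91.4%"
    r"(?P<kind>regression|improvement)$"
)


class Report(NamedTuple):
    tags: List[str]
    commit: str
    metric: str
    change_pct: float
    kind: str


def parse_subject(subject):
    """Split a report title into its fields; raise ValueError if it does not match."""
    m = SUBJECT_RE.match(subject.strip())
    if m is None:
        raise ValueError(f"unrecognized report title: {subject!r}")
    tags = re.findall(r"\[([^\]]+)\]", m.group("tags"))
    return Report(tags, m.group("commit"), m.group("metric"),
                  float(m.group("change")), m.group("kind"))
```

Applied to the title above, `parse_subject("[lkp-robot] [mm] 9092c71bb7: blogbench.write_score -12.3% regression")` yields the tags `lkp-robot` and `mm`, the commit `9092c71bb7`, the metric `blogbench.write_score`, a change of -12.3 and the kind `regression`.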


 

  2.3 Observation Summary

0-Day CI observed 2 regressions and 0 improvements during the feature development phase for v4.16, that is, in the time frame from the v4.16-rc1 release to the v4.16 release.

| Test Indicator | Report | Test Scenario | Development Base | Status |
|---|---|---|---|---|
| fsmark.files_per_sec | c4f24df942: -13.2% regression | iterations: 1x; nr_threads: 1t; disk: 1BRD_48G; fs: xfs; fs2: nfsv4; filesize: 4M; test_size: 40G; sync_method: fsyncBeforeClose; cpufreq_governor: performance | v4.16-rc4 | Merged in v4.16-rc6. |
| blogbench.write_score | [mm] 9092c71bb7: blogbench.write_score -12.3% regression | disk: 1SSD; fs: btrfs; cpufreq_governor: performance | v4.15 | Merged in v4.16-rc1 |
 

  3. Shift-Left Testing

Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which can catch issues earlier and reduce the effort needed to fix them. We call this “shift-left” testing. During the v4.16 release cycle, 0-Day CI reported 2 major performance regressions and 5 major improvements through shift-left testing. For some of these reports we share more detailed information below, together with the possible code changes that led to each result, though the assessment is limited by our current test coverage. The whole list is summarized in the Report Summary section.

  3.1 aim7.jobs-per-min

aim7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of multiuser systems.

  1. scenario: sync_disk_rw


 

Commit 84b89e5d94 was reported to bring a 91.4% improvement in aim7.jobs-per-min compared to v4.16-rc2. The patch was eventually merged into mainline in v4.17-rc1.

 

Correlated commits

84b89e5d94: f2fs: add auto tuning for small devices
branch: f2fs/dev-test
report: [LKP] [lkp-robot] [f2fs] 84b89e5d94: aim7.jobs-per-min 91.4% improvement
status: merged at v4.17-rc1

 

  3.2 blogbench.write_score

Blogbench is a portable filesystem benchmark that tries to reproduce the load of a real-world busy file server. It stresses the filesystem with multiple threads performing random reads, writes and rewrites in order to get a realistic idea of the scalability and the concurrency a system can handle.

  1. scenario: random read/write with multiple threads

 

Commit 19957a1816 was reported to bring a 5.2% improvement in blogbench.write_score compared to v4.16-rc5. It was merged into mainline in v4.17-rc1.

 

Correlated commits

19957a1816: xfs: Correctly invert xfs_buftarg LRU isolation logic
branch: xfs-linux/xfs-4.17-merge
report: [LKP] [lkp-robot] [xfs] 19957a1816: blogbench.write_score 5.2% improvement
status: merged at v4.17-rc1


 

  3.3 netperf.Throughput_Mbps

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput and end-to-end latency.

  1. scenario: SCTP_STREAM_MANY

 

Commit adff52ea43 was reported to bring a +5.5% improvement in netperf.Throughput_Mbps compared to v4.16-rc7.

 

Correlated commits

adff52ea43: sched: idle: Select idle state before stopping the tick
branch: pm/idle-tick-v8
report: [LKP] [lkp-robot] [sched] adff52ea43: netperf.Throughput_Mbps +5.5% improvement
status: Currently not merged

 

  3.4 pxz.throughput

Parallel XZ is a compression utility that takes advantage of running LZMA (Lempel–Ziv–Markov chain algorithm) compression of different parts of an input file on multiple cores and processors simultaneously. Its primary goal is to utilize all resources to speed up compression time with minimal possible influence on compression ratio.
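The idea of compressing different parts of an input in parallel can be sketched in a few lines of Python with the standard `lzma` module. This is an illustration of the general technique, not pxz's actual implementation. The function name, chunk size and worker count are assumptions, and threads are used here only for brevity.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_parallel(data, chunk_size=1 << 20, workers=4):
    """Compress `data` chunk by chunk in a worker pool.

    Each chunk becomes its own .xz stream; the streams are concatenated
    in input order.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still emit one valid (empty) stream
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the output keeps file order.
        streams = list(pool.map(lzma.compress, chunks))
    return b"".join(streams)
```

Because `lzma.decompress` handles concatenated streams, `lzma.decompress(compress_parallel(data))` returns the original bytes. Compressing chunks independently costs some compression ratio, since each chunk starts with an empty dictionary. That is the trade-off the description above refers to.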

  1. scenario: LZMA compression

 

Commit 082f764a2f was reported to bring a 7.6% improvement in pxz.throughput compared to v4.16-rc2. It was merged into mainline in v4.17-rc1.

 

Correlated commits

082f764a2f: sched/fair: Do not migrate on wake_affine_weight() if weights are equal
branch: linux-next/master
report: [LKP] [lkp-robot] [sched/fair] 082f764a2f: pxz.throughput 7.6% improvement
status: merged at v4.17-rc1

 

  3.5 vm-scalability.throughput

vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. We tested on multiple machines, such as an HSW EP server, and reported an improvement in one test scenario.

 

  1. scenario: swap-w-seq-mt

 

Commit 4405c5fd84 was reported to bring a +26.5% improvement in vm-scalability.throughput compared to v4.16-rc6.

 

Correlated commits

4405c5fd84: mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE
branch: linux-next/master
report: [LKP] [lkp-robot] [mm/cma] 4405c5fd84: vm-scalability.throughput +26.5% improvement
status: Currently not merged

 

  3.6 will-it-scale.per_process_ops

will-it-scale runs a test case from 1 through n parallel copies to see whether the test case scales. We tested on multiple machines, such as a BDW EP server, and reported regressions in two test scenarios.
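To illustrate what "scaling" means for such a run, the sketch below takes the total operations per second measured at each number of parallel copies and derives a per-copy rate and a scaling efficiency relative to a single copy. It assumes that the per-process figure is simply the total divided by the number of copies. The report does not define the metric, and the function names are hypothetical.

```python
def per_process_ops(total_ops):
    """Map copies -> ops per copy, given copies -> total ops (assumed definition)."""
    if any(n <= 0 for n in total_ops):
        raise ValueError("number of copies must be positive")
    return {n: total_ops[n] / n for n in sorted(total_ops)}


def scaling_efficiency(total_ops):
    """Map copies -> fraction of ideal linear scaling (1.0 means perfect).

    Ideal scaling is n times the single-copy result, so a measurement
    for 1 copy is required.
    """
    if 1 not in total_ops:
        raise ValueError("a single-copy measurement is required")
    single = total_ops[1]
    return {n: total_ops[n] / (n * single) for n in sorted(total_ops)}
```

A test case that scales well keeps its per-copy rate roughly flat as copies are added. A falling per-copy rate usually points to contention on a shared resource.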

 

  1. scenario: pread2

 

Commit 6b2fa2b32a was reported to cause a -34.9% regression in will-it-scale.per_process_ops compared to v4.16-rc5.

 

Correlated commits

6b2fa2b32a: sched: idle: Select idle state before stopping the tick
branch: pm/idle-tick
report: [LKP] [lkp-robot] [sched] 6b2fa2b32a: will-it-scale.per_process_ops -34.9% regression
status: Currently not merged

 

  2. scenario: page_fault3

 

Commit 4b8fc7416c was reported to cause a -6.9% regression in will-it-scale.per_process_ops compared to v4.16-rc6.

 

Correlated commits

4b8fc7416c: mm: memcontrol: Use cgroup_rstat for stat accounting
branch: cgroup/review-memcg-swap.events
report: [LKP] [lkp-robot] [mm] 4b8fc7416c: will-it-scale.per_process_ops -6.9% regression
status: Currently not merged

 

  3.7 Report Summary

0-Day CI reported 2 performance regressions and 5 improvements through shift-left testing on developer and maintainer repos.

 

| Test Indicator | Report | Test Scenario | Test Machine | Status |
|---|---|---|---|---|
| aim7.jobs-per-min | 84b89e5d94: 91.4% improvement | disk: 4BRD_12G; md: RAID1; fs: f2fs; test: sync_disk_rw; load: 600; cpufreq_governor: performance | lkp-ivb-ep01 | Merged at v4.17-rc1 |
| blogbench.write_score | 19957a1816: 5.2% improvement | disk: 1SSD; fs: xfs; cpufreq_governor: performance | lkp-bdw-de1 | Merged at v4.17-rc1 |
| netperf.Throughput_Mbps | adff52ea43: +5.5% improvement | ip: ipv4; runtime: 300s; nr_threads: 25%; cluster: cs-localhost; send_size: 10K; test: SCTP_STREAM_MANY; cpufreq_governor: performance | lkp-bdw-ex2 | merged at v4.17-rc1 |
| pxz.throughput | 082f764a2f: 7.6% improvement | nr_threads: 25%; cpufreq_governor: performance | lkp-bdw-ep3 | merged at v4.17-rc1 |
| vm-scalability.throughput | 4405c5fd84: +26.5% improvement | runtime: 300; thp_enabled: never; thp_defrag: always; nr_task: 8; nr_pmem: 4; priority: 1; test: swap-w-seq-mt; cpufreq_governor: performance | lkp-hsw-ep2 | Currently not merged |
| will-it-scale.per_process_ops | 6b2fa2b32a: -34.9% regression | nr_task: 100%; mode: process; test: pread2; cpufreq_governor: performance | lkp-skl-4sp1 | Currently not merged |
| will-it-scale.per_process_ops | 4b8fc7416c: -6.9% regression | test: page_fault3; cpufreq_governor: performance | lkp-ivb-d04 | Currently not merged |
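The counts can be cross-checked against the rows listed above with a short tally. The Test Indicator and Report values below are copied from the table. The `ROWS` structure and the `tally` helper are hypothetical, and the helper relies on each Report entry ending in "regression" or "improvement".

```python
from collections import Counter

# (test indicator, report) pairs copied from the summary table above.
ROWS = [
    ("aim7.jobs-per-min", "84b89e5d94: 91.4% improvement"),
    ("blogbench.write_score", "19957a1816: 5.2% improvement"),
    ("netperf.Throughput_Mbps", "adff52ea43: +5.5% improvement"),
    ("pxz.throughput", "082f764a2f: 7.6% improvement"),
    ("vm-scalability.throughput", "4405c5fd84: +26.5% improvement"),
    ("will-it-scale.per_process_ops", "6b2fa2b32a: -34.9% regression"),
    ("will-it-scale.per_process_ops", "4b8fc7416c: -6.9% regression"),
]


def tally(rows):
    """Count reports by their final word ('regression' or 'improvement')."""
    return Counter(report.rsplit(" ", 1)[-1] for _, report in rows)


counts = tally(ROWS)
print(counts["regression"], "regressions,", counts["improvement"], "improvements")
```

For the seven rows above this counts 2 regressions and 5 improvements.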

  4. Test Machines

  4.1 IVB Desktop

model: Ivy Bridge
brand: Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
cpu: 4
memory: 4G

  4.2 BDW EP

model: Broadwell-EP
brand: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
cpu: 88
memory: 64G

  4.3 BDW EX

model: Broadwell-EX
brand: Intel(R) Xeon(R) CPU E7-8890 v4 @ 2.20GHz
cpu: 16
memory: 256G

  4.4 BDW DE

model: Broadwell-DE
brand: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz
cpu: 16
memory: 8G

  4.5 HSW EP

model: Haswell-EP
brand: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
cpu: 72
memory: 128G

  4.6 IVB EP

model: Ivy Bridge-EP
brand: Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
cpu: 40
memory: 384G

  4.7 SKL Platform

model: Skylake-4S
brand:
cpu: 192
memory: 768G