
Linux Kernel Performance

Linux development evolves rapidly. The performance and scalability of the OS kernel have been a key part of its success. However, discussions have appeared on LKML (Linux Kernel Mailing List) regarding large performance regressions between kernel versions. These discussions underscore the need for a systematic and disciplined way to characterize, improve, and test Linux kernel performance. Our goal is to work with the Linux community to further enhance the Linux kernel with consistent performance increases (avoiding degradations) across releases. This site gives community members better information about what 0-Day and LKP (Linux Kernel Performance) are doing to preserve the performance integrity of the kernel.

0-Day CI Linux Kernel Performance Report (v5.14)

BY Beibei Si ON Oct 09, 2021

1.            Introduction

0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel. It covers kernel build, static analysis, boot, functional, performance and power tests. This report presents recent observations of kernel performance status on IA platforms, based on test results from the 0-Day CI service. It is structured as follows:

       Section 2, test parameter description

       Section 3, merged regressions and improvements in v5.14 release candidates

       Section 4, regressions and improvements captured by shift-left testing on developers’ and maintainers’ trees during the v5.14 release cycle

       Section 5, test machine list

 

2.            Test parameter descriptions

Here are the descriptions for each parameter/field used in the tests.

 

| Classification | Name | Description |
| --- | --- | --- |
| General | runtime | Run the test case for a fixed time period (seconds or minutes) |
| | nr_task | Number of processes/threads that run the workload of this job. An integer is used as the count directly (default is 1). A percentage is relative to the number of CPUs, e.g. 200% means twice the number of CPUs |
| | nr_threads | Alias of nr_task |
| | iterations | Number of times to repeat this job |
| | test_size | Test disk size or memory size |
| | set_nic_irq_affinity | Set NIC interrupt affinity |
| | disable_latency_stats | latency_stats may introduce too much noise when there are many context switches; this option allows disabling it |
| | transparent_hugepage | Set transparent hugepage policy (/sys/kernel/mm/transparent_hugepage) |
| | boot_params:bp1_memmap | memmap boot parameter |
| | disk:nr_pmem | Number of pmem partitions used by the test |
| | swap:priority | Priority of the swap device, a value between -1 and 32767. The default is -1; a higher value means a higher priority |
| Test Machine | model | Name of Intel processor microarchitecture |
| | brand | Brand name of the CPU |
| | cpu_number | Number of CPUs |
| | memory | Size of memory |
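The nr_task rule described above (an integer is a literal count, a percentage is relative to the number of CPUs) can be sketched as follows. This is only an illustration, not 0-Day CI's own code: the function name `resolve_nr_task` is hypothetical, and rounding down with a minimum of one task is an assumption that the report does not state.

```python
def resolve_nr_task(nr_task, cpu_count):
    """Return the number of processes/threads for a job.

    nr_task may be None (the documented default of 1), an integer
    (used as-is), or a string such as "200%" (relative to cpu_count).
    """
    if nr_task is None:
        return 1
    text = str(nr_task).strip()
    if text.endswith("%"):
        percent = int(text[:-1])
        # Assumption: round down, but always run at least one task.
        return max(1, cpu_count * percent // 100)
    return int(text)


if __name__ == "__main__":
    for spec in (None, 8, "50%", "200%"):
        print(spec, "->", resolve_nr_task(spec, cpu_count=96))
```

With this reading, `nr_task: 50%` on a machine with 96 CPUs would run 48 tasks, and `nr_task: 200%` would run 192.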

 

3.            Linux Kernel v5.14 Release Test

Linus has released the 5.14 kernel, and mentioned: “So I realize you must all still be busy with all the galas and fancy balls and all the other 30th anniversary events, but at some point you must be getting tired of the constant glitz, the fireworks, and the champagne. That ball gown or tailcoat isn't the most comfortable thing, either. The celebrations will go on for a few more weeks yet, but you all may just need a breather from them. And when that happens, I have just the thing for you - a new kernel release to test and enjoy.”

Headline features in 5.14 include: core scheduling (at last), the burstable CFS bandwidth controller, some initial infrastructure for BPF program loaders, the rq_qos I/O priority policy, some improvements to the SO_REUSEPORT networking option, the control-group "kill" button, the memfd_secret() system call, the quotactl_fd() system call, and much more. See the LWN merge-window summaries (part 1, part 2) for more details.

 

0-Day CI monitored the release closely to track performance status on IA platforms. It observed 9 regressions and 7 improvements during the feature development phase for v5.14. This section shares more detailed information, together with the correlated patches that led to these results. Note that the assessment is limited by the test coverage 0-Day has now. The full list is summarized in the Observation Summary section.

3.1.            Observation Summary

0-Day CI observed 9 regressions and 7 improvements during the feature development phase for v5.14, that is, in the time frame from v5.14-rc1 to the v5.14 release.

| Test Indicator | Mail | Test Scenario | Test Machine | Development Base | Status |
| --- | --- | --- | --- | --- | --- |
| fsmark.files_per_sec | [xfs] a79b28c284: -4.6% regression | iterations: 1x, nr_threads: 32t, disk: 1SSD, fs: xfs, filesize: 8K, test_size: 400M, sync_method: fsyncBeforeClose, nr_directories: 16d, nr_files_per_directory: 256fpd, cpufreq_governor: performance | lkp-csl-2sp7 | v5.13-rc4 | merged at v5.14-rc1, no response from author yet |
| stress-ng.loop.ops_per_sec | [pipe] 3a34b13a88: -12.6% regression | nr_threads: 100%, iterations: 4, mode: process, ipc: pipe, cpufreq_governor: performance | lkp-icl-2sp1 | v5.14-rc3 | merged at v5.14-rc4, author acknowledged the regression and considers it expected |
| stress-ng.fallocate.ops_per_sec | [xfs] eef983ffea: -15.2% regression | nr_threads: 10%, disk: 1HDD, testtime: 60s, fs: xfs, class: filesystem, test: fallocate, cpufreq_governor: performance | lkp-csl-2sp7 | v5.13-rc4 | merged at v5.14-rc1, no response from author, but the regression was gone in v5.14 |
| tbench.throughput-MB/sec | [xfs] bad77c375e: -10.0% regression | nr_threads: 10%, disk: 1HDD, testtime: 60s, fs: xfs, class: filesystem, test: fallocate, cpufreq_governor: performance | lkp-csl-2sp7 | v5.13-rc4 | merged at v5.14-rc1, no response from author yet |
| stress-ng.link.ops_per_sec | [btrfs] ecc64fab7d: -81.7% regression | nr_threads: 10%, disk: 1HDD, testtime: 60s, fs: btrfs, class: filesystem, test: link, cpufreq_governor: performance | lkp-csl-2sp7 | v5.13-rc4 | merged at v5.14-rc4, author acknowledged the regression and sent a fix patch, 6e3688e66f2f, which recovered performance by +443.3% |
| stress-ng.lockbus.ops_per_sec | [clocksource] db3a34e174: -10.1% regression | nr_threads: 100%, testtime: 60s, class: cpu-cache, test: lockbus, cpufreq_governor: performance | lkp-csl-2sp7 | v5.13-rc4 | merged at v5.14-rc1, no response from author, but the 0-Day CI team is following up |
| stress-ng.mknod.ops_per_sec | [xfs] 2bf1ec0ff0: -45.4% regression | nr_threads: 10%, disk: 1HDD, testtime: 60s, fs: xfs, class: filesystem, test: mknod, cpufreq_governor: performance | lkp-csl-2sp7 | v5.14-rc1 | merged at v5.14-rc4, no response from author, but the 0-Day team considers it acceptable |
| stress-ng.sigio.ops_per_sec | [pipe] 3b844826b6: -99.3% regression | nr_threads: 100%, disk: 1HDD, testtime: 60s, class: interrupt, test: sigio, cpufreq_governor: performance | lkp-csl-2sp7 | v5.14-rc6 | merged at v5.14-rc7, author acknowledged the regression, Linus sent a fix patch, fe67f4dd8daa, merged at v5.14 |
| will-it-scale.per_process_ops | [mm/memcg] 5387c90490: -21.3% regression | nr_task: 50%, mode: process, test: unix1, cpufreq_governor: performance | lkp-skl-fpga01 | v5.13 | merged at v5.14-rc1, no response from author, but the 0-Day CI team is following up |
| Improvement | | | | | |
| fio.write_iops | [sched] 9edeaea1bc: 4.7% improvement | disk: 2pmem, fs: xfs, mount_option: dax, runtime: 200s, nr_task: 50%, time_based: tb, rw: write, bs: 2M, ioengine: mmap, test_size: 200G, cpufreq_governor: performance | lkp-csl-2sp6 | v5.13-rc1 | merged at v5.14-rc1 |
| aim9.sync_disk_rw.ops_per_sec | [io] 49e7f0c789: 6.5% improvement | disk: 1SSD, fs: btrfs, runtime: 300s, nr_task: 8, rw: randwrite, bs: 4k, ioengine: io_uring, test_size: 256g, cpufreq_governor: performance | lkp-csl-2ap1 | v5.14-rc1 | merged at v5.14-rc6 |
| hackbench.throughput | [mm/memcg] 5387c90490: 41.3% improvement | nr_threads: 100%, iterations: 4, mode: threads, ipc: socket, cpufreq_governor: performance | lkp-skl-fpga01 | v5.13 | merged at v5.14-rc1 |
| stress-ng.fanotify.ops_per_sec | [mm/memcg] 68ac5b3c8d: 42.4% improvement | nr_threads: 100%, iterations: 4, mode: threads, ipc: socket, cpufreq_governor: performance | lkp-csl-2ap4 | v5.13 | merged at v5.14-rc1 |
| netperf.Throughput_tps | [iommu/vt] e93a67f5a0: 28.9% improvement | ip: ipv4, runtime: 300s, nr_threads: 16, cluster: cs-localhost, test: TCP_CRR, cpufreq_governor: performance | lkp-csl-2ap3 | v5.13-rc4 | merged at v5.14-rc1 |
| stress-ng.msg.ops_per_sec | [trace] 3d3d9c072e: 18.5% improvement | nr_threads: 10%, disk: 1HDD, testtime: 60s, fs: ext4, class: os, test: msg, cpufreq_governor: performance | lkp-csl-2sp5 | v5.13-rc5 | merged at v5.14-rc1 |
| will-it-scale.per_thread_ops | [kprobes] ec6aba3d2b: 3.8% improvement | nr_task: 100%, mode: thread, test: getppid1, cpufreq_governor: performance | lkp-csl-2sp9 | v5.13-rc1 | merged at v5.14-rc1 |
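The stress-ng.link entry reports a -81.7% regression and a fix that brings a +443.3% improvement. Because each percentage is relative to a different starting point, the two figures do not simply add up. The short sketch below works through the arithmetic. It assumes that both percentages were measured on the same workload and machine, and that the fix was measured directly on top of the regressed kernel; the report does not confirm this. The helper name `apply_changes` is hypothetical.

```python
def apply_changes(baseline, percent_changes):
    """Apply successive relative changes (in percent) to a baseline score."""
    value = baseline
    for pct in percent_changes:
        value *= 1 + pct / 100
    return value


if __name__ == "__main__":
    # Normalize the pre-regression throughput to 100.
    after_regression = apply_changes(100.0, [-81.7])
    after_fix = apply_changes(100.0, [-81.7, 443.3])
    print(f"after regression: {after_regression:.1f}")
    print(f"after fix:        {after_fix:.1f}")
```

Under these assumptions the regressed kernel delivers about 18.3% of the original throughput, and the fix restores it to about 99.4%, which is consistent with the status note that the regression was recovered.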

4.            Shift-Left Testing

Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which can catch issues earlier and limit their wider impact. We call this “shift-left” testing. During the v5.14 release cycle, 0-Day CI reported 9 major performance regressions and 8 major improvements through shift-left testing. This section shares more detailed information, together with the possible code changes that led to these results, though the assessment is limited by the test coverage we have now. The full list is summarized in the Report Summary section.

4.1         Report Summary

0-Day CI reported 9 performance regressions and 8 improvements through shift-left testing on developer and maintainer repos.

| Test Indicator | Mail | Test Scenario | Test Machine | Status |
| --- | --- | --- | --- | --- |
| aim7.jobs-per-min | [xfs] 6df693ed7b: -15.7% regression | disk: 4BRD_12G, md: RAID1, fs: xfs, test: disk_wrt, load: 3000, cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged, author acknowledged the regression and the 0-Day CI team is working with the author to fix it |
| aim7.jobs-per-min | [memcg] 45208c9105: -14.0% regression | disk: 1BRD_48G, fs: xfs, test: disk_rr, load: 3000, cpufreq_governor: performance | lkp-icl-2sp2 | currently not merged, author acknowledged the regression and is working on a fix |
| fxmark.hdd_ext4_no_jnl_DWTL_1_directio.works/sec | [loop] 2112f5c133: -49.6% regression | disk: 1HDD, media: hdd, test: DWTL, fstype: ext4_no_jnl, directio: directio, cpufreq_governor: performance | lkp-knm02 | merged at v5.15-rc1, no response from author yet |
| netperf.Throughput_tps | [bpf] b89fbfbb85: -21.3% regression | ip: ipv4, runtime: 300s, nr_threads: 16, cluster: cs-localhost, test: TCP_CRR, cpufreq_governor: performance | lkp-csl-2ap3 | merged at v5.15-rc1, author couldn't reproduce it in his environment, 0-Day CI team is following up |
| stress-ng.memhotplug.ops_per_sec | [mm/migrate] 9eeb73028c: -53.8% regression | 10%-1HDD-60s-ext4-os-memhotplug-performance-ucode=0x5003006 | lkp-csl-2sp5 | currently not merged, author acknowledged the regression and the 0-Day CI team is working with the author to fix it |
| will-it-scale.per_process_ops | [memcg] 059dd9003a: -39.8% regression | nr_task: 100%, mode: process, test: lock1, cpufreq_governor: performance | lkp-icl-2sp1 | currently not merged, no response from author yet |
| will-it-scale.per_process_ops | [memcg] 0f12156dff: -33.6% regression | nr_task: 50%, mode: process, test: lock1, cpufreq_governor: performance | lkp-skl-fpga01 | merged at v5.15-rc1, author acknowledged the regression and the patch was reverted in the latest kernel tree |
| will-it-scale.per_thread_ops | [posix] 63a17eea7d: -43.8% regression | nr_task: 100%, mode: thread, test: lseek1, cpufreq_governor: performance | lkp-skl-fpga01 | currently not merged, no response from author yet |
| will-it-scale.per_thread_ops | [memcg] fa4e6b1ad5: -15.4% regression | nr_task: 50%, mode: thread, test: poll2, cpufreq_governor: performance | lkp-hsw-4ex1 | currently not merged, no response from author yet |
| Improvement | | | | |
| aim7.jobs-per-min | [ext4] cc883236b7: 69.4% improvement | disk: 4BRD_12G, md: RAID0, fs: ext4, test: disk_rw, load: 3000, cpufreq_governor: performance | lkp-csl-2sp9 | currently not merged |
| filebench.sum_operations/s | [sched] 260916b537: 5.6% improvement | disk: 1HDD, fs: f2fs, test: filemicro_writefsync.f, cpufreq_governor: performance | lkp-knm02 | currently not merged |
| fsmark.files_per_sec | [SUNRPC] e38b3f2005: 1857.1% improvement | iterations: 1x, nr_threads: 1t, disk: 1BRD_48G, fs: f2fs, fs2: nfsv4, filesize: 4M, test_size: 24G, sync_method: NoSync, cpufreq_governor: performance | lkp-csl-2ap2 | merged at v5.15-rc1 |
| stress-ng.link.ops_per_sec | [btrfs] 6e3688e66f: 443.3% improvement | nr_threads: 10%, disk: 1HDD, testtime: 60s, fs: btrfs, class: filesystem, test: link, cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged |
| stress-ng.loop.ops_per_sec | [loop] 8883efd909: 148.4% improvement | nr_threads: 100%, disk: 1HDD, testtime: 60s, class: device, test: loop, cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged |
| stress-ng.loop.ops_per_sec | [loop] acd1746478: 140.9% improvement | nr_threads: 100%, disk: 1HDD, testtime: 60s, class: device, test: loop, cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged |
| stress-ng.netdev.ops_per_sec | [net] b0e99d0377: 7.7% improvement | nr_threads: 100%, testtime: 60s, class: network, test: netdev, cpufreq_governor: performance | lkp-csl-2sp5 | merged at v5.15-rc1 |
| will-it-scale.per_thread_ops | [fsnotify] e43de7f086: 10.2% improvement | nr_task: 100%, mode: thread, test: eventfd1, cpufreq_governor: performance | lkp-csl-2ap2 | merged at v5.15-rc1 |
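Each "Mail" entry in the summaries follows the same pattern: a bracketed subsystem tag, an abbreviated commit ID, a signed percentage, and the word "regression" or "improvement". For readers who want to tally or sort these entries, a minimal parsing sketch is given below. The regular expression and the names `ENTRY_RE` and `parse_entry` are hypothetical and inferred only from the entries shown in this report; they are not part of any 0-Day CI tool.

```python
import re
from collections import Counter

# Pattern inferred from entries such as "[xfs] 6df693ed7b: -15.7% regression".
ENTRY_RE = re.compile(
    r"^\[(?P<subsystem>[^\]]+)\]\s+(?P<commit>[0-9a-f]+):\s+"
    r"(?P<percent>[-+]?\d+(?:\.\d+)?)%\s+(?P<kind>regression|improvement)$"
)


def parse_entry(text):
    """Split one 'Mail' entry into subsystem, commit, percent and kind."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {text!r}")
    fields = match.groupdict()
    fields["percent"] = float(fields["percent"])
    return fields


if __name__ == "__main__":
    # A few entries copied from the shift-left report summary.
    sample = [
        "[xfs] 6df693ed7b: -15.7% regression",
        "[mm/migrate] 9eeb73028c: -53.8% regression",
        "[SUNRPC] e38b3f2005: 1857.1% improvement",
        "[btrfs] 6e3688e66f: 443.3% improvement",
    ]
    entries = [parse_entry(line) for line in sample]
    counts = Counter(entry["kind"] for entry in entries)
    worst = min(entries, key=lambda entry: entry["percent"])
    print(counts["regression"], counts["improvement"])
    print(worst["subsystem"], worst["commit"], worst["percent"])
```

Keeping the percentage as a signed float makes it straightforward to sort entries by severity or to filter them by subsystem tag.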

5        Test Machines

5.1         IVB Desktop

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 8 | 16G |
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 4 | 8G |

 

5.2         SKL SP

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Skylake | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 80 | 64G |

 

5.3         BDW EP

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Broadwell-EP | Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz | 88 | 128G |

 

5.4         HSW EP

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Haswell-EP | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz | 72 | 128G |

 

5.5         IVB EP

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz | 40 | 384G |
| Ivytown Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 48 | 64G |

 

5.6         HSX EX

| model | brand | cpu number | memory |
| --- | --- | --- | --- |
| Brickland Haswell-EX | Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz | 144 | 512G |