
Linux Kernel Performance

Linux development evolves rapidly. The performance and scalability of the OS kernel have been a key part of its success. However, discussions have appeared on LKML (Linux Kernel Mailing List) regarding large performance regressions between kernel versions. These discussions underscore the need for a systematic and disciplined way to characterize, improve, and test Linux kernel performance. Our goal is to work with the Linux community to further enhance the Linux kernel with consistent performance increases (avoiding degradations) across releases. This site gives community members better information about what 0-Day and LKP (Linux Kernel Performance) are doing to preserve the performance integrity of the kernel.

0-Day CI Linux Kernel Performance Report (v5.10)

BY Rong Chen ON Jan 13, 2021
  1. Introduction

0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel. It covers kernel build, static analysis, boot, functional, performance and power tests. This report presents recent observations of kernel performance on IA (Intel Architecture) platforms, based on test results from the 0-Day CI service. It is structured as follows:

  • Section 2, test parameter description 

  • Section 3, merged regressions and improvements in v5.10 release candidates

  • Section 4, regressions and improvements captured by shift-left testing on developers’ and maintainers’ trees during the v5.10 release cycle

  • Section 5, performance comparison among different kernel releases

  • Section 6, test machine list

 

  2. Test parameter descriptions

Here are the descriptions for each parameter/field used in the tests. 

 

| Classification | Name | Description |
|---|---|---|
| General | runtime | Run the test case for a fixed period of time (seconds or minutes) |
| | nr_task | If an integer, the number of processes/threads that run the workload of this job (default: 1). If a percentage, the number of processes/threads relative to the CPU count; e.g. 200% means twice the number of CPUs |
| | nr_threads | Alias of nr_task |
| | iterations | Number of times to repeat this job |
| | test_size | Test disk size or memory size |
| | set_nic_irq_affinity | Set NIC interrupt affinity |
| | disable_latency_stats | latency_stats may introduce too much noise when there are many context switches; this option allows disabling it |
| | transparent_hugepage | Set the transparent hugepage policy (/sys/kernel/mm/transparent_hugepage) |
| | boot_params:bp1_memmap | memmap boot parameter |
| | disk:nr_pmem | Number of pmem partitions used by the test |
| | swap:priority | Priority of the swap device. The value is between -1 and 32767; the default is -1, and a higher value means a higher priority |
| Test Machine | model | Name of the Intel processor microarchitecture |
| | brand | Brand name of the CPU |
| | cpu_number | Number of CPUs |
| | memory | Size of memory |
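The nr_task rule described above (a plain integer, or a percentage of the CPU count) can be sketched as follows. This is a minimal illustration, not 0-Day's own code: the function name `resolve_nr_task` is hypothetical, and rounding down with a floor of 1 is an assumption.

```python
import os


def resolve_nr_task(nr_task=None, cpu_count=None):
    """Return the number of processes/threads implied by an nr_task value.

    An integer (or digit string) is used as-is; a percentage such as "200%"
    is scaled by the CPU count. Rounding down with a minimum of 1 is an
    assumption of this sketch, not documented 0-Day behaviour.
    """
    if nr_task is None:
        return 1  # documented default
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    text = str(nr_task).strip()
    if text.endswith("%"):
        percent = float(text[:-1])
        return max(1, int(cpu_count * percent / 100))
    return int(text)


if __name__ == "__main__":
    # On a 96-CPU machine, 200% resolves to 192 workers.
    print(resolve_nr_task("200%", cpu_count=96))
```

With this reading, `nr_task: 50%` on a 192-CPU machine would run 96 workers, while `nr_task: 16` always runs 16.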

 

  3. Linux Kernel v5.10 Release Test

The 5.10 release of the Linux kernel was on December 13, 2020. In the release announcement, Linus wrote: "I pretty much always wish that the last week was even calmer than it was, and that's true here too. There's a fair amount of fixes in here, including a few last-minute reverts for things that didn't get fixed, but nothing makes me go 'we need another week'. Things look fairly normal." Significant changes in this release include support for the Arm memory tagging extension, restricted rings for io_uring, sleepable BPF programs, the process_madvise() system call, ext4 "fast commits", and more. See the LWN merge-window summaries (part 1, part 2) and the KernelNewbies 5.10 page for more details.

 

0-Day CI monitored the release closely to track the performance status on IA platforms. 0-Day observed 1 regression and 8 improvements during the feature development phase for v5.10. We share more detailed information below, together with the correlated patches that led to these results. Note that the assessment is limited by the test coverage 0-Day currently has. The full list is given in the observation summary section.

3.1 Observation Summary

0-Day CI observed 1 regression and 8 improvements during the feature development phase for v5.10, that is, in the time frame from v5.10-rc1 to the v5.10 release.

| Test Indicator | Report | Test Scenario | Test Machine | Development Base | Status |
|---|---|---|---|---|---|
| phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second | [drm/i915/gem] 59dd13ad31: -54.0% regression | need_x: true; test: jxrendermark-1.2.4; option_a: Radial Gradient Paint; option_b: 1024x1024; cpufreq_governor: performance | lkp-cfl-d1 | v5.10-rc2 | merged at v5.10-rc3; no response from the author yet, but there has been some discussion |
| aim7.jobs-per-min | [sched/fair] ec73240b16: 2.3% improvement | disk: 4BRD_12G; md: RAID1; fs: xfs; test: sync_disk_rw; load: 300; cpufreq_governor: performance | lkp-cpx-4s1 | v5.9-rc1 | merged at v5.10-rc1 |
| aim7.jobs-per-min | [f2fs] 79963d967b: 468.8% improvement | disk: 1BRD_48G; fs: f2fs; test: sync_disk_rw; load: 200; cpufreq_governor: performance | lkp-csl-2ap2 | v5.7 | merged at v5.10-rc1 |
| fio.read_iops | [Fonts] 9522750c66: 7.5% improvement | disk: 2pmem; fs: xfs; mount_option: dax; runtime: 200s; nr_task: 50%; time_based: tb; rw: read; bs: 2M; ioengine: mmap; test_size: 200G; cpufreq_governor: performance | lkp-csl-2sp6 | v5.10-rc2 | merged at v5.10-rc3 |
| stress-ng.sigsuspend.ops_per_sec | [perf/x86] e506d1dac0: 58.5% improvement | nr_threads: 100%; disk: 1HDD; testtime: 30s; class: interrupt; cpufreq_governor: performance | lkp-csl-2sp7 | v5.10-rc2 | merged at v5.10-rc4 |
| vm-scalability.throughput | [mm/page_alloc] 7fef431be9: 87.8% improvement | runtime: 300s; size: 512G; test: anon-wx-rand-mt; cpufreq_governor: performance | lkp-csl-2ap4 | v5.9 | merged at v5.10-rc1 |
| will-it-scale.per_process_ops | [mm/shmem] 63ec1973dd: 21.7% improvement | nr_task: 16; mode: process; test: pread2; cpufreq_governor: performance | lkp-hsw-4ex1 | v5.9 | merged at v5.10-rc1 |
| will-it-scale.per_process_ops | [mm] 4df910620b: 37.7% improvement | nr_task: 50%; mode: process; test: page_fault2; cpufreq_governor: performance | lkp-hsw-4ex1 | v5.10-rc5 | merged at v5.10-rc6 |
| will-it-scale.per_thread_ops | [tracepoint] d25e37d89d: 1.4% improvement | nr_task: 100%; mode: thread; test: sched_yield; cpufreq_governor: performance | lkp-hsw-4ex1 | v5.9-rc3 | merged at v5.10-rc1 |

 

3.2 fio.read_iops

Fio will simulate a given I/O workload and enable flexible testing of the Linux I/O subsystem and schedulers.

3.2.1 Scenario: read

Commit 9522750c66 was reported to improve fio.read_iops by 7.5% compared with v5.10-rc2. It was merged into mainline at v5.10-rc3.

 

| Correlated commits | |
|---|---|
| 9522750c66 | Fonts: Replace discarded const qualifier |
| branch | linus/master |
| report | [Fonts] 9522750c66: fio.read_iops 7.5% improvement |
| test scenario | disk: 2pmem; fs: xfs; mount_option: dax; runtime: 200s; nr_task: 50%; time_based: tb; rw: read; bs: 2M; ioengine: mmap; test_size: 200G; cpufreq_governor: performance |
| test machine | lkp-csl-2sp6 |
| status | merged at v5.10-rc3 |
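The report lines above follow a regular shape: a bracketed subsystem, an abbreviated commit hash, an optional metric name, a signed percentage, and the word "regression" or "improvement". As a rough sketch, such a line can be split into fields with a regular expression. The pattern and the `parse_report` helper are assumptions made for illustration; they are not part of the 0-Day tooling.

```python
import re

# Hypothetical pattern for lines such as
#   "[Fonts] 9522750c66: fio.read_iops 7.5% improvement"
#   "[drm/i915/gem] 59dd13ad31: -54.0% regression"
REPORT_RE = re.compile(
    r"^\[(?P<subsystem>[^\]]+)\]\s+"
    r"(?P<commit>[0-9a-f]{7,40}):\s+"
    r"(?:(?P<metric>\S+)\s+)?"
    r"(?P<change>-?\d+(?:\.\d+)?)%\s+"
    r"(?P<kind>regression|improvement)$"
)


def parse_report(line):
    """Split one report line into a dict; `metric` is None when absent."""
    match = REPORT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized report line: {line!r}")
    fields = match.groupdict()
    fields["change"] = float(fields["change"])
    return fields


if __name__ == "__main__":
    print(parse_report("[Fonts] 9522750c66: fio.read_iops 7.5% improvement"))
```

Parsing the lines this way makes it straightforward to count regressions and improvements or to sort them by magnitude.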

3.3 stress-ng.sigsuspend.ops_per_sec

stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

 

3.3.1 scenario: 100%-1HDD-30s-interrupt

 

Commit e506d1dac0 was reported to improve stress-ng.sigsuspend.ops_per_sec by 58.5% compared with v5.10-rc2. It was merged into mainline at v5.10-rc4.

 

| Correlated commits | |
|---|---|
| e506d1dac0 | perf/x86: Make dummy_iregs static |
| branch | linus/master |
| report | [perf/x86] e506d1dac0: stress-ng.sigsuspend.ops_per_sec 58.5% improvement |
| test scenario | nr_threads: 100%; disk: 1HDD; testtime: 30s; class: interrupt; cpufreq_governor: performance |
| test machine | lkp-csl-2sp7 |
| status | merged at v5.10-rc4 |

 

3.4 will-it-scale.per_process_ops

 

Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process and threads based tests in order to see any differences between the two.
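The idea of running one test case in 1 through n parallel copies can be illustrated with a small sketch. It is not will-it-scale itself: the function names are hypothetical, it uses Python threads only, and because Python threads share the interpreter lock it shows the shape of the measurement loop rather than real kernel scaling.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_copy(testcase, duration):
    """Call testcase() repeatedly for about `duration` seconds; return the op count."""
    ops = 0
    deadline = time.monotonic() + duration
    while True:
        testcase()
        ops += 1
        if time.monotonic() >= deadline:
            return ops


def scale(testcase, max_copies, duration=0.05):
    """Run 1..max_copies parallel copies and report average ops/sec per copy."""
    results = {}
    for copies in range(1, max_copies + 1):
        with ThreadPoolExecutor(max_workers=copies) as pool:
            counts = list(
                pool.map(lambda _: run_copy(testcase, duration), range(copies))
            )
        results[copies] = sum(counts) / copies / duration
    return results


if __name__ == "__main__":
    for copies, per_copy in scale(lambda: None, max_copies=4).items():
        print(f"{copies} copies: {per_copy:.0f} ops/sec per copy")
```

A test case "scales" when the per-copy rate stays roughly flat as the number of copies grows; a falling per-copy rate points at contention.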

3.4.1 Scenario: process pread2

  

Commit 63ec1973dd was reported to improve will-it-scale.per_process_ops by 21.7% compared with v5.9. It was merged into mainline at v5.10-rc1.

 

| Correlated commits | |
|---|---|
| 63ec1973dd | mm/shmem: return head page from find_lock_entry |
| branch | linus/master |
| report | [mm/shmem] 63ec1973dd: will-it-scale.per_process_ops 21.7% improvement |
| test scenario | nr_task: 16; mode: process; test: pread2; cpufreq_governor: performance |
| test machine | lkp-hsw-4ex1 |
| status | merged at v5.10-rc1 |

 

  4. Shift-Left Testing

Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which can catch issues earlier and reduce their wider impact. We call this “shift-left” testing. During the v5.10 release cycle, 0-Day CI reported 18 major performance regressions and 19 major improvements through shift-left testing. We share more detailed information for some of these below, together with the code changes that possibly led to the results, though the assessment is limited by our current test coverage. The whole list is given in the report summary section.

4.1 Report Summary

0-Day CI reported 18 performance regressions and 19 improvements through shift-left testing on developer and maintainer repos.

 

| Test Indicator | Mail | Test Scenario | Test Machine | Status |
|---|---|---|---|---|
| aim7.jobs-per-min | [locking/rwsem] c9847a7f94: -91.8% regression | disk: 4BRD_12G; md: RAID0; fs: f2fs; test: sync_disk_rw; load: 100; cpufreq_governor: performance | lkp-csl-2sp2 | currently not merged, no response from author yet |
| fio.write_iops | [btrfs] 96bed17ad9: -59.7% regression | disk: 1SSD; fs: btrfs; runtime: 300s; nr_task: 8; rw: write; bs: 4k; ioengine: sync; test_size: 256g; cpufreq_governor: performance | lkp-csl-2ap1 | currently not merged, no response from author yet |
| fio.write_iops | [btrfs] e7a8dd2d95: -98.3% regression | runtime: 300s; disk: 1HDD; fs: btrfs; nr_task: 100%; test_size: 128G; rw: write; bs: 4k; ioengine: sync; direct: direct; cpufreq_governor: performance | lkp-ivb-2ep1 | currently not merged, no response from author yet |
| fsmark.files_per_sec | [SUNRPC] 3bc6a407d1: -12.9% regression | iterations: 1x; nr_threads: 64t; disk: 1BRD_48G; fs: xfs; fs2: nfsv4; filesize: 4M; test_size: 40G; sync_method: fsyncBeforeClose; cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged, no response from author yet |
| fxmark.hdd_btrfs_MWCL_9_bufferedio.works/sec | [btrfs] 196d59ab9c: -16.4% regression | disk: 1HDD; media: hdd; test: MWCL; fstype: btrfs; directio: bufferedio; cpufreq_governor: performance | lkp-knm01 | merged at v5.11-rc1, no response from author yet |
| netperf.Throughput_tps | [sched/fair] 8d86968ac3: -29.5% regression | ip: ipv4; runtime: 300s; nr_threads: 50%; cluster: cs-localhost; test: UDP_RR; cpufreq_governor: performance | lkp-csl-2ap3 | currently not merged, no response from author yet |
| netperf.Throughput_tps | [sched/fair] d8fcb81f1a: -16.9% regression | ip: ipv4; runtime: 300s; nr_threads: 25%; cluster: cs-localhost; test: SCTP_RR; cpufreq_governor: performance | lkp-cpl-4sp1 | merged at v5.11-rc1, no response from author yet |
| stress-ng.zero.ops_per_sec | [locking/qspinlock] 0e8d8f4f12: -9.7% regression | nr_threads: 100%; disk: 1HDD; testtime: 30s; class: device; cpufreq_governor: performance | lkp-ivb-2ep1 | currently not merged, no response from author yet |
| unixbench.score | [fanotify] a23a7dc576: -3.7% regression | runtime: 300s; nr_task: 30%; test: pipe; cpufreq_governor: performance | lkp-csl-2sp4 | currently not merged, no response from author yet |
| unixbench.score | [locking/qspinlock] 6f9a39a437: -17.3% regression | runtime: 300s; nr_task: 30%; test: context1; cpufreq_governor: performance | lkp-csl-2sp4 | the author accepted the regression and a fix is in progress |
| unixbench.score | [locking/rwsem] 10a59003d2: -25.5% regression | runtime: 300s; nr_task: 30%; test: shell8; cpufreq_governor: performance | lkp-cfl-e1 | currently not merged, no response from author yet |
| unixbench.score | [locking/rwsem] 617f3ef951: -21.2% regression | runtime: 300s; nr_task: 30%; test: shell8; cpufreq_governor: performance | lkp-cfl-e1 | merged at v5.11-rc1, no response from author yet |
| unixbench.score | [entry] a358d5636c: -1.9% regression | nr_task: 16; mode: process; test: futex3; cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged, no response from author yet |
| vm-scalability.throughput | [mm/swap] aae466b005: -2.7% regression | nr_task: 50%; mode: process; test: futex3; cpufreq_governor: performance | lkp-csl-2ap1 | currently not merged, no response from author yet |
| will-it-scale.per_process_ops | [fs/userfaultfd] fec9227821: -5.5% regression | nr_task: 50%; mode: process; test: brk1; cpufreq_governor: performance | lkp-skl-fpga01 | currently not merged, no response from author yet |
| will-it-scale.per_process_ops | [iov_iter] 9bd0e337c6: -4.8% regression | nr_task: 50%; mode: process; test: pwrite1; cpufreq_governor: performance | lkp-ivb-2ep1 | currently not merged, no response from author yet |
| will-it-scale.per_thread_ops | [eventfd] 5cb13cb023: -6.3% regression | nr_task: 50%; mode: thread; test: eventfd1; cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged, no response from author yet |
| will-it-scale.per_thread_ops | [sched/hotplug] 2558aacff8: -1.6% regression | nr_task: 100%; mode: thread; test: sched_yield; cpufreq_governor: performance | lkp-cpl-4sp1 | merged at v5.11-rc1, no response from author yet |
| fio.write_iops | [btrfs] 19cfa79c62: 17.8% improvement | disk: 1SSD; fs: btrfs; runtime: 300s; nr_task: 8; rw: randwrite; bs: 4k; ioengine: sync; test_size: 256g; cpufreq_governor: performance | lkp-csl-2ap1 | currently not merged |
| fio.write_iops | [x86] 97e8f0134a: 8.6% improvement | disk: 2pmem; fs: xfs; mount_option: dax; runtime: 200s; nr_task: 50%; time_based: tb; rw: randwrite; bs: 4k; ioengine: sync; test_size: 200G; cpufreq_governor: performance | lkp-csl-2sp6 | currently not merged |
| fio.write_iops | [btrfs] ba7fa25e8c: 2.7% improvement | runtime: 300s; disk: 1HDD; fs: btrfs; nr_task: 100%; test_size: 128G; rw: randwrite; bs: 4k; ioengine: sync; cpufreq_governor: performance | lkp-cfl-e1 | currently not merged |
| fio.write_iops | [btrfs] 5951248cb0: 97.3% improvement | disk: 2pmem; fs: btrfs; runtime: 200s; nr_task: 50%; time_based: tb; rw: randwrite; bs: 4k; ioengine: mmap; test_size: 100G; cpufreq_governor: performance | lkp-csl-2sp6 | currently not merged |
| fio.write_iops | [crypto] daf88f3757: 5.9% improvement | disk: 2pmem; fs: xfs; mount_option: dax; runtime: 200s; nr_task: 50%; time_based: tb; rw: write; bs: 4k; ioengine: mmap; test_size: 200G; cpufreq_governor: performance | lkp-csl-2sp6 | merged at v5.11-rc1 |
| fsmark.files_per_sec | [btrfs] fac2f60d5f: 62.6% improvement | iterations: 1x; nr_threads: 32t; disk: 1SSD; fs: btrfs; filesize: 9B; test_size: 400M; sync_method: fsyncBeforeClose; nr_directories: 16d; nr_files_per_directory: 256fpd; cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged |
| fsmark.files_per_sec | [sched/numa] 77cbe725cf: 29.6% improvement | iterations: 8; disk: 1SSD; nr_threads: 4; fs: f2fs; filesize: 8K; test_size: 72G; sync_method: fsyncBeforeClose; nr_directories: 16d; nr_files_per_directory: 256fpd; cpufreq_governor: performance | lkp-csl-2ap1 | currently not merged |
| fsmark.files_per_sec | [locking/qspinlock] 0e8d8f4f12: 213.9% improvement | iterations: 1x; nr_threads: 64t; disk: 1BRD_48G; fs: btrfs; filesize: 4M; test_size: 24G; sync_method: NoSync; cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged |
| hackbench.throughput | [sched/fair] 8d86968ac3: 51.7% improvement | nr_threads: 100%; iterations: 4; mode: threads; ipc: pipe; cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged |
| stress-ng.fallocate.ops_per_sec | [btrfs] 639bd575b7: 29.2% improvement | nr_threads: 10%; disk: 1HDD; testtime: 30s; class: filesystem; cpufreq_governor: performance; fs: btrfs | lkp-csl-2sp7 | merged at v5.11-rc1 |
| stress-ng.sendfile.ops_per_sec | [mm/filemap.c] 06c0444290: 26.7% improvement | nr_threads: 100%; disk: 1HDD; testtime: 30s; class: pipe; cpufreq_governor: performance | lkp-csl-2sp5 | merged at v5.11-rc1 |
| stress-ng.spawn.ops_per_sec | [kernfs] 37746795a6: 7.1% improvement | 100%-1HDD-30s-exec_spawn-performance-ucode=0x5003003 | lkp-csl-2sp5 | currently not merged |
| unixbench.score | [sched/numa] e7f28850ea: 1.5% improvement | runtime: 300s; nr_task: 30%; test: pipe; cpufreq_governor: performance | lkp-csl-2sp4 | currently not merged |
| vm-scalability.throughput | [mm] dfe9ecdb6c: 88.9% improvement | runtime: 300s; test: small-allocs-mt; cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged |
| vm-scalability.throughput | [locking/qspinlock] 0dd6d5b8c0: 102.9% improvement | runtime: 300s; size: 8T; test: anon-cow-seq-mt; cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged |
| vm-scalability.throughput | [locking/rwsem] 25d0c60b0e: 316.2% improvement | runtime: 300s; test: small-allocs-mt; cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged |
| vm-scalability.throughput | [locking/rwsem] 2f06f70292: 385.1% improvement | runtime: 300s; test: small-allocs-mt; cpufreq_governor: performance | lkp-csl-2ap4 | merged at v5.11-rc1 |
| will-it-scale.per_process_ops | [sched] 80340f8d5f: 2.0% improvement | nr_task: 16; mode: process; test: poll1; cpufreq_governor: performance | lkp-csl-2ap3 | currently not merged |
| will-it-scale.per_thread_ops | [cpuidle] cbf796d1ec: 30.5% improvement | nr_task: 50%; mode: thread; test: context_switch1; cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged |

4.2 aim7.jobs-per-min

aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system. 

 

4.2.1 Scenario: disk_rw test on f2fs

Commit c9847a7f94 was reported to cause a -91.8% regression in aim7.jobs-per-min compared with v5.10-rc3.

 

| Correlated commits | |
|---|---|
| c9847a7f94 | locking/rwsem: Wake up all waiting readers if RWSEM_WAKE_READ_OWNED |
| branch | linux-review/Waiman-Long/locking-rwsem-Rework-reader-optimistic-spinning/20201121-122118 |
| report | [locking/rwsem] c9847a7f94: aim7.jobs-per-min -91.8% regression |
| test scenario | disk: 4BRD_12G; md: RAID0; fs: f2fs; test: sync_disk_rw; load: 100; cpufreq_governor: performance |
| test machine | lkp-csl-2sp2 |
| status | currently not merged, no response from author yet |

 

4.3 fio.write_iops

fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
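To make "as specified by the user" concrete, the sketch below renders a scenario as a fio job file. The `fio_job` helper is hypothetical, and mapping the report's test_size and nr_task fields to fio's `size` and `numjobs` options is an assumption about how the scenario would translate, not a description of how LKP invokes fio. The option names themselves (`rw`, `bs`, `ioengine`, `size`, `runtime`, `time_based`, `numjobs`) are standard fio options.

```python
def fio_job(name, params):
    """Render a fio job-file section from a dict of options.

    A value of True emits a bare flag (e.g. time_based); anything else
    is written as key=value.
    """
    lines = [f"[{name}]"]
    for key, value in params.items():
        if value is True:
            lines.append(key)
        else:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed translation of a randwrite scenario; numjobs=48 stands in for
    # "nr_task: 50%" on a hypothetical 96-CPU machine.
    print(fio_job("randwrite-xfs", {
        "rw": "randwrite",
        "bs": "4k",
        "ioengine": "sync",
        "size": "200G",
        "runtime": "200s",
        "time_based": True,
        "numjobs": 48,
    }))
```

The resulting text can be saved to a file and passed to fio as a job file.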

4.3.1 Scenario: randwrite on xfs

Commit 97e8f0134a was reported to improve fio.write_iops by 8.6% compared with v5.10-rc4.

 

| Correlated commits | |
|---|---|
| 97e8f0134a | x86: rework arch_local_irq_restore() to not use popf |
| branch | linux-review/Juergen-Gross/x86-major-paravirt-cleanup/20201120-194934 |
| report | [x86] 97e8f0134a: 8.6% improvement |
| test scenario | disk: 2pmem; fs: xfs; mount_option: dax; runtime: 200s; nr_task: 50%; time_based: tb; rw: randwrite; bs: 4k; ioengine: sync; test_size: 200G; cpufreq_governor: performance |
| test machine | lkp-csl-2sp6 |
| status | currently not merged |

4.4 vm-scalability

vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. We tested on multiple machines such as HSW EP server, during which we reported improvement on one test scenario.

4.4.1 Scenario: cow-seq-mt test

  

Commit 0dd6d5b8c0 was reported to improve vm-scalability.throughput by 102.9% compared with v5.10-rc3.

 

| Correlated commits | |
|---|---|
| 0dd6d5b8c0 | locking/qspinlock: Introduce CNA into the slow path of qspinlock |
| branch | linux-review/Alex-Kogan/Add-NUMA-awareness-to-qspinlock/20201118-072506 |
| report | [locking/qspinlock] 0dd6d5b8c0: 102.9% improvement |
| test scenario | runtime: 300s; size: 8T; test: anon-cow-seq-mt; cpufreq_governor: performance |
| test machine | lkp-csl-2ap4 |
| status | currently not merged |

 

  5. Latest Release Performance Comparison

 

This section gives some information about the performance difference among different kernel releases, especially between v5.10 and v5.9. There are 50+ performance benchmarks running in 0-Day CI, and we selected 9 benchmarks that historically showed the most regressions/improvements reported by 0-Day CI. Typical configurations/parameters are used to run the tests. For some of the regressions in this comparison, 0-Day did not bisect them successfully, so no related report was sent out during the release development period, but they are still worth checking. The root causes of the regressions are not covered in this section.

 

In the following figures, the value on the Y-axis is the relative performance number. We used the v5.9 data as the base (performance number is 100).
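The relative performance number and the percentage change quoted in the summaries are two views of the same ratio. A minimal sketch follows; the function names and the sample values are hypothetical and only illustrate the arithmetic.

```python
def relative_performance(base, value):
    """Performance of `value` relative to `base`, where the base scores 100."""
    return 100.0 * value / base


def percent_change(base, value):
    """Signed change versus the base; negative for a regression when
    higher numbers are better."""
    return 100.0 * (value - base) / base


if __name__ == "__main__":
    # Hypothetical throughput numbers: v5.9 scored 50, v5.10 scored 40.
    print(relative_performance(50, 40))  # 80.0 on the Y-axis
    print(percent_change(50, 40))        # -20.0, i.e. a 20% regression
```

Under this convention a bar at 183.47 corresponds to an 83.47% improvement over v5.9, and a bar at 96.78 to a -3.22% regression.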

5.1 test suite: vm-scalability

vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. The two tests below show typical results.

 

vm-scalability Test 1

vm-scalability Test 2

 

Here are the test configuration and performance summary for the above tests:

| | vm-scalability Test 1 | vm-scalability Test 2 |
|---|---|---|
| test machine | model: Coffee Lake; brand: Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz; cpu_number: 16; memory: 32G | model: Cascade Lake; brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz; cpu_number: 192; memory: 192G |
| runtime | 300s | 300s |
| size | 2T | 512G |
| vm-scalability test parameter | test case: shm-xread-seq | test case: anon-wx-rand-mt |
| performance summary | vm-scalability.median on kernel v5.10 has -3.22% regression when comparing to v5.9 | vm-scalability.throughput on kernel v5.10 has 83.47% improvement when comparing to v5.9 |

 

5.2 test suite: will-it-scale

Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process and threads based tests in order to see any differences between the two.

 

will-it-scale Test 1

will-it-scale Test 2

   

Here are the parameters and performance summary for the above tests:

| | will-it-scale Test 1 | will-it-scale Test 2 |
|---|---|---|
| test machine | model: Cascade Lake; brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz; cpu_number: 192; memory: 192G | model: Haswell-EX; brand: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz; cpu_number: 144; memory: 512G |
| nr_task | 16 | 16 |
| will-it-scale test parameter | mode: process; test: pread1 | mode: process; test: read2 |
| performance summary | will-it-scale.per_thread_ops on kernel v5.10 has -31.26% regression when comparing to v5.9 | will-it-scale.per_process_ops on kernel v5.10 has 3.6% improvement when comparing to v5.9 |

 

5.3 test suite: unixbench

UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.

 

Unixbench Test 1

Unixbench Test 2

 

Here are the test configuration and performance summary for the above tests:

| | Unixbench Test 1 | Unixbench Test 2 |
|---|---|---|
| test machine | model: Cascade Lake; brand: Intel(R) Xeon(R) CPU @ 2.30GHz; cpu_number: 96; memory: 128G | model: Cascade Lake; brand: Intel(R) Xeon(R) CPU @ 2.30GHz; cpu_number: 96; memory: 128G |
| runtime | 300s | 300s |
| nr_task | 30% | 1 |
| unixbench test parameter | test case: context1 | test case: execl |
| performance summary | unixbench.score on kernel v5.10 has 7.95% improvement when comparing to v5.9 | unixbench.score on kernel v5.10 has -4.59% regression when comparing to v5.9 |

 

5.4 test suite: reaim

reaim updates and improves the existing Open Source AIM 7 benchmark. aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system.

 

reaim Test 1                             

reaim Test 2

Here are the test configuration and performance summary for the above tests:

| | reaim Test 1 | reaim Test 2 |
|---|---|---|
| test machine | model: Cascade Lake; brand: Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz; cpu_number: 96; memory: 256G | model: Cascade Lake; brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz; cpu_number: 192; memory: 192G |
| runtime | 300s | 300s |
| nr_task | 100% | 100% |
| disk | 1HDD | No requirement |
| fs | btrfs | No requirement |
| reaim test parameter | test case: disk | test case: short |
| performance summary | reaim.jobs_per_min on kernel v5.10 has 19.46% improvement when comparing to v5.9 | reaim.jobs_per_min on kernel v5.10 has -21.76% regression when comparing to v5.9 |

                                                                                                                                                                                                     

5.5 test suite: pigz

pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
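The block-parallel idea can be sketched in a few lines. This is a simplification, not how pigz is implemented: pigz emits a single gzip stream, whereas this sketch compresses each block independently and concatenates the resulting gzip members, which is still a valid gzip file. The function name and the default worker count are hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 128 * 1024  # 128K, the blocksize listed for the pigz test


def parallel_gzip(data, block_size=BLOCK_SIZE, workers=4):
    """Compress `data` block by block in a thread pool.

    Each block becomes its own gzip member; the members are concatenated
    in order, and standard gzip readers decompress them back to back.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        blocks = [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, blocks))
    return b"".join(members)


if __name__ == "__main__":
    payload = bytes(range(256)) * 2000
    packed = parallel_gzip(payload)
    assert gzip.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

Compressing blocks independently costs a little compression ratio, which is the usual trade-off for spreading the work across cores.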

pigz Test 1 


 

 

Here are the test configuration and performance summary for the above test:

| | pigz Test 1 |
|---|---|
| test machine | model: Knights Mill; brand: Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz; cpu_number: 288; memory: 80G |
| nr_threads | 25% |
| pigz test parameter | blocksize: 128K |
| performance summary | pigz.throughput on kernel v5.10 has 4.31% improvement when comparing to v5.9 |

            

5.6 test suite: netperf

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.

 

netperf Test 1

netperf Test 2

Here are the test configuration and performance summary for the above tests:

| | netperf Test 1 | netperf Test 2 |
|---|---|---|
| test machine | model: Cooper Lake; brand: Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz; cpu_number: 144; memory: 128G | model: Cascade Lake; brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz; cpu_number: 192; memory: 192G |
| disable_latency_stats | 1 | 1 |
| set_nic_irq_affinity | 1 | 1 |
| runtime | 300s | 300s |
| nr_threads | 1 | 200% |
| ip | ipv4 | ipv4 |
| netperf test parameter | test case: UDP_RR | test case: SCTP_RR |
| performance summary | netperf.Throughput_tps on kernel v5.10 has -7.59% regression when comparing to v5.9 | netperf.Throughput_tps on kernel v5.10 has 57.78% improvement when comparing to v5.9 |

 

5.7 test suite: hackbench

Hackbench is both a benchmark and a stress test for the Linux kernel scheduler. Its main job is to create a specified number of pairs of schedulable entities (either threads or traditional processes) which communicate via either sockets or pipes, and to time how long it takes for each pair to send data back and forth.
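The measurement described above, pairs of entities exchanging data while a timer runs, can be sketched with threads and socket pairs. This is a toy model under stated assumptions, not hackbench: the function names, pair count, round count, and message size are all hypothetical, and it covers only the thread and socket combination.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket early")
        buf.extend(chunk)
    return bytes(buf)


def echo(sock, rounds, size):
    """Receive each message and send it straight back."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, size))


def send(sock, rounds, payload):
    """Send a message and wait for it to come back, `rounds` times."""
    for _ in range(rounds):
        sock.sendall(payload)
        recv_exact(sock, len(payload))


def time_pairs(pairs=4, rounds=100, size=100):
    """Time how long `pairs` sender/echo thread pairs need to finish."""
    payload = b"x" * size
    socks = [socket.socketpair() for _ in range(pairs)]
    threads = []
    for a, b in socks:
        threads.append(threading.Thread(target=echo, args=(b, rounds, size)))
        threads.append(threading.Thread(target=send, args=(a, rounds, payload)))
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    for a, b in socks:
        a.close()
        b.close()
    return elapsed


if __name__ == "__main__":
    print(f"{time_pairs():.4f} s for 4 pairs x 100 round trips")
```

Because every round trip forces the two threads of a pair to wake each other, the elapsed time is dominated by scheduler wakeups and context switches rather than by data copying.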

hackbench Test 1

hackbench Test 2

Here are the test configuration and performance summary for the above tests:

| | hackbench Test 1 | hackbench Test 2 |
|---|---|---|
| test machine | model: Cascade Lake; brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz; cpu_number: 192; memory: 192G | model: Cascade Lake; brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz; cpu_number: 192; memory: 192G |
| disable_latency_stats | 1 | 1 |
| nr_task | 100% | 100% |
| hackbench test parameter | iterations: 5; mode: process; ipc: socket | mode: threads; ipc: pipe |
| performance summary | hackbench.throughput on kernel v5.10 has -39.73% regression when comparing to v5.9 | hackbench.throughput on kernel v5.10 has 24.94% improvement when comparing to v5.9 |

            

5.8 test suite: fio

fio was originally written to avoid the hassle of writing special test case programs when testing a specific workload, either for performance reasons or to find/reproduce a bug.

fio Test 1 

  

fio Test 2

Here are the test configuration and performance summary for the above tests:

| | fio Test 1 | fio Test 2 |
|---|---|---|
| test machine | model: Cascade Lake; brand: Intel(R) Xeon(R) CPU @ 2.20GHz; cpu_number: 192; memory: 192G | model: Cascade Lake; brand: Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz; cpu_number: 96; memory: 256G |
| runtime | 300s | 200s |
| file system | xfs | ext4 |
| disk | 1SSD | 2pmem |
| boot_params | No requirement | bp1_memmap: 104G!8G; bp2_memmap: 104G!132G |
| nr_task | 8 | 50% |
| time_based | No requirement | time_based: tb |
| fio test parameter | fio-setup-basic: rw: randread; bs: 4k; ioengine: sync; test_size: 256g | fio-setup-basic: rw: randwrite; bs: 4k; ioengine: libaio; test_size: 200G |
| performance summary | fio.read_iops on kernel v5.10 has 3.23% improvement when comparing to v5.9 | fio.write_bw_MBps on kernel v5.10 has 243.84% improvement when comparing to v5.9 |
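The memmap values used for fio Test 2 follow the kernel's `memmap=nn[KMG]!ss[KMG]` boot parameter form, which marks the region from ss to ss+nn as protected (persistent) memory. The sketch below decodes such a value into a byte range; the helper names are hypothetical and only the K/M/G suffixes are handled.

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as "104G" or "4096" to bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def parse_memmap(value):
    """Return (start, end) in bytes for a "size!start" memmap value."""
    size_text, start_text = value.split("!")
    start = parse_size(start_text)
    return start, start + parse_size(size_text)


if __name__ == "__main__":
    for value in ("104G!8G", "104G!132G"):
        start, end = parse_memmap(value)
        print(f"{value}: {start >> 30}G to {end >> 30}G")
```

Read this way, bp1_memmap covers 8G to 112G and bp2_memmap covers 132G to 236G, giving the two emulated pmem regions implied by `disk: 2pmem`.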

 

5.9 test suite: ebizzy

ebizzy is designed to generate a workload resembling common web application server workloads. It is highly threaded, has a large in-memory working set, and allocates and deallocates memory frequently.

ebizzy Test 1

 

 

Here are the test configuration and performance summary for the above test:

| | ebizzy Test 1 |
|---|---|
| test machine | model: Knights Mill; brand: Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz; cpu_number: 288; memory: 80G |
| transparent_hugepage | No requirement |
| nr_threads | 200% |
| iterations | 100x |
| ebizzy test parameter | duration: 10s |
| performance summary | ebizzy.throughput on kernel v5.10 is almost the same as that in v5.9 |
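The transparent_hugepage parameter refers to the policy exposed under /sys/kernel/mm/transparent_hugepage. When no policy is required, it can still be useful to record what the machine was using. The sketch below reads the `enabled` file, in which the kernel lists the available policies and marks the active one with square brackets; the helper names are hypothetical.

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_policy(text):
    """Return the bracketed entry from text such as "always [madvise] never"."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active policy marked in {text!r}")


def read_thp_policy(path=THP_ENABLED):
    """Read the current policy from sysfs (Linux only)."""
    return active_policy(path.read_text())


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("transparent_hugepage:", read_thp_policy())
    else:
        print("transparent hugepage sysfs entry not available on this system")
```

Since ebizzy allocates and frees memory heavily, logging the policy alongside the result helps when comparing runs across machines.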

 

  6. Test Machines

6.1 IVB Desktop

| model | brand | cpu number | memory |
|---|---|---|---|
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 8 | 16G |
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 4 | 8G |

 

6.2 SKL SP

| model | brand | cpu number | memory |
|---|---|---|---|
| Skylake | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 80 | 64G |

 

6.3 BDW EP

| model | brand | cpu number | memory |
|---|---|---|---|
| Broadwell-EP | Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz | 88 | 128G |

 

6.4 HSW EP

| model | brand | cpu number | memory |
|---|---|---|---|
| Haswell-EP | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz | 72 | 128G |

 

6.5 IVB EP

| model | brand | cpu number | memory |
|---|---|---|---|
| Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz | 40 | 384G |
| Ivytown Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 48 | 64G |

 

6.6 HSW EX

| model | brand | cpu number | memory |
|---|---|---|---|
| Brickland Haswell-EX | Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz | 144 | 512G |