0-Day CI Linux Kernel Performance Report (v5.10)
-
Introduction
0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel, including kernel build, static analysis, boot, functional, performance, and power tests. This report presents recent observations of kernel performance on IA platforms, based on test results from the 0-Day CI service. It is structured as follows:
- Section 2, test parameter descriptions
- Section 3, merged regressions and improvements in v5.10 release candidates
- Section 4, regressions and improvements captured by shift-left testing on developers’ and maintainers’ trees during the v5.10 release cycle
- Section 5, performance comparison among different kernel releases
- Section 6, test machine list
-
Test parameter descriptions
Here are the descriptions for each parameter/field used in the tests.
| Classification | Name | Description |
|---|---|---|
| General | runtime | Run the test case for a fixed time period (seconds or minutes) |
| | nr_task | Number of processes/threads that run the workload of this job. An integer is used as-is (default: 1). A percentage is relative to the CPU count, e.g. 200% means twice as many processes/threads as CPUs |
| | nr_threads | Alias of nr_task |
| | iterations | Number of times to repeat this job |
| | test_size | Test disk size or memory size |
| | set_nic_irq_affinity | Set NIC interrupt affinity |
| | disable_latency_stats | latency_stats may introduce too much noise when there are many context switches; this parameter allows disabling it |
| | transparent_hugepage | Set transparent hugepage policy (/sys/kernel/mm/transparent_hugepage) |
| | boot_params:bp1_memmap | Boot parameters of memmap |
| | disk:nr_pmem | Number of pmem partitions used by the test |
| | swap:priority | Priority of the swap device: a value between -1 and 32767 (default: -1); a higher value means a higher priority |
| Test Machine | model | |
| | brand | Brand name of the CPU |
| | cpu_number | Number of CPUs |
| | memory | Size of memory |
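As a rough illustration of the nr_task rule described above, the sketch below resolves a setting to a concrete task count. The function name `resolve_nr_task` and the handling of fractional results (truncate, with a floor of 1) are assumptions made for this example; the report does not state how 0-Day CI rounds.

```python
def resolve_nr_task(nr_task, cpu_number):
    """Turn an nr_task setting into a concrete process/thread count.

    Hypothetical helper: the rounding policy is an assumption.
    """
    if nr_task is None:
        return 1  # documented default
    text = str(nr_task).strip()
    if text.endswith("%"):
        # Percentage form: relative to the CPU count, e.g. 200% -> 2x CPUs.
        percent = int(text[:-1])
        return max(1, cpu_number * percent // 100)
    # Plain integer form: used as-is.
    return int(text)


print(resolve_nr_task("200%", 96))
print(resolve_nr_task("50%", 192))
print(resolve_nr_task(16, 144))
```

Under these assumptions, `200%` on a 96-CPU machine resolves to 192 tasks, and an integer such as 16 is independent of the machine size.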
-
Linux Kernel v5.10 Release Test
The 5.10 release of the Linux kernel was on December 14, 2020. In the release announcement, Linus Torvalds wrote: "I pretty much always wish that the last week was even calmer than it was, and that's true here too. There's a fair amount of fixes in here, including a few last-minute reverts for things that didn't get fixed, but nothing makes me go 'we need another week'. Things look fairly normal." Significant changes in this release include support for the Arm memory tagging extension, restricted rings for io_uring, sleepable BPF programs, the process_madvise() system call, ext4 "fast commits", and more. See the LWN merge-window summaries (part 1, part 2) and the KernelNewbies 5.10 page for more details.
0-Day CI monitored the release closely to track the performance status on IA platforms. It observed 1 regression and 8 improvements during the feature development phase for v5.10. The following sections give more detail, together with the correlated patches that led to these results. Note that the assessment is limited by the test coverage 0-Day CI currently has. The full list is given in the Observation Summary section.
-
Observation Summary
0-Day CI observed 1 regression and 8 improvements during the feature development phase for v5.10, that is, in the time frame from v5.10-rc1 to the v5.10 release.
| Test Indicator | Report | Test Scenario | Test Machine | Development Base | Status |
|---|---|---|---|---|---|
| phoronix-test-suite.jxrendermark.RadialGradientPaint.1024x1024.operations_per_second | | need_x: true test: jxrendermark-1.2.4 option_a: Radial Gradient Paint option_b: 1024x1024 cpufreq_governor: performance | lkp-cfl-d1 | v5.10-rc2 | merged at v5.10-rc3, no response from author yet, but there has been some discussion |
| aim7.jobs-per-min | | disk: 4BRD_12G md: RAID1 fs: xfs test: sync_disk_rw load: 300 cpufreq_governor: performance | lkp-cpx-4s1 | v5.9-rc1 | merged at v5.10-rc1 |
| aim7.jobs-per-min | | disk: 1BRD_48G fs: f2fs test: sync_disk_rw load: 200 cpufreq_governor: performance | lkp-csl-2ap2 | v5.7 | merged at v5.10-rc1 |
| fio.read_iops | | disk: 2pmem fs: xfs mount_option: dax runtime: 200s nr_task: 50% time_based: tb rw: read bs: 2M ioengine: mmap test_size: 200G cpufreq_governor: performance | lkp-csl-2sp6 | v5.10-rc2 | merged at v5.10-rc3 |
| stress-ng.sigsuspend.ops_per_sec | | nr_threads: 100% disk: 1HDD testtime: 30s class: interrupt cpufreq_governor: performance | lkp-csl-2sp7 | v5.10-rc2 | merged at v5.10-rc4 |
| vm-scalability.throughput | | runtime: 300s size: 512G test: anon-wx-rand-mt cpufreq_governor: performance | lkp-csl-2ap4 | v5.9 | merged at v5.10-rc1 |
| will-it-scale.per_process_ops | | nr_task: 16 mode: process test: pread2 cpufreq_governor: performance | lkp-hsw-4ex1 | v5.9 | merged at v5.10-rc1 |
| will-it-scale.per_process_ops | | nr_task: 50% mode: process test: page_fault2 cpufreq_governor: performance | lkp-hsw-4ex1 | v5.10-rc5 | merged at v5.10-rc6 |
| will-it-scale.per_thread_ops | | nr_task: 100% mode: thread test: sched_yield cpufreq_governor: performance | lkp-hsw-4ex1 | v5.9-rc3 | merged at v5.10-rc1 |
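The Test Scenario column above lists job parameters as space-separated `key: value` pairs. A minimal sketch of how such a string could be split into a dictionary for filtering or comparison is shown below; the function name `parse_scenario` is hypothetical and this is not part of the 0-Day CI tooling.

```python
import re

# A key is a word followed by ": "; its value runs until the next key or the end.
_PAIR = re.compile(r"(\w+):\s+(.*?)(?=\s+\w+:\s|$)")


def parse_scenario(text):
    """Split a flattened 'key: value key: value' string into a dict."""
    return dict(_PAIR.findall(text.strip()))


scenario = parse_scenario(
    "disk: 2pmem fs: xfs mount_option: dax runtime: 200s nr_task: 50% "
    "time_based: tb rw: read bs: 2M ioengine: mmap test_size: 200G "
    "cpufreq_governor: performance"
)
print(scenario["fs"], scenario["nr_task"], scenario["ioengine"])
```

Values that contain spaces, such as `option_a: Radial Gradient Paint`, are kept whole because a value only ends where the next `key:` begins. Scenario strings written in the dash-joined form (for example `100%-1HDD-30s-interrupt`) contain no `key: value` pairs and yield an empty dictionary with this sketch.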
-
fio.read_iops
fio simulates a given I/O workload and enables flexible testing of the Linux I/O subsystem and schedulers.
-
Scenario: read
Commit 9522750c66 was reported to bring a 7.5% improvement in fio.read_iops compared to v5.10-rc2. It was merged into mainline at v5.10-rc3.
Correlated commits
| 9522750c66 | Fonts: Replace discarded const qualifier |
|---|---|
| branch | linus/master |
| report | |
| test scenario | disk: 2pmem fs: xfs mount_option: dax runtime: 200s nr_task: 50% time_based: tb rw: read bs: 2M ioengine: mmap test_size: 200G cpufreq_governor: performance |
| test machine | lkp-csl-2sp6 |
| status | merged at v5.10-rc3 |
-
stress-ng.sigsuspend.ops_per_sec
stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.
Scenario: 100%-1HDD-30s-interrupt
Commit e506d1dac0 was reported to bring a 58.5% improvement in stress-ng.sigsuspend.ops_per_sec compared to v5.10-rc2. It was merged into mainline at v5.10-rc4.
Correlated commits
| e506d1dac0 | perf/x86: Make dummy_iregs static |
|---|---|
| branch | linus/master |
| report | [perf/x86] e506d1dac0: stress-ng.sigsuspend.ops_per_sec 58.5% improvement |
| test scenario | nr_threads: 100% disk: 1HDD testtime: 30s class: interrupt cpufreq_governor: performance |
| test machine | lkp-csl-2sp7 |
| status | merged at v5.10-rc4 |
-
will-it-scale.per_process_ops
Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process and threads based tests in order to see any differences between the two.
Scenario: process pread2
Commit 63ec1973dd was reported to bring a 21.7% improvement in will-it-scale.per_process_ops compared to v5.9. It was merged into mainline at v5.10-rc1.
Correlated commits
| 63ec1973dd | mm/shmem: return head page from find_lock_entry |
|---|---|
| branch | linus/master |
| report | [mm/shmem] 63ec1973dd: will-it-scale.per_process_ops 21.7% improvement |
| test scenario | nr_task: 16 mode: process test: pread2 cpufreq_governor: performance |
| test machine | lkp-hsw-4ex1 |
| status | merged at v5.10-rc1 |
-
Shift-Left Testing
Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which catches issues earlier and reduces their wider impact. We call this “shift-left” testing. During the v5.10 release cycle, 0-Day CI reported 18 major performance regressions and 19 major improvements through shift-left testing. For some of these, the following sections give more detail together with the code changes that possibly led to the result, though the assessment is limited by our current test coverage. The full list is given in the Report Summary section.
-
Report Summary
0-Day CI reported 18 performance regressions and 19 improvements through shift-left testing on developer and maintainer repos.
| Test Indicator | Test Scenario | Test Machine | Status |
|---|---|---|---|
| aim7.jobs-per-min | disk: 4BRD_12G md: RAID0 fs: f2fs test: sync_disk_rw load: 100 cpufreq_governor: performance | lkp-csl-2sp2 | currently not merged, no response from author yet |
| fio.write_iops | disk: 1SSD fs: btrfs runtime: 300s nr_task: 8 rw: write bs: 4k ioengine: sync test_size: 256g cpufreq_governor: performance | lkp-csl-2ap1 | currently not merged, no response from author yet |
| fio.write_iops | runtime: 300s disk: 1HDD fs: btrfs nr_task: 100% test_size: 128G rw: write bs: 4k ioengine: sync direct: direct cpufreq_governor: performance | lkp-ivb-2ep1 | currently not merged, no response from author yet |
| fsmark.files_per_sec | iterations: 1x nr_threads: 64t disk: 1BRD_48G fs: xfs fs2: nfsv4 filesize: 4M test_size: 40G sync_method: fsyncBeforeClose cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged, no response from author yet |
| fxmark.hdd_btrfs_MWCL_9_bufferedio.works/sec | disk: 1HDD media: hdd test: MWCL fstype: btrfs directio: bufferedio cpufreq_governor: performance | lkp-knm01 | merged at v5.11-rc1, no response from author yet |
| netperf.Throughput_tps | ip: ipv4 runtime: 300s nr_threads: 50% cluster: cs-localhost test: UDP_RR cpufreq_governor: performance | lkp-csl-2ap3 | currently not merged, no response from author yet |
| netperf.Throughput_tps | ip: ipv4 runtime: 300s nr_threads: 25% cluster: cs-localhost test: SCTP_RR cpufreq_governor: performance | lkp-cpl-4sp1 | merged at v5.11-rc1, no response from author yet |
| stress-ng.zero.ops_per_sec | nr_threads: 100% disk: 1HDD testtime: 30s class: device cpufreq_governor: performance | lkp-ivb-2ep1 | currently not merged, no response from author yet |
| unixbench.score | runtime: 300s nr_task: 30% test: pipe cpufreq_governor: performance | lkp-csl-2sp4 | currently not merged, no response from author yet |
| unixbench.score | runtime: 300s nr_task: 30% test: context1 cpufreq_governor: performance | lkp-csl-2sp4 | |
| unixbench.score | runtime: 300s nr_task: 30% test: shell8 cpufreq_governor: performance | lkp-cfl-e1 | currently not merged, no response from author yet |
| unixbench.score | runtime: 300s nr_task: 30% test: shell8 cpufreq_governor: performance | lkp-cfl-e1 | merged at v5.11-rc1, no response from author yet |
| unixbench.score | nr_task: 16 mode: process test: futex3 cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged, no response from author yet |
| vm-scalability.throughput | nr_task: 50% mode: process test: futex3 cpufreq_governor: performance | lkp-csl-2ap1 | currently not merged, no response from author yet |
| will-it-scale.per_process_ops | nr_task: 50% mode: process test: brk1 cpufreq_governor: performance | lkp-skl-fpga01 | currently not merged, no response from author yet |
| will-it-scale.per_process_ops | nr_task: 50% mode: process test: pwrite1 cpufreq_governor: performance | lkp-ivb-2ep1 | currently not merged, no response from author yet |
| will-it-scale.per_thread_ops | nr_task: 50% mode: thread test: eventfd1 cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged, no response from author yet |
| will-it-scale.per_thread_ops | nr_task: 100% mode: thread test: sched_yield cpufreq_governor: performance | lkp-cpl-4sp1 | merged at v5.11-rc1, no response from author yet |
| fio.write_iops | disk: 1SSD fs: btrfs runtime: 300s nr_task: 8 rw: randwrite bs: 4k ioengine: sync test_size: 256g cpufreq_governor: performance | lkp-csl-2ap1 | currently not merged |
| fio.write_iops | disk: 2pmem fs: xfs mount_option: dax runtime: 200s nr_task: 50% time_based: tb rw: randwrite bs: 4k ioengine: sync test_size: 200G cpufreq_governor: performance | lkp-csl-2sp6 | currently not merged |
| fio.write_iops | runtime: 300s disk: 1HDD fs: btrfs nr_task: 100% test_size: 128G rw: randwrite bs: 4k ioengine: sync cpufreq_governor: performance | lkp-cfl-e1 | currently not merged |
| fio.write_iops | disk: 2pmem fs: btrfs runtime: 200s nr_task: 50% time_based: tb rw: randwrite bs: 4k ioengine: mmap test_size: 100G cpufreq_governor: performance | lkp-csl-2sp6 | currently not merged |
| fio.write_iops | disk: 2pmem fs: xfs mount_option: dax runtime: 200s nr_task: 50% time_based: tb rw: write bs: 4k ioengine: mmap test_size: 200G cpufreq_governor: performance | lkp-csl-2sp6 | merged at v5.11-rc1 |
| fsmark.files_per_sec | iterations: 1x nr_threads: 32t disk: 1SSD fs: btrfs filesize: 9B test_size: 400M sync_method: fsyncBeforeClose nr_directories: 16d nr_files_per_directory: 256fpd cpufreq_governor: performance | lkp-csl-2sp7 | currently not merged |
| fsmark.files_per_sec | iterations: 8 disk: 1SSD nr_threads: 4 fs: f2fs filesize: 8K test_size: 72G sync_method: fsyncBeforeClose nr_directories: 16d nr_files_per_directory: 256fpd cpufreq_governor: performance | lkp-csl-2ap1 | currently not merged |
| fsmark.files_per_sec | iterations: 1x nr_threads: 64t disk: 1BRD_48G fs: btrfs filesize: 4M test_size: 24G sync_method: NoSync cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged |
| hackbench.throughput | nr_threads: 100% iterations: 4 mode: threads ipc: pipe cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged |
| stress-ng.fallocate.ops_per_sec | nr_threads: 10% disk: 1HDD testtime: 30s class: filesystem cpufreq_governor: performance fs: btrfs | lkp-csl-2sp7 | merged at v5.11-rc1 |
| stress-ng.sendfile.ops_per_sec | nr_threads: 100% disk: 1HDD testtime: 30s class: pipe cpufreq_governor: performance | lkp-csl-2sp5 | merged at v5.11-rc1 |
| stress-ng.spawn.ops_per_sec | 100%-1HDD-30s-exec_spawn-performance-ucode=0x5003003 | lkp-csl-2sp5 | currently not merged |
| unixbench.score | runtime: 300s nr_task: 30% test: pipe cpufreq_governor: performance | lkp-csl-2sp4 | currently not merged |
| vm-scalability.throughput | runtime: 300s test: small-allocs-mt cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged |
| vm-scalability.throughput | runtime: 300s size: 8T test: anon-cow-seq-mt cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged |
| vm-scalability.throughput | runtime: 300s test: small-allocs-mt cpufreq_governor: performance | lkp-csl-2ap4 | currently not merged |
| vm-scalability.throughput | runtime: 300s test: small-allocs-mt cpufreq_governor: performance | lkp-csl-2ap4 | merged at v5.11-rc1 |
| will-it-scale.per_process_ops | nr_task: 16 mode: process test: poll1 cpufreq_governor: performance | lkp-csl-2ap3 | currently not merged |
| will-it-scale.per_thread_ops | nr_task: 50% mode: thread test: context_switch1 cpufreq_governor: performance | lkp-csl-2ap2 | currently not merged |
-
aim7.jobs-per-min
aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system.
-
Scenario: disk_rw test on f2fs
Commit c9847a7f94 was reported to cause a -91.8% regression in aim7.jobs-per-min compared to v5.10-rc3.
Correlated commits
| c9847a7f94 | locking/rwsem: Wake up all waiting readers if RWSEM_WAKE_READ_OWNED |
|---|---|
| branch | linux-review/Waiman-Long/locking-rwsem-Rework-reader-optimistic-spinning/20201121-122118 |
| report | [locking/rwsem] c9847a7f94: aim7.jobs-per-min -91.8% regression |
| test scenario | disk: 4BRD_12G md: RAID0 fs: f2fs test: sync_disk_rw load: 100 cpufreq_governor: performance |
| test machine | lkp-csl-2sp2 |
| status | currently not merged, no response from author yet |
-
fio.write_iops
fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
-
Scenario: randwrite on xfs
Commit 97e8f0134a was reported to bring an 8.6% improvement in fio.write_iops compared to v5.10-rc4.
Correlated commits
| 97e8f0134a | x86: rework arch_local_irq_restore() to not use popf |
|---|---|
| branch | linux-review/Juergen-Gross/x86-major-paravirt-cleanup/20201120-194934 |
| report | |
| test scenario | disk: 2pmem fs: xfs mount_option: dax runtime: 200s nr_task: 50% time_based: tb rw: randwrite bs: 4k ioengine: sync test_size: 200G cpufreq_governor: performance |
| test machine | lkp-csl-2sp6 |
| status | currently not merged |
-
vm-scalability
vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. We tested on multiple machines, such as HSW EP servers, and reported an improvement in one test scenario.
-
Scenario: cow-seq-mt test
Commit 0dd6d5b8c0 was reported to bring a 102.9% improvement in vm-scalability.throughput compared to v5.10-rc3.
Correlated commits
| 0dd6d5b8c0 | locking/qspinlock: Introduce CNA into the slow path of qspinlock |
|---|---|
| branch | linux-review/Alex-Kogan/Add-NUMA-awareness-to-qspinlock/20201118-072506 |
| report | |
| test scenario | runtime: 300s size: 8T test: anon-cow-seq-mt cpufreq_governor: performance |
| test machine | lkp-csl-2ap4 |
| status | currently not merged |
-
Latest Release Performance Comparison
This section describes the performance differences among kernel releases, in particular between v5.10 and v5.9. More than 50 performance benchmarks run in 0-Day CI; we selected 9 benchmarks that have historically shown the most regressions and improvements reported by 0-Day CI. The tests use typical configurations and parameters. For some of the regressions found in this comparison, 0-Day CI could not bisect successfully, so no related report was sent out during the release development period, but they are still worth checking. The root causes of the regressions are not covered in this section.
In the following figures, the value on the Y-axis is the relative performance number. We used the v5.9 data as the base (performance number is 100).
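To make the relative performance number concrete, the sketch below scales a raw result against a base release so that the base reads as 100, and derives the signed percentage change from it. The function names and the raw throughput values are hypothetical and chosen only for illustration; they are not measurements from this report.

```python
def relative_performance(value, base_value):
    """Scale a raw result so that the base release reads as 100."""
    if base_value == 0:
        raise ValueError("base_value must be non-zero")
    return 100.0 * value / base_value


def percent_change(value, base_value):
    """Signed change versus the base release.

    For higher-is-better metrics such as throughput, a negative value
    is a regression and a positive value is an improvement.
    """
    return relative_performance(value, base_value) - 100.0


# Hypothetical raw throughput numbers for the two releases.
v5_9_result = 2000.0
v5_10_result = 2150.0
print(relative_performance(v5_10_result, v5_9_result))  # 107.5
print(percent_change(v5_10_result, v5_9_result))        # 7.5
```

Read this way, a summary such as "83.47% improvement" corresponds to a relative performance number of about 183.47, and "-3.22% regression" to about 96.78.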
-
test suite: vm-scalability
vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. The two tests below show typical results.
Figures: vm-scalability Test 1, vm-scalability Test 2
Here are the test configuration and performance summary for the above tests:
| | vm-scalability Test 1 | vm-scalability Test 2 |
|---|---|---|
| test machine | model: Coffee Lake brand: Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz cpu_number: 16 memory: 32G | model: Cascade Lake brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz cpu_number: 192 memory: 192G |
| runtime | 300s | 300s |
| size | 2T | 512G |
| vm-scalability test parameter | test case: shm-xread-seq | test case: anon-wx-rand-mt |
| performance summary | vm-scalability.median on kernel v5.10 has a -3.22% regression compared to v5.9 | vm-scalability.throughput on kernel v5.10 has an 83.47% improvement compared to v5.9 |
-
test suite: will-it-scale
Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process and threads based tests in order to see any differences between the two.
Figures: will-it-scale Test 1, will-it-scale Test 2
Here are the parameters and performance summary for the above tests:
| | will-it-scale Test 1 | will-it-scale Test 2 |
|---|---|---|
| test machine | model: Cascade Lake brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz cpu_number: 192 memory: 192G | model: Haswell-EX brand: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz cpu_number: 144 memory: 512G |
| nr_task | 16 | 16 |
| will-it-scale test parameter | mode: process test: pread1 | mode: process test: read2 |
| summary | will-it-scale.per_thread_ops on kernel v5.10 has a -31.26% regression compared to v5.9 | will-it-scale.per_process_ops on kernel v5.10 has a 3.6% improvement compared to v5.9 |
-
test suite: unixbench
UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.
Figures: Unixbench Test 1, Unixbench Test 2
Here are the test configuration and performance summary for the above tests:
| | Unixbench Test 1 | Unixbench Test 2 |
|---|---|---|
| test machine | model: Cascade Lake brand: Intel(R) Xeon(R) CPU @ 2.30GHz cpu_number: 96 memory: 128G | model: Cascade Lake brand: Intel(R) Xeon(R) CPU @ 2.30GHz cpu_number: 96 memory: 128G |
| runtime | 300s | 300s |
| nr_task | 30% | 1 |
| unixbench test parameter | test case: context1 | test case: execl |
| performance summary | unixbench.score on kernel v5.10 has a 7.95% improvement compared to v5.9 | unixbench.score on kernel v5.10 has a -4.59% regression compared to v5.9 |
-
test suite: reaim
reaim updates and improves the existing Open Source AIM 7 benchmark. aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system.
Figures: reaim Test 1, reaim Test 2
Here are the test configuration and performance summary for the above tests:
| | reaim Test 1 | reaim Test 2 |
|---|---|---|
| test machine | model: Cascade Lake brand: Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz cpu_number: 96 memory: 256G | model: Cascade Lake brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz cpu_number: 192 memory: 192G |
| runtime | 300s | 300s |
| nr_task | 100% | 100% |
| disk | 1HDD | No requirement |
| fs | btrfs | No requirement |
| reaim test parameter | test case: disk | test case: short |
| performance summary | reaim.jobs_per_min on kernel v5.10 has a 19.46% improvement compared to v5.9 | reaim.jobs_per_min on kernel v5.10 has a -21.76% regression compared to v5.9 |
-
test suite: pigz
pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
Figure: pigz Test 1
Here are the test configuration and performance summary for the above test:
| | pigz Test 1 |
|---|---|
| test machine | model: Knights Mill brand: Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz cpu_number: 288 memory: 80G |
| nr_threads | 25% |
| pigz test parameter | blocksize: 128K |
| performance summary | pigz.throughput on kernel v5.10 has a 4.31% improvement compared to v5.9 |
-
test suite: netperf
Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.
Figures: netperf Test 1, netperf Test 2
Here are the test configuration and performance summary for the above tests:
| | netperf Test 1 | netperf Test 2 |
|---|---|---|
| test machine | model: Cooper Lake brand: Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz cpu_number: 144 memory: 128G | model: Cascade Lake brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz cpu_number: 192 memory: 192G |
| disable_latency_stats | 1 | 1 |
| set_nic_irq_affinity | 1 | 1 |
| runtime | 300s | 300s |
| nr_threads | 1 | 200% |
| ip | ipv4 | ipv4 |
| netperf test parameter | test case: UDP_RR | test case: SCTP_RR |
| performance summary | netperf.Throughput_tps on kernel v5.10 has a -7.59% regression compared to v5.9 | netperf.Throughput_tps on kernel v5.10 has a 57.78% improvement compared to v5.9 |
-
test suite: hackbench
Hackbench is both a benchmark and a stress test for the Linux kernel scheduler. Its main job is to create a specified number of pairs of schedulable entities (either threads or traditional processes) which communicate via either sockets or pipes and time how long it takes for each pair to send data back and forth.
Figures: hackbench Test 1, hackbench Test 2
Here are the test configuration and performance summary for the above tests:
| | hackbench Test 1 | hackbench Test 2 |
|---|---|---|
| test machine | model: Cascade Lake brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz cpu_number: 192 memory: 192G | model: Cascade Lake brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz cpu_number: 192 memory: 192G |
| disable_latency_stats | 1 | 1 |
| nr_task | 100% | 100% |
| hackbench test parameter | iterations: 5 mode: process ipc: socket | mode: threads ipc: pipe |
| performance summary | hackbench.throughput on kernel v5.10 has a -39.73% regression compared to v5.9 | hackbench.throughput on kernel v5.10 has a 24.94% improvement compared to v5.9 |
-
test suite: fio
Fio was originally written to save its author the hassle of writing special test case programs when testing a specific workload, either for performance reasons or to find or reproduce a bug.
Figures: fio Test 1, fio Test 2
Here are the test configuration and performance summary for the above tests:
| | fio Test 1 | fio Test 2 |
|---|---|---|
| test machine | model: Cascade Lake brand: Intel(R) Xeon(R) CPU @ 2.20GHz cpu_number: 192 memory: 192G | model: Cascade Lake brand: Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz cpu_number: 96 memory: 256G |
| runtime | 300s | 200s |
| file system | xfs | ext4 |
| disk | 1SSD | 2pmem |
| boot_params | No requirement | bp1_memmap: 104G!8G bp2_memmap: 104G!132G |
| nr_task | 8 | 50% |
| time_based | No requirement | time_based: tb |
| fio test parameter | fio-setup-basic: rw: randread bs: 4k ioengine: sync test_size: 256g | fio-setup-basic: rw: randwrite bs: 4k ioengine: libaio test_size: 200G |
| performance summary | fio.read_iops on kernel v5.10 has a 3.23% improvement compared to v5.9 | fio.write_bw_MBps on kernel v5.10 has a 243.84% improvement compared to v5.9 |
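The `bp1_memmap` and `bp2_memmap` values in the fio Test 2 configuration above have the shape of the kernel's `memmap=nn[KMG]!ss[KMG]` boot parameter, which marks `nn` bytes of memory starting at offset `ss` as protected so it can back emulated persistent memory. Assuming that is how 0-Day CI uses these fields (the report does not say so explicitly), a minimal sketch for decoding such a value could look like this; `parse_memmap` is a hypothetical helper.

```python
import re

_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_memmap(spec):
    """Split an 'nn[KMG]!ss[KMG]' spec into (size_bytes, start_bytes)."""
    match = re.fullmatch(r"(\d+)([KMG])!(\d+)([KMG])", spec)
    if match is None:
        raise ValueError(f"unsupported memmap spec: {spec!r}")
    size, size_unit, start, start_unit = match.groups()
    return int(size) * _UNITS[size_unit], int(start) * _UNITS[start_unit]


for spec in ("104G!8G", "104G!132G"):
    size, start = parse_memmap(spec)
    # Region covers [start, start + size) in physical address space.
    print(f"memmap={spec}: {size // 2**30}G starting at {start // 2**30}G")
```

Under this reading, the two values describe two 104G regions, one starting at the 8G offset and one at the 132G offset, which would match the `disk: 2pmem` setting in the same column.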
-
test suite: ebizzy
ebizzy is designed to generate a workload resembling common web application server workloads. It is highly threaded, has a large in-memory working set, and allocates and deallocates memory frequently.
Figure: ebizzy Test 1
Here are the test configuration and performance summary for the above test:
| | ebizzy Test 1 |
|---|---|
| test machine | model: Knights Mill brand: Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz cpu_number: 288 memory: 80G |
| transparent_hugepage | No requirement |
| nr_threads | 200% |
| iterations | 100x |
| ebizzy test parameter | duration: 10s |
| performance summary | ebizzy.throughput on kernel v5.10 is almost the same as on v5.9 |
-
Test Machines
-
IVB Desktop
-
| model | brand | cpu number | memory |
|---|---|---|---|
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 8 | 16G |
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 4 | 8G |
-
SKL SP
| model | brand | cpu number | memory |
|---|---|---|---|
| Skylake | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 80 | 64G |
-
BDW EP
| model | brand | cpu number | memory |
|---|---|---|---|
| Broadwell-EP | Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz | 88 | 128G |
-
HSW EP
| model | brand | cpu number | memory |
|---|---|---|---|
| Haswell-EP | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz | 72 | 128G |
-
IVB EP
| model | brand | cpu number | memory |
|---|---|---|---|
| Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz | 40 | 384G |
| Ivytown Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 48 | 64G |
-
HSX EX
| model | brand | cpu number | memory |
|---|---|---|---|
| Brickland Haswell-EX | Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz | 144 | 512G |