
Linux Kernel Performance

Linux development evolves rapidly, and the performance and scalability of the OS kernel have been a key part of its success. However, discussions have appeared on LKML (Linux Kernel Mailing List) about large performance regressions between kernel versions. These discussions underscore the need for a systematic and disciplined way to characterize, improve, and test Linux kernel performance. Our goal is to work with the Linux community to further enhance the Linux kernel with consistent performance increases (avoiding degradations) across releases. The information on this site tells community members what 0-Day and LKP (Linux Kernel Performance) are doing to preserve the performance integrity of the kernel.

0-DAY CI LINUX KERNEL PERFORMANCE REPORT (V5.16)

BY Beibei Si ON Jan 11, 2022

1.            Introduction

0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel: kernel build, static analysis, boot, functional, performance and power tests. This report presents recent observations of kernel performance on the IA (Intel Architecture) platform, based on test results from the 0-Day CI service. It is structured as follows:

        Section 2, test parameter description

        Section 3, merged regressions and improvements in v5.16 release candidates

        Section 4, regressions and improvements captured by shift-left testing of developers’ and maintainers’ trees during the v5.16 release cycle

        Section 5, performance comparison among different kernel releases

        Section 6, test machine list

 

2.            Test parameter descriptions

Here are the descriptions for each parameter/field used in the tests.

 

General parameters

runtime: Run the test case for a given time period (in seconds or minutes).

nr_task: Number of processes/threads that run the workload in this job. An integer gives the count directly (default 1); a percentage is relative to the number of CPUs, e.g. 200% means twice as many processes/threads as CPUs.

nr_threads: Alias of nr_task.

iterations: Number of times to repeat this job.

test_size: Test disk size or memory size.

set_nic_irq_affinity: Set NIC interrupt affinity.

disable_latency_stats: Disable latency_stats, which may introduce too much noise when there are many context switches.

transparent_hugepage: Set the transparent hugepage policy (/sys/kernel/mm/transparent_hugepage).

boot_params:bp1_memmap: Boot parameter for memmap.

disk:nr_pmem: Number of pmem partitions used by the test.

swap:priority: Priority of the swap device, a value between -1 and 32767 (default -1); a higher value means a higher priority.

Test machine parameters

model: Name of the Intel processor microarchitecture.

brand: CPU brand name.

cpu_number: Number of CPUs.

memory: Memory size.
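The nr_task rule can be sketched in Python. This is a minimal illustration, not LKP's code: the helper name `resolve_nr_task` is hypothetical, and rounding a percentage down to a whole count (with a floor of 1) is an assumption, since the description above does not say how fractional results are handled.

```python
import os


def resolve_nr_task(value=None, cpu_number=None):
    """Resolve an nr_task/nr_threads setting to a process/thread count.

    Hypothetical helper, not LKP code: an integer is used as-is, a string
    ending in '%' is scaled by the CPU count and rounded down (minimum 1),
    and a missing value falls back to the documented default of 1.
    """
    if value is None:
        return 1
    if cpu_number is None:
        cpu_number = os.cpu_count() or 1
    text = str(value).strip()
    if text.endswith("%"):
        percent = float(text[:-1])
        return max(1, int(cpu_number * percent // 100))
    return int(text)


# The 192-CPU Cascade Lake machine used in section 5:
print(resolve_nr_task("200%", cpu_number=192))  # 384 processes/threads
print(resolve_nr_task("50%", cpu_number=192))   # 96
print(resolve_nr_task(16, cpu_number=192))      # 16, integer used as-is
```

Treating a percentage as relative to the CPU count lets one job definition scale across machines with very different core counts, such as the 16-CPU and 272-CPU systems in section 5.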

 

3.            Linux Kernel v5.16 Release Test

Linus Torvalds has released the 5.16 kernel, as expected. Significant changes in 5.16 include the futex_waitv() system call, cluster-aware CPU scheduling, some internal memcpy() hardening, memory folios, the DAMON operating schemes user-space memory-management mechanism, and much more. See the LWN merge-window summaries (part 1, part 2) and the KernelNewbies 5.16 page for details.

0-Day CI monitored the release closely to track the performance status on the IA platform. 0-Day observed 8 regressions and 7 improvements during the feature development phase for v5.16. Below we share more detailed information together with the correlated patches that led to these results. Note that the assessment is limited by 0-Day's current test coverage. The full list is summarized in the observation summary section.

3.1.            Observation Summary

0-Day CI observed 8 regressions and 7 improvements during the feature development phase for v5.16, i.e. the period from v5.16-rc1 to the v5.16 release.

Test Indicator

Mail

First bad commit

Test Scenario

Test Machine

Development Base

Status

fxmark.ssd_f2fs_dbench_client_4_bufferedio.works/sec

[block] cb2ac2912a: -66.0% regression

cb2ac2912a ("block: reduce kblockd_mod_delayed_work_on() CPU consumption")

disk: 1SSD

media: ssd

test: dbench_client

fstype: f2fs

directio: bufferedio

cpufreq_governor: performance

lkp-snr-a1

v5.16-rc1

 

merged at v5.16-rc1; the author accepted the regression, and the revert commit 87959fa16c was merged at v5.16-rc6

netperf.Throughput_tps

[mm/vmscan] 8cd7c588de: -10.5% regression

8cd7c588de ("mm/vmscan: throttle reclaim until some writeback completes if congested")

ip: ipv4

runtime: 300s

nr_threads: 16

cluster: cs-localhost

test: TCP_CRR

cpufreq_governor: performance

lkp-cpl-4sp1

v5.15

 

merged at v5.16-rc1, no response from the author

stress-ng.bigheap.ops_per_sec

[mm/vmscan] a19594ca4a: -20.1% regression

a19594ca4a ("mm/vmscan: increase the timeout if page reclaim is not making progress")

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: ext4

class: vm

test: bigheap

cpufreq_governor: performance

lkp-csl-2sp7

v5.15

 

merged at v5.16-rc1; the author accepted the regression and is working on a fix

stress-ng.lockf.ops_per_sec

[mm, slub] b47291ef02: -24.0% regression

b47291ef02 ("mm, slub: change percpu partial accounting from objects to pages")

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: xfs

class: filesystem

test: lockf

cpufreq_governor: performance

lkp-csl-2sp7

v5.15

 

merged at v5.16-rc1, no response from the author

will-it-scale.per_process_ops

[x86/asm] 0507503671: -4.9% regression

0507503671 ("x86/asm: Avoid adding register pressure for the init case in static_cpu_has()")

nr_task: 50%

mode: process

test: mmap2

cpufreq_governor: performance

lkp-icl-2sp2

v5.15-rc1

merged at v5.16-rc1; the author doubted the regression and the 0-Day CI team is following up

will-it-scale.per_thread_ops

[hugetlbfs] a4a118f2ee: -14.9% regression

a4a118f2ee ("hugetlbfs: flush TLBs correctly after huge_pmd_unshare")

nr_task: 100%

mode: thread

test: context_switch1

cpufreq_governor: performance

lkp-skl-fpga01

v5.16-rc2

merged at v5.16-rc3; the author accepted the regression and considered it expected

will-it-scale.per_thread_ops

[x86/signal] 3aac3ebea0: -11.9% regression

3aac3ebea0 ("x86/signal: Implement sigaltstack size validation")

nr_task: 16

mode: thread

test: signal1

cpufreq_governor: performance

lkp-hsw-4ex1

v5.15-rc5

merged at v5.16-rc1; the author doubted the regression and the 0-Day CI team is following up

will-it-scale.per_thread_ops

[fget] 054aa8d439: -5.7% regression

054aa8d439 ("fget: check that the fd still exists after getting a ref to it")

nr_task: 50%

mode: thread

test: poll2

cpufreq_governor: performance

lkp-ivb-2ep1

v5.16-rc3

merged to mainline at v5.16-rc4; the author accepted the regression, and the fix e386dfc56f was merged at v5.16-rc6

Improvement

 

 

 

 

 

 

aim7.jobs-per-min

[sched/fair] c5b0a7eefc: 9.6% improvement

c5b0a7eefc ("sched/fair: Remove sysctl_sched_migration_cost condition")

disk: 1BRD_48G

fs: xfs

test: sync_disk_rw

load: 600

cpufreq_governor: performance

lkp-cpl-4sp1

v5.15-rc4

merged at v5.16-rc1

fio.write_iops

[block] dc5fc361d8: 9.8% improvement

dc5fc361d8 ("block: attempt direct issue of plug list")

disk: 1SSD

fs: xfs

runtime: 300s

nr_task: 32

rw: randwrite

bs: 4k

ioengine: sync

test_size: 256g

cpufreq_governor: performance

lkp-csl-2ap1

v5.15-rc6

merged at v5.16-rc1

hackbench.throughput

[memcg, kmem] 58056f7750: 10.3% improvement

58056f7750 ("memcg, kmem: further deprecate kmem.limit_in_bytes")

nr_threads: 100%

iterations: 4

mode: process

ipc: socket

cpufreq_governor: performance

lkp-cpl-4sp1

v5.15

merged at v5.16-rc1

stress-ng.flock.ops_per_sec

[locks] 90f7d7a0d0: 24.2% improvement

90f7d7a0d0 ("locks: remove LOCK_MAND flock lock support")

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: btrfs

class: filesystem

test: flock

cpufreq_governor: performance

lkp-csl-2sp7

v5.14

merged at v5.16-rc1

stress-ng.loop.ops_per_sec

[loop] e3f9387aea: 78.9% improvement

e3f9387aea ("loop: Use pr_warn_once() for loop_control_remove() warning")

nr_threads: 100%

disk: 1HDD

testtime: 60s

class: device

test: loop

cpufreq_governor: performance

lkp-icl-2sp1

v5.16-rc1

merged at v5.16-rc4

will-it-scale.per_process_ops

[rcu] 925da92ba5: 2.5% improvement

925da92ba5 ("rcu: Avoid unneeded function call in rcu_read_unlock()")

nr_task: 16

mode: process

test: getppid1

cpufreq_governor: performance

lkp-csl-2sp9

v5.15-rc1

merged at v5.16-rc1

will-it-scale.per_thread_ops

[signal] 6c3118c321: 13.2% improvement

6c3118c321 ("signal: Skip the altstack update when not needed")

nr_task: 100%

mode: thread

test: signal1

cpufreq_governor: performance

lkp-hsw-4ex1

v5.16-rc4

merged at v5.16-rc6

3.2.            Fxmark.ssd_f2fs_dbench_client_4_bufferedio.works/sec

FxMark is a filesystem benchmark that tests multicore scalability.

3.2.1   Scenario: dbench_client on f2fs

 

Commit cb2ac2912a was reported to cause a -66.0% regression of fxmark.ssd_f2fs_dbench_client_4_bufferedio.works/sec compared to v5.16-rc1. It was merged to mainline at v5.16-rc1, and its revert 87959fa16c was merged at v5.16-rc6.

 

Correlated commits

cb2ac2912a

block: reduce kblockd_mod_delayed_work_on() CPU consumption

branch

linus/master

report

[block] cb2ac2912a: -66.0% regression

test scenario

disk: 1SSD

media: ssd

test: dbench_client

fstype: f2fs

directio: bufferedio

cpufreq_governor: performance

test machine

lkp-snr-a1

status

merged at v5.16-rc1; the author accepted the regression, and the revert commit 87959fa16c was merged at v5.16-rc6

 

3.3.            Will-it-scale.per_thread_ops

Will-it-scale takes a test case and runs it from 1 through n parallel copies to see whether the test case scales. It builds both process-based and thread-based tests in order to see any differences between the two.
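To make the scaling idea concrete, here is a small Python sketch that summarizes a sweep from 1 to n copies. It assumes that a per-copy metric such as per_process_ops or per_thread_ops is the total operation count divided by the number of copies; that reading, the function name `scaling_report`, and the sample numbers are illustrative assumptions, not will-it-scale's actual output format.

```python
def scaling_report(total_ops_by_copies):
    """Summarize a will-it-scale style sweep over 1..n parallel copies.

    total_ops_by_copies maps the number of copies to the total operations
    all copies completed in the same interval. Returns a dict
    {copies: (ops_per_copy, efficiency)}, where efficiency is the per-copy
    rate relative to the single-copy run (1.0 means perfect scaling).
    """
    baseline = total_ops_by_copies[1]
    report = {}
    for copies in sorted(total_ops_by_copies):
        per_copy = total_ops_by_copies[copies] / copies
        report[copies] = (per_copy, per_copy / baseline)
    return report


# Hypothetical sweep in which scaling flattens as copies contend.
sweep = {1: 1_000_000, 2: 1_900_000, 4: 3_200_000, 8: 4_000_000}
for copies, (per_copy, eff) in scaling_report(sweep).items():
    print(f"{copies:>2} copies: {per_copy:>9.0f} ops/copy, efficiency {eff:.2f}")
```

A per-copy metric that drops as copies are added, while total throughput still rises, is the usual signature of contention on a shared kernel resource.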

 

3.3.1   Scenario: thread poll2

 

Commit 054aa8d439 was reported to cause a -5.7% regression of will-it-scale.per_thread_ops compared to v5.16-rc3. It was merged to mainline at v5.16-rc4.

 

Correlated commits

054aa8d439

fget: check that the fd still exists after getting a ref to it

branch

linus/master

report

[fget] 054aa8d439: -5.7% regression

test scenario

nr_task: 50%

mode: thread

test: poll2

cpufreq_governor: performance

test machine

lkp-ivb-2ep1

status

merged to mainline at v5.16-rc4; the author accepted the regression, and the fix e386dfc56f was merged at v5.16-rc6

3.4.            Stress-ng.loop.ops_per_sec

Stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

3.4.1   Scenario: loop test on localhost

 

Commit e3f9387aea was reported to bring a 78.9% improvement of stress-ng.loop.ops_per_sec compared to v5.16-rc1. It was merged to mainline at v5.16-rc4.

 

Correlated commits

e3f9387aea

loop: Use pr_warn_once() for loop_control_remove() warning

branch

linus/master

report

[loop] e3f9387aea: 78.9% improvement

test scenario

nr_threads: 100%

disk: 1HDD

testtime: 60s

class: device

test: loop

cpufreq_governor: performance

test machine

lkp-icl-2sp1

status

merged at v5.16-rc4

 

4.            Shift-Left Testing

Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which catches issues earlier and limits their wider impact. We call this “shift-left” testing. During the v5.16 release cycle, 0-Day CI reported 5 major performance regressions and 7 major improvements through shift-left testing. For some of these, we share more detailed information together with the code changes that possibly led to the result, though the assessment is limited by our current test coverage. The whole list is summarized in the report summary section.

4.1         Report Summary

0-Day CI reported 5 performance regressions and 7 improvements through shift-left testing on developer and maintainer repos.

 

Test Indicator

Mail

First bad commit

Test Scenario

Test Machine

Status

aim7.jobs-per-min

[f2fs] e029ce2460: -35.8% regression

e029ce2460 ("[PATCH 2/6] f2fs: do not expose unwritten blocks to user by DIO")

disk: 1BRD_48G

fs: f2fs

test: disk_cp

load: 3000

cpufreq_governor: performance

lkp-icl-2sp1

currently not merged, no response from the author

aim7.jobs-per-min

[f2fs] 4fa18391ae: -35.5% regression

4fa18391ae ("f2fs: do not expose unwritten blocks to user by DIO")

disk: 1BRD_48G

fs: f2fs

test: disk_cp

load: 3000

cpufreq_governor: performance

lkp-cpl-4sp1

currently not merged, no response from the author

phoronix-test-suite.tiobench.RandomWrite.64MB.8.mb_s

[sched/fair] b4d95a034c: -26.3% regression

b4d95a034c ("[PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs")

test: tiobench-1.3.1

option_a: Random Write

option_b: 64MB

option_c: 8

cpufreq_governor: performance

lkp-csl-2sp8

currently not merged; the author accepted the regression and is working on a fix

stress-ng.sigsuspend.ops_per_sec

[sched/fair] 3b238896b2: -94.4% regression

3b238896b2 ("sched/fair: Make place_entity() use wall-time")

nr_threads: 100%

disk: 1HDD

testtime: 60s

class: interrupt

test: sigsuspend

cpufreq_governor: performance

lkp-csl-2sp7

currently not merged, no response from the author

stress-ng.sockdiag.ops_per_sec

[af_unix] afd20b9290: -26.3% regression

afd20b9290 ("af_unix: Replace the big lock with small locks.")

nr_threads: 100%

testtime: 60s

class: network

test: sockdiag

cpufreq_governor: performance

lkp-icl-2sp6

currently not merged, no response from the author

Improvement

 

 

 

 

 

hackbench.throughput

[sched/fair] b08e888d97: 46.6% improvement

b08e888d97 ("sched/fair: Force progress on min_vruntime")

nr_threads: 100%

iterations: 4

mode: threads

ipc: pipe

cpufreq_governor: performance

lkp-icl-2sp2

currently not merged

netperf.Throughput_Mbps

[tcp] f35f821935: 2.7% improvement

f35f821935 ("tcp: defer skb freeing after socket lock is released")

ip: ipv4

runtime: 900s

nr_threads: 25%

cluster: cs-localhost

test: TCP_MAERTS

cpufreq_governor: performance

lkp-icl-2sp2

currently not merged

phoronix-test-suite.neatbench.CPU.fps

[sched/fair] 5235451f72: 19.7% improvement

5235451f72 ("sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs")

test: neatbench-1.0.4

option_a: CPU

cpufreq_governor: performance

lkp-csl-2sp8

currently not merged

stress-ng.ioprio.ops_per_sec

[f2fs] d4dd19ec1e: 418.8% improvement

d4dd19ec1e ("f2fs: do not expose unwritten blocks to user by DIO")

nr_threads: 10%

disk: 1HDD

testtime: 60s

fs: f2fs

class: filesystem

test: ioprio

cpufreq_governor: performance

lkp-csl-2sp7

currently not merged

stress-ng.mcontend.ops_per_sec

[sched/fair] 9d1ca1b43c: 32.1% improvement

9d1ca1b43c ("sched/fair: Simple runqueue order on migrate")

nr_threads: 100%

testtime: 60s

class: memory

test: mcontend

cpufreq_governor: performance

lkp-icl-2sp6

currently not merged

stress-ng.sem.ops_per_sec

[sched/fair] 8d0920b981: 11.9% improvement

8d0920b981 ("[PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task")

nr_threads: 100%

testtime: 60s

sc_pid_max: 4194304

class: scheduler

test: sem

cpufreq_governor: performance

lkp-csl-2sp7

currently not merged

unixbench.score

[tracing] 85c62c8c37: 2.4% improvement

85c62c8c37 ("tracing: Have existing event_command.parse() implementations use helpers")

runtime: 300s

nr_task: 30%

test: pipe

cpufreq_governor: performance

lkp-icl-2sp2

currently not merged

4.2         Stress-ng.sigsuspend.ops_per_sec

Stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

 

4.2.1   Scenario: sigsuspend test on localhost

Commit 3b238896b2 was reported to cause a -94.4% regression of stress-ng.sigsuspend.ops_per_sec compared to v5.16-rc6.

 

Correlated commits

3b238896b2

sched/fair: Make place_entity() use wall-time

branch

peterz-queue/sched/wip.migrate

report

[sched/fair] 3b238896b2: -94.4% regression

test scenario

nr_threads: 100%

disk: 1HDD

testtime: 60s

class: interrupt

test: sigsuspend

cpufreq_governor: performance

test machine

lkp-csl-2sp7

status

currently not merged, no response from the author

4.3         Phoronix-test-suite.tiobench.RandomWrite.64MB.8.mb_s

 

Phoronix Test Suite is a comprehensive testing and benchmarking platform that provides an extensible framework to which new tests can easily be added.

4.3.1   Scenario: tiobench-1.3.1 test

 

Commit b4d95a034c was reported to cause a -26.3% regression of phoronix-test-suite.tiobench.RandomWrite.64MB.8.mb_s compared to v5.16-rc1.

 

Correlated commits

b4d95a034c

sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs

branch

linux-review/Mel-Gorman/Adjust-NUMA-imbalance-for-multiple-LLCs/20211125-232336

report

[sched/fair] b4d95a034c: -26.3% regression

test scenario

test: tiobench-1.3.1

option_a: Random Write

option_b: 64MB

option_c: 8

cpufreq_governor: performance

test machine

lkp-csl-2sp8

status

currently not merged; the author accepted the regression and is working on a fix

4.4         Hackbench.throughput

Hackbench is both a benchmark and a stress test for the Linux kernel scheduler. Its main job is to create a specified number of pairs of schedulable entities (either threads or traditional processes) that communicate via either sockets or pipes, and to time how long it takes for each pair to send data back and forth.
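The pattern described above can be illustrated with a toy Python version that runs thread pairs exchanging messages over pipes and times the exchange. It is a simplification under stated assumptions: the ping-pong protocol and the names `pipe_pingpong`, `_ping` and `_echo` are choices made for this sketch, the 100-byte message size is assumed to match common hackbench defaults, and real hackbench organizes senders and receivers into groups rather than isolated pairs.

```python
import os
import threading
import time

MSG = b"x" * 100  # assumed 100-byte message, a common hackbench default


def _read_exact(fd, n):
    """Read exactly n bytes from a pipe, looping over short reads."""
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError("pipe closed early")
        buf += chunk
    return buf


def _echo(rfd, wfd, loops):
    # Receiver: wait for each message and send it straight back.
    for _ in range(loops):
        os.write(wfd, _read_exact(rfd, len(MSG)))


def _ping(rfd, wfd, loops):
    # Sender: send a message, then block until the echo arrives.
    for _ in range(loops):
        os.write(wfd, MSG)
        _read_exact(rfd, len(MSG))


def pipe_pingpong(pairs=4, loops=1000):
    """Run `pairs` sender/receiver thread pairs over pipes.

    Returns (total_round_trips, elapsed_seconds).
    """
    threads, fds = [], []
    for _ in range(pairs):
        to_echo_r, to_echo_w = os.pipe()
        to_ping_r, to_ping_w = os.pipe()
        fds += [to_echo_r, to_echo_w, to_ping_r, to_ping_w]
        threads.append(threading.Thread(target=_echo, args=(to_echo_r, to_ping_w, loops)))
        threads.append(threading.Thread(target=_ping, args=(to_ping_r, to_echo_w, loops)))
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    for fd in fds:
        os.close(fd)
    return pairs * loops, elapsed


if __name__ == "__main__":
    trips, secs = pipe_pingpong(pairs=4, loops=2000)
    print(f"{trips} round trips in {secs:.3f} s ({trips / secs:.0f} per second)")
```

Because every message blocks one thread until its partner runs, the elapsed time is dominated by wakeups and context switches, which is why this kind of workload is sensitive to scheduler changes.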

4.4.1   Scenario: throughput on localhost

 

Commit b08e888d97 was reported to bring a 46.6% improvement of hackbench.throughput compared to v5.16-rc6.

 

Correlated commits

b08e888d97

sched/fair: Force progress on min_vruntime

branch

peterz-queue/sched/wip.migrate

report

[sched/fair] b08e888d97: 46.6% improvement

test scenario

nr_threads: 100%

iterations: 4

mode: threads

ipc: pipe

cpufreq_governor: performance

test machine

lkp-icl-2sp2

status

currently not merged

5        Latest Release Performance Comparison

 

This section gives some information about the performance differences among kernel releases, especially between v5.16 and v5.12. 0-Day CI runs 50+ performance benchmarks, and we selected the 9 benchmarks that historically showed the most regressions/improvements reported by 0-Day CI, each run with a typical configuration. For some of the regressions in this comparison, 0-Day CI could not bisect the cause, so no related report was sent out during the release development period, but they are still worth checking. The root causes of the regressions are not covered in this section.
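The report does not describe how 0-Day CI bisects. As a general illustration, the binary search that `git bisect` performs over a linear history can be sketched in Python, with a hypothetical threshold test standing in for a real benchmark run. Real kernel histories contain merges and benchmark results are noisy, so practical bisection needs repeated runs and more care than this.

```python
def first_bad_commit(commits, is_bad):
    """Binary-search a linear commit list for the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and the regression persists once introduced.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical throughput per commit; "bad" means more than 10% below the base.
throughput = {"c0": 100.0, "c1": 101.0, "c2": 99.0, "c3": 70.0, "c4": 69.0}
base = throughput["c0"]
culprit = first_bad_commit(list(throughput), lambda c: throughput[c] < base * 0.9)
print(culprit)  # c3
```

Each step halves the remaining range, so about log2(N) benchmark runs are needed for N commits; a noisy is_bad decision at any step can send the search down the wrong half, which is one way bisection can fail.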

 

In the following figures, the value on the Y-axis is the relative performance number. We used the v5.12 data as the base (performance number is 100).
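The normalization used in the figures, together with the signed percentage change used in the performance summaries, can be sketched in Python. The function names and the raw numbers are hypothetical; only the rule that the base release reads 100 comes from the text.

```python
def relative_performance(results, base="v5.12"):
    """Scale raw results so the base release reads 100."""
    base_value = results[base]
    return {release: value / base_value * 100 for release, value in results.items()}


def percent_change(old, new):
    """Signed change from old to new in percent; for a higher-is-better
    metric, a negative value is a regression."""
    return (new - old) / old * 100


# Hypothetical raw throughput numbers, not taken from this report.
raw = {"v5.12": 2000.0, "v5.15": 2100.0, "v5.16": 2520.0}
for release, score in relative_performance(raw).items():
    print(release, round(score, 1))
print(round(percent_change(raw["v5.15"], raw["v5.16"]), 1))
```

For lower-is-better metrics such as latency, the same formulas apply but the sign must be read the other way round.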

5.1         test suite: vm-scalability

Vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel.

 

The test below shows a typical result.

[Figure: vm-scalability Test 1]

Here are the test configuration and performance summary for the test above:

 

vm-scalability Test 1

test machine

model: Cascade Lake

brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz

cpu_number: 192

memory: 192G

runtime

300s

size

256G

vm-scalability test parameter

test: msync-mt

performance summary

vm-scalability.throughput on kernel v5.16 is almost the same as that in v5.15

 

5.2         test suite: will-it-scale

Will-it-scale takes a test case and runs it from 1 through n parallel copies to see whether the test case scales. It builds both process-based and thread-based tests in order to see any differences between the two.

 

 

[Figure: will-it-scale Test 1]

Here are the test parameters and performance summary for the test above:

 

will-it-scale Test 1

test machine

model: Cascade Lake

brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz

cpu_number: 192

memory: 192G

nr_task

100%

will-it-scale test parameter

mode: process

test: context_switch1

performance summary

will-it-scale.per_process_ops on kernel v5.16 shows a 67.22% improvement compared to v5.15

5.3         test suite: unixbench

UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.

[Figure: Unixbench Test 1]

Here are the test configuration and performance summary for the test above:

 

Unixbench Test 1

test machine

model: Ice Lake

brand: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz

cpu_number: 128

memory: 256G

runtime

300s

nr_task

1

unixbench test parameter

test: fstime

performance summary

unixbench.score on kernel v5.16 is almost the same as that in v5.15

5.4         test suite: reaim

Reaim updates and improves the existing open source AIM 7 benchmark. AIM7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of multiuser systems.

[Figure: reaim Test 1]

Here are the test configuration and performance summary for the test above:

 

reaim Test 1

test machine

model: Coffee Lake

brand: Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz

cpu_number: 16

memory: 32G

runtime

300s

nr_task

100%

disk

No requirement

fs

No requirement

reaim test parameter

test: dbase

performance summary

reaim.jobs_per_min_child on kernel v5.16 differs by 9.83% from v5.15

 

5.5         test suite: pigz

Pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.

[Figure: pigz Test 1]

Here are the test configuration and performance summary for the test above:

 

pigz Test 1

test machine

model: Knights Mill

brand: Intel(R) Xeon Phi(TM) CPU 7255 @ 1.10GHz

cpu_number: 272

memory: 112G

nr_threads

100%

pigz Test parameter

blocksize: 512K

performance summary

pigz.throughput on kernel v5.16 is almost the same as that in v5.15
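The block-parallel idea behind pigz can be sketched with Python's gzip module and a thread pool, using the 512K block size from the configuration above. This is a deliberately simplified variant, not pigz's algorithm: each block is compressed as an independent gzip member, which standard gzip readers accept when members are concatenated, whereas pigz produces a single deflate stream. The function name `parallel_gzip` and the worker count are illustrative choices.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 512 * 1024  # mirrors the "blocksize: 512K" test parameter


def parallel_gzip(data: bytes, block_size: int = BLOCK_SIZE, workers: int = 4) -> bytes:
    """Compress fixed-size blocks on a thread pool and concatenate them.

    Each block becomes an independent gzip member; gzip readers treat the
    concatenation as one stream. This is simpler than pigz, which emits a
    single deflate stream and, by default, primes each block with the tail
    of the previous block to keep the compression ratio close to gzip's.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so members stay in sequence.
        return b"".join(pool.map(gzip.compress, blocks))


if __name__ == "__main__":
    payload = b"linux kernel performance " * 100_000  # about 2.5 MB
    packed = parallel_gzip(payload)
    assert gzip.decompress(packed) == payload
    print(f"{len(payload)} -> {len(packed)} bytes")
```

Independent blocks are what make the work parallel; the price in this simplified form is a slightly worse compression ratio, because no block can reference data from the block before it.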

5.6         test suite: netperf

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput and end-to-end latency.
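A request/response (RR) test counts how many request/response round trips complete per second, which is what the "tps" (transactions per second) in netperf.Throughput_tps refers to. The Python sketch below approximates a TCP_RR-style test over loopback; it is not netperf, the function names are hypothetical, and it uses TCP rather than the SCTP transport of the SCTP_RR test in the configuration. The default 1-byte message mirrors netperf's default RR request and response size.

```python
import socket
import threading
import time


def _recv_exact(sock, n):
    """Receive exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def _responder(listener, size, transactions):
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(transactions):
            conn.sendall(_recv_exact(conn, size))  # one response per request


def tcp_rr(transactions=2000, size=1):
    """Toy TCP request/response test over loopback.

    Returns transactions per second, where each transaction is one
    size-byte request followed by one size-byte response.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=_responder, args=(listener, size, transactions))
        server.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            request = b"r" * size
            start = time.perf_counter()
            for _ in range(transactions):
                client.sendall(request)
                _recv_exact(client, size)
            elapsed = time.perf_counter() - start
        server.join()
    return transactions / elapsed


if __name__ == "__main__":
    print(f"{tcp_rr():.0f} transactions/s")
```

Since only one small message is in flight at a time, the result reflects per-transaction latency through the network stack and scheduler rather than bandwidth.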

[Figure: netperf Test 1]

Here are the test configuration and performance summary for the test above:

 

netperf Test 1

test machine

model: Cascade Lake

brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz

cpu_number: 192

memory: 192G

disable_latency_stats

1

set_nic_irq_affinity

1

runtime

300s

nr_threads

200%

ip

ipv4

netperf test parameter

test: SCTP_RR

performance summary

netperf.Throughput_tps on kernel v5.16 is almost the same as that in v5.15

5.7         test suite: hackbench

Hackbench is both a benchmark and a stress test for the Linux kernel scheduler. Its main job is to create a specified number of pairs of schedulable entities (either threads or traditional processes) that communicate via either sockets or pipes, and to time how long it takes for each pair to send data back and forth.

[Figure: hackbench Test 1]

Here are the test configuration and performance summary for the test above:

 

hackbench Test 1

test machine

model: Cascade Lake

brand: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz

cpu_number: 192

memory: 192G

disable_latency_stats

1

nr_threads

100%

hackbench test parameter

mode: process

ipc: pipe

performance summary

hackbench.throughput on kernel v5.16 shows a 25.5% improvement compared to v5.15

5.8         test suite: fio

Fio was originally written to save its author the hassle of writing special test case programs for a specific workload, either for performance reasons or to find/reproduce a bug.

[Figure: fio Test 1]

Here are the test configuration and performance summary for the test above:

 

fio Test 1

test machine

model: Coffee Lake

brand: Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz

cpu_number: 16

memory: 32G

runtime

300s

file system

btrfs

disk

1HDD

boot_params

No requirement

nr_task

100%

time_based

No requirement

fio test parameter

fio-setup-basic:

  rw: write

  bs: 4k

  ioengine: pvsync

  test_size: 128G

performance summary

fio.write_bw_MBps on kernel v5.16 shows a -42.4% regression compared to v5.15
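The fio parameters above can be expressed as an fio command line. The Python sketch below builds one; it is not how LKP invokes fio, and the function name `fio_command`, the job name, the target directory, the mapping of nr_task 100% to one job per CPU, and the even split of test_size across jobs are all assumptions for illustration (fio's --size applies to each job separately).

```python
import shlex


def fio_command(directory, numjobs, total_size_gib=128, runtime_s=300,
                rw="write", bs="4k", ioengine="pvsync"):
    """Build an fio command line resembling "fio Test 1" above.

    Assumptions, not taken from the report: the job name, the target
    directory, and splitting test_size evenly across jobs (fio's --size
    applies to each job separately).
    """
    size_mib = total_size_gib * 1024 // numjobs
    return [
        "fio",
        "--name=fio-test1-sketch",   # hypothetical job name
        f"--directory={directory}",  # mount point of the btrfs test disk
        f"--rw={rw}",
        f"--bs={bs}",
        f"--ioengine={ioengine}",
        f"--numjobs={numjobs}",
        f"--size={size_mib}m",
        f"--runtime={runtime_s}",
        "--group_reporting",
    ]


# nr_task: 100% on the 16-CPU Coffee Lake machine gives 16 jobs.
print(shlex.join(fio_command("/mnt/btrfs-test", numjobs=16)))
```

Returning an argument list rather than a single string keeps it ready for subprocess.run() without shell quoting; shlex.join() is used only for display.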

5.9         test suite: ebizzy

Ebizzy is designed to generate a workload resembling common web application server workloads. It is highly threaded, has a large in-memory working set, and allocates and deallocates memory frequently.

 

[Figure: ebizzy Test 1]

Here are the test configuration and performance summary for the test above:

 

ebizzy Test 1

test machine

model: Snow Ridge

brand: Intel Atom(R) P5362 processor

cpu_number: 24

memory: 64G

transparent_hugepage

No requirement

nr_threads

200%

iterations

100x

ebizzy test parameter

duration: 10s

performance summary

ebizzy.throughput on kernel v5.16 is almost the same as that in v5.15

6        Test Machines

6.1         IVB Desktop

model

Ivy Bridge

brand

Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz

cpu number

8

memory

16G

 

model

Ivy Bridge

brand

Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz

cpu number

4

memory

8G

 

6.2         SKL SP

model

Skylake

brand

Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz

cpu number

80

memory

64G

 

6.3         BDW EP

model

Broadwell-EP

brand

Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

cpu number

88

memory

128G

 

6.4         HSW EP

model

Haswell-EP

brand

Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

cpu number

72

memory

128G

 

6.5         IVB EP

model

Ivy Bridge-EP

brand

Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz

cpu number

40

memory

384G

 

model

Ivytown Ivy Bridge-EP

brand

Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz

cpu number

48

memory

64G

 

6.6         HSX EX

model

Brickland Haswell-EX

brand

Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz

cpu number

144

memory

512G