Linux Kernel Performance

Linux development evolves rapidly. The performance and scalability of the OS kernel have been a key part of its success. However, discussions have appeared on LKML (Linux Kernel Mailing List) regarding large performance regressions between kernel versions. These discussions underscore the need for a systematic and disciplined way to characterize, improve, and test Linux kernel performance. Our goal is to work with the Linux community to further enhance the Linux kernel with consistent performance increases (avoiding degradations) across releases. The information on this site gives community members a better view of what 0-Day and LKP (Linux Kernel Performance) are doing to preserve the performance integrity of the kernel.

0-Day CI Linux Kernel Performance Report (v4.20)

BY Philip Li ON Jan 07, 2019

  1. Introduction

0-Day CI is an automated Linux kernel test service that provides comprehensive test coverage of the Linux kernel. It covers kernel build, static analysis, boot, functional, performance and power tests. This report presents recent observations of kernel performance on IA platforms, based on test results from the 0-Day CI service. It is structured as follows:

  • Section 2, test parameter description

  • Section 3, merged regressions and improvements in v4.20 release candidates
  • Section 4, regressions and improvements captured by shift-left testing on developers’ and maintainers’ trees during the v4.20 release cycle

  • Section 5, performance comparison among different kernel releases

  • Section 6, test machine list

  2. Test parameters

Here are the descriptions for each parameter/field used in the tests.

 

| Classification | Name | Description |
|---|---|---|
| General | runtime | Run the test case for a given time period (seconds or minutes) |
| General | nr_task | If an integer, the number of processes/threads that run the workload of this job (default: 1). If a percentage, the count is relative to the number of CPUs, e.g. 200% means twice the number of CPUs |
| General | nr_threads | Alias of nr_task |
| General | iterations | Number of times to repeat this job |
| General | test_size | Test disk size or memory size |
| General | set_nic_irq_affinity | Set NIC interrupt affinity |
| General | disable_latency_stats | latency_stats may introduce too much noise when there are many context switches; this parameter allows disabling it |
| General | transparent_hugepage | Set transparent hugepage policy (/sys/kernel/mm/transparent_hugepage) |
| General | boot_params:bp1_memmap | Boot parameters of memmap |
| General | disk:nr_pmem | Number of pmem partitions used by the test |
| General | swap:priority | Priority of the swap device, a value between -1 and 32767. The default is -1; a higher value means a higher priority |
| Test Machine | model | Name of Intel processor microarchitecture |
| Test Machine | brand | Brand name of the CPU |
| Test Machine | cpu_number | Number of CPUs |
| Test Machine | memory | Size of memory |
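The nr_task rule described above (a plain integer, or a percentage of the CPU count) can be illustrated with a minimal sketch. The function name `resolve_nr_task` and the rounding behavior are assumptions made for illustration; this is not the code 0-Day CI or lkp-tests actually uses.

```python
def resolve_nr_task(nr_task, cpu_number):
    """Return the number of processes/threads implied by an nr_task setting.

    An integer (or integer string) is used as-is. A percentage such as
    "200%" is scaled by the CPU count. None falls back to the default of 1.
    """
    if nr_task is None:
        return 1
    text = str(nr_task).strip()
    if text.endswith("%"):
        percent = int(text[:-1])
        # Assumption: round down, but never resolve to fewer than one worker.
        return max(1, cpu_number * percent // 100)
    return int(text)


if __name__ == "__main__":
    # 88 CPUs matches the Broadwell-EP machines listed in this report.
    for setting in (None, 16, "50%", "200%"):
        print(setting, "->", resolve_nr_task(setting, cpu_number=88))
```

With this reading, `nr_threads: 200%` on a 16-CPU machine resolves to 32 threads, and `nr_task: 50%` on an 88-CPU machine resolves to 44 tasks.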

  3. Linux Kernel v4.20 Test

The v4.20 release of the Linux kernel was on December 23, 2018. Headline features in 4.20 include network flow dissectors in BPF, the taprio traffic scheduler, peer-to-peer DMA support in the PCI layer, C-SKY architecture support, the pressure-stall instrumentation mechanism, the XArray data structure, and much more. The KernelNewbies 4.20 page is coming together with more information. 0-Day CI monitored the release closely to track the performance status on IA platforms. 0-Day observed 6 regressions and 5 improvements during the feature development phase for v4.20. We share more detailed information below, together with the correlated patches that led to these results. Note that the assessment is limited by the test coverage 0-Day currently has. The full list is given in the Observation Summary section.

  3.1 Observation Summary

0-Day CI observed 6 regressions and 5 improvements during the feature development phase for v4.20, that is, in the time frame from v4.20-rc1 to the v4.20 release.

| Test Indicator | Report | Test Scenario | Test Machine | Development Base | Status |
|---|---|---|---|---|---|
| blogbench.read_score | [Btrfs] 5239834016: -7.2% regression | disk: 1SSD; fs: btrfs; ucode: 0xb00002e; cpufreq_governor: performance | lkp-bdw-ep3b | v4.19-rc8 | Merged at v4.20-rc1, it's still acceptable |
| netperf.Throughput_Mbps | [LKP] [tcp] a337531b94: -6.1% regression | ip: ipv4; runtime: 900s; nr_threads: 200%; cluster: cs-localhost; test: TCP_STREAM; ucode: 0x7000013; cpufreq_governor: performance | lkp-bdw-de1 | v4.19-rc5 | Merged at v4.20-rc1, but fixed with commit 041a14d267 |
| netperf.Throughput_tps | [LKP] [function_graph] 8114865ff8: -1.1% regression | runtime: 300s; size: 8T; test: anon-cow-seq; cpufreq_governor: performance | lkp-hsw-d01 | v4.20-rc3 | Merged at v4.20-rc5, the author does not think it’s a true regression |
| unixbench.score | [LKP] [x86/mm/tlb] 5462bc3a9a: 7.0% improvement | runtime: 300s; nr_task: 1; test: context1; ucode: 0x20; cpufreq_governor: performance | lkp-ivb-d01 | v4.19-rc5 | Merged at v4.20-rc1 |
| unixbench.score | [LKP] [sched/fair] c469933e77: 9.3% improvement | runtime: 300s; nr_task: 100%; test: execl; ucode: 0x7000013; cpufreq_governor: performance | lkp-bdw-de1 | v4.19 | Merged at v4.20-rc3 |
| vm-scalability.throughput | [LKP] [mm] ac5b2c1891: -61.3% regression | runtime: 300; thp_enabled: always; thp_defrag: always; nr_task: 32; nr_ssd: 1; test: swap-w-seq; ucode: 0x3d; cpufreq_governor: performance | lkp-hsw-ep4 | v4.19 | Merged at v4.20-rc1, but reverted by commit 2f0799a0ff |
| will-it-scale.per_process_ops | [LKP] [x86/xen] f030aade91: 5.6% improvement | nr_task: 100%; mode: process; test: poll2; ucode: 0x20; cpufreq_governor: performance | lkp-ivb-d01 | v4.19-rc2 | Merged at v4.20-rc1 |
| will-it-scale.per_process_ops | [LKP] [cpuidle] 23e8ceb9ce: 8.9% improvement | nr_task: 16; mode: process; test: context_switch1; cpufreq_governor: performance | lkp-bdw-ep3d | v4.19-rc3 | Merged at v4.20-rc1 |
| will-it-scale.per_thread_ops | [LKP] [fsnotify] 60f7ed8c7c: -5.9% regression | nr_task: 16; mode: thread; test: unlink2; cpufreq_governor: performance | lkp-bdw-ep3d | v4.19-rc2 | Merged at v4.20-rc1, but fixed by the author |
| will-it-scale.per_thread_ops | [LKP] [x86/pti/64] bf904d2762: 1.7% improvement | nr_task: 16; mode: thread; test: pwrite1; cpufreq_governor: performance | lkp-bdw-ep3d | v4.19-rc2 | Merged at v4.20-rc1 |
| will-it-scale.per_thread_ops | [LKP] [mm] 9bc8039e71: -64.1% regression | nr_task: 100%; mode: thread; test: brk1; ucode: 0x20; cpufreq_governor: performance | lkp-ivb-d01 | v4.19 | Merged at v4.20-rc1, but the author is working on the fix |
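The counts quoted for this summary (6 regressions, 5 improvements) can be re-derived from the Report column. The sketch below parses the subject lines copied from the summary above and tallies them. The regular expression and the function names are assumptions made for illustration, not part of any 0-Day CI tooling.

```python
import re
from collections import Counter

# Subject lines copied from the Report column of the observation summary.
REPORTS = [
    "[Btrfs] 5239834016: -7.2% regression",
    "[LKP] [tcp] a337531b94: -6.1% regression",
    "[LKP] [function_graph] 8114865ff8: -1.1% regression",
    "[LKP] [x86/mm/tlb] 5462bc3a9a: 7.0% improvement",
    "[LKP] [sched/fair] c469933e77: 9.3% improvement",
    "[LKP] [mm] ac5b2c1891: -61.3% regression",
    "[LKP] [x86/xen] f030aade91: 5.6% improvement",
    "[LKP] [cpuidle] 23e8ceb9ce: 8.9% improvement",
    "[LKP] [fsnotify] 60f7ed8c7c: -5.9% regression",
    "[LKP] [x86/pti/64] bf904d2762: 1.7% improvement",
    "[LKP] [mm] 9bc8039e71: -64.1% regression",
]

# Assumed layout: "[subsystem] <10-hex-digit commit>: <change>% <kind>".
SUBJECT = re.compile(
    r"\[(?P<subsystem>[^\]]+)\]\s+(?P<commit>[0-9a-f]{10}):\s+"
    r"(?P<change>[+-]?\d+(?:\.\d+)?)%\s+(?P<kind>regression|improvement)"
)


def parse_report(subject):
    """Split one report subject into subsystem, commit, change and kind."""
    match = SUBJECT.search(subject)
    if match is None:
        raise ValueError(f"unrecognized report subject: {subject!r}")
    return {
        "subsystem": match["subsystem"],
        "commit": match["commit"],
        "change": float(match["change"]),
        "kind": match["kind"],
    }


def tally(subjects):
    """Count how many subjects are regressions and how many are improvements."""
    return Counter(parse_report(subject)["kind"] for subject in subjects)


if __name__ == "__main__":
    counts = tally(REPORTS)
    print(counts["regression"], "regressions,", counts["improvement"], "improvements")
```

The same parser can be pointed at the shift-left report list to check its totals in the same way.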

  3.2 blogbench.read_score

Blogbench is a portable filesystem benchmark that tries to reproduce the load of a real-world busy file server. It stresses the filesystem with multiple threads performing random reads, writes and rewrites in order to get a realistic idea of the scalability and the concurrency a system can handle.

scenario: stress write test on btrfs

Commit 5239834016 was reported to have a -7.2% regression in blogbench.read_score compared to v4.19-rc8. It was merged to mainline at v4.20-rc1.

| Item | Details |
|---|---|
| Correlated commits | 5239834016 (Btrfs: kill btrfs_clear_path_blocking) |
| branch | kdave/for-4.20-part1 |
| report | [LKP] [lkp-robot] [Btrfs] 5239834016: blogbench.read_score -7.2% regression |
| Test Scenario | disk: 1SSD; fs: btrfs |
| Test Machine | lkp-bdw-ep3b |
| status | merged at v4.20-rc1, it's still acceptable |

 

  3.3 netperf.Throughput_Mbps

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput and end-to-end latency.

scenario: ipv4 TCP_STREAM test on localhost

Commit a337531b94 was reported to have a -6.1% regression in netperf.Throughput_Mbps compared to v4.19-rc5. It was merged to mainline at v4.20-rc1.

| Item | Details |
|---|---|
| Correlated commits | a337531b94 (tcp: up initial rmem to 128KB and SYN rwin to around 64KB) |
| branch | net-next/master |
| report | [LKP] [tcp] a337531b94: netperf.Throughput_Mbps -6.1% regression |
| Test Scenario | ip: ipv4; runtime: 900s; nr_threads: 200%; cluster: cs-localhost; test: TCP_STREAM |
| Test Machine | lkp-bdw-de1 |
| status | merged at v4.20-rc1, but fixed with commit 041a14d267 |

 

  3.4 vm-scalability.throughput

vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. We tested on multiple machines such as an HSW EP server, and reported a regression on one test scenario.

scenario: swap-w-seq test

Commit ac5b2c1891 was reported to have a -61.3% regression in vm-scalability.throughput compared to v4.19. It was merged to mainline at v4.20-rc1.

| Item | Details |
|---|---|
| Correlated commits | ac5b2c1891 (mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings) |
| branch | linus/master |
| report | [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression |
| Test Scenario | runtime: 300; thp_enabled: always; thp_defrag: always; nr_task: 32; nr_ssd: 1; test: swap-w-seq |
| Test Machine | lkp-hsw-ep4 |
| status | merged at v4.20-rc1, but reverted by commit 2f0799a0ff |

 

  3.5 will-it-scale.per_thread_ops

Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process-based and thread-based tests in order to see any differences between the two.

scenario: thread brk1

Commit 9bc8039e71 was reported to have a -64.1% regression in will-it-scale.per_thread_ops compared to v4.19. It was merged to mainline at v4.20-rc1.

| Item | Details |
|---|---|
| Correlated commits | 9bc8039e71 (mm: brk: downgrade mmap_sem to read when shrinking) |
| branch | linus/master |
| report | [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression |
| Test Scenario | nr_task: 100%; mode: thread; test: brk1 |
| Test Machine | lkp-ivb-d01 |
| status | merged at v4.20-rc1, and the author is working on the fix |


 

 

  4. Shift-Left Testing

Beyond testing trees in the upstream kernel, 0-Day CI also tests developers’ and maintainers’ trees, which can catch issues earlier and reduce their wider impact. We call this “shift-left” testing. During the v4.20 release cycle, 0-Day CI reported 15 major performance regressions and 4 major improvements through shift-left testing. For some of these, we share more detailed information together with the code changes that possibly led to the result, though the assessment is limited by the test coverage we currently have. The whole list is given in the Report Summary section.

  4.1 Report Summary

0-Day CI reported 15 performance regressions and 4 improvements through shift-left testing on developer and maintainer repos.

 

| Test Indicator | Mail | Test Scenario | Test Machine | Status |
|---|---|---|---|---|
| aim7.jobs-per-min | [LKP] [f2fs] 089842de57: 15.4% improvement | disk: 4BRD_12G; md: RAID1; fs: f2fs; test: disk_rw; load: 3000; cpufreq_governor: performance | lkp-ivb-ep01 | Not merged yet, the maintainer thought it shouldn’t be an improvement |
| fio.read_bw_MBps | [mm] a40354cbd6: -33.9% regression | disk: 2pmem; fs: ext4; runtime: 200s; nr_task: 50%; time_based: tb; rw: read; bs: 4k; ioengine: sync; test_size: 200G; ucode: 0x3d; cpufreq_governor: performance | lkp-hsw-ep6 | Not merged yet, no response from author yet |
| fio.read_bw_MBps | [LKP] [mm] 724b4de402: -2.7% regression | disk: 2pmem; fs: btrfs; runtime: 200s; nr_task: 50%; time_based: tb; rw: randread; bs: 4k; ioengine: libaio; test_size: 100G; ucode: 0x3d; cpufreq_governor: performance | lkp-hsw-ep6 | Not merged yet, no response from author yet |
| fio.write_bw_MBps | [LKP] [btrfs] e86d34808b: -7.0% regression | disk: 1SSD; fs: btrfs; runtime: 300s; nr_task: 8; rw: randwrite; bs: 4k; ioengine: sync; test_size: 512g; ucode: 0x3d; cpufreq_governor: performance | lkp-hsw-ep2 | Not merged yet, no response from author yet |
| fio.write_bw_MBps | [LKP] [btrfs] 4fd93529bc: -6.9% regression | disk: 1SSD; fs: btrfs; runtime: 300s; nr_task: 8; rw: randwrite; bs: 4k; ioengine: sync; test_size: 512g; ucode: 0x3d; cpufreq_governor: performance | lkp-hsw-ep2 | Not merged yet, no response from author yet |
| fio.write_clat_95%_us | [mm] 25c79c10c8: +298.6% regression | runtime: 300s; disk: 1SSD; fs: ext4; nr_task: 64; rw: randwrite; bs: 4k; ioengine: sync; test_size: 400g; ucode: 0x7000013; cpufreq_governor: performance | lkp-bdw-de1 | Not merged yet, no response from author yet |
| netperf.Throughput_Mbps | [LKP] [splice] d5b5602899: -40.3% regression | ip: ipv4; runtime: 300s; nr_threads: 200%; cluster: cs-localhost; send_size: 5K; test: TCP_SENDFILE; ucode: 0x25; cpufreq_governor: performance | lkp-hsw-d01 | Not merged yet, the author is working on it |
| netperf.Throughput_total_tps | [LKP] [vfs] 38f27d0787: -61.6% regression | ip: ipv4; runtime: 300s; nr_threads: 50%; cluster: cs-localhost; test: TCP_CRR; cpufreq_governor: performance | lkp-bdw-ep2 | Not merged yet, no response from author yet |
| netperf.Throughput_tps | [LKP] [function_graph] 8114865ff8: -1.1% regression | ip: ipv4; runtime: 300s; nr_threads: 200%; cluster: cs-localhost; test: TCP_CRR; ucode: 0x25; cpufreq_governor: performance | lkp-hsw-d01 | Merged at v4.20-rc5, the author thinks it’s a false positive |
| stress-ng.eventfd.ops | [LKP] [fs/locks] fd7732e033: 21.9% improvement | nr_threads: 100%; disk: 1HDD; testtime: 1s; class: filesystem; ucode: 0xb00002e; cpufreq_governor: performance | lkp-bdw-ep3 | Not merged yet, the author thought stress-ng shouldn’t be used as a benchmark |
| unixbench.score | [LKP] [kernel/sched] f880552d38: -3.9% regression | runtime: 300s; nr_task: 100%; test: spawn; ucode: 0x7000013; cpufreq_governor: performance | lkp-bdw-de1 | Not merged yet, no response from author yet |
| unixbench.score | [LKP] [sched/fair] bbc640be45: 10.0% improvement | runtime: 300s; nr_task: 100%; test: execl; ucode: 0x7000013; cpufreq_governor: performance | lkp-bdw-de1 | Not merged yet, no response from author yet |
| unixbench.score | [LKP] [sched/fair] c469933e77: 9.3% improvement | runtime: 300s; nr_task: 100%; test: execl; ucode: 0x7000013; cpufreq_governor: performance | lkp-bdw-de1 | Merged at v4.20-rc3 |
| vm-scalability.throughput | [LKP] [mm] a2a937d4c4: -29.6% regression | runtime: 300s; test: lru-file-mmap-read; ucode: 0x3d; cpufreq_governor: performance | lkp-hsw-ep5 | Not merged yet, no response from author yet |
| vm-scalability.throughput | [LKP] [filemap] 48dc11646a: -9.5% regression | runtime: 300s; test: lru-file-mmap-read-rand; cpufreq_governor: performance | lkp-bdw-ep2 | Not merged yet, no response from author yet |
| vm-scalability.throughput | [LKP] [hugetlbfs] 9c83282117: -4.3% regression | runtime: 300s; size: 8T; test: anon-cow-seq-hugetlb; cpufreq_governor: performance; ucode: 0x200004d | lkp-skl-2sp4 | Not merged yet, no response from author yet |
| will-it-scale.per_process_ops | [LKP] [fs/locks] 816f2fb5a2: -93.1% regression | nr_task: 100%; mode: process; test: lock1; ucode: 0xb00002e; cpufreq_governor: performance | lkp-bdw-ep3b | Not merged yet, no response from author yet |
| will-it-scale.per_thread_ops | [LKP] [fs/locks] 3c19f2312f: -65.2% regression | nr_task: 100%; mode: thread; test: lock1; ucode: 0xb00002e; cpufreq_governor: performance | lkp-bdw-ep3b | Not merged yet, the issue has been fixed |
| will-it-scale.per_thread_ops | [fs/locks] 83b381078b: -62.5% regression | nr_task: 16; mode: thread; test: lock1; cpufreq_governor: performance | lkp-bdw-ep3d | Not merged yet, the issue has been fixed |

  4.2 aim7.jobs-per-min

aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system.

scenario: disk_rw test on f2fs

Commit 089842de57 was reported to have a 15.4% improvement in aim7.jobs-per-min compared to v4.20-rc4.

| Item | Details |
|---|---|
| Correlated commits | 089842de57 (f2fs: remove codes of unused wio_mutex) |
| branch | f2fs/dev-test |
| report | [LKP] [f2fs] 089842de57: aim7.jobs-per-min 15.4% improvement |
| Test Scenario | disk: 4BRD_12G; md: RAID1; fs: f2fs; test: disk_rw; load: 3000 |
| Test Machine | lkp-ivb-ep01 |
| status | Not merged at v4.20, and the maintainer thought it shouldn’t be an improvement |

  4.3 unixbench.score

UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.

scenario: execl test

Commit c469933e77 was reported to have a 9.3% improvement in unixbench.score compared to v4.19. It was merged to mainline at v4.20-rc3.

| Item | Details |
|---|---|
| Correlated commits | c469933e77 (sched/fair: Fix cpu_util_wake() for 'execl' type workloads) |
| branch | linus/master |
| report | [LKP] [sched/fair] c469933e77: unixbench.score 9.3% improvement |
| Test Scenario | runtime: 300s; nr_task: 100%; test: execl |
| Test Machine | lkp-bdw-de1 |
| status | merged at v4.20-rc3 |

 

  4.4 vm-scalability.throughput

vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. We tested on multiple machines, and the regression on one test scenario is detailed here.

scenario: lru-file-mmap-read-rand test

Commit 48dc11646a was reported to have a -9.5% regression in vm-scalability.throughput compared to v4.20-rc6.

| Item | Details |
|---|---|
| Correlated commits | 48dc11646a (filemap: drop the mmap_sem for all blocking operations) |
| branch | linux-review/UPDATE-20181214-093104/Josef-Bacik/drop-the-mmap_sem-when-doing-IO-in-the-fault-path/20181214-073658 |
| report | [LKP] [filemap] 48dc11646a: vm-scalability.throughput -9.5% regression |
| Test Scenario | runtime: 300s; test: lru-file-mmap-read-rand |
| Test Machine | lkp-bdw-ep2 |
| status | Not merged at v4.20, and no response from author yet |

 

  4.5 will-it-scale.per_thread_ops

Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process-based and thread-based tests in order to see any differences between the two.

scenario: thread lock1

Commit 83b381078b was reported to have a -62.5% regression in will-it-scale.per_thread_ops compared to v4.20-rc2.

| Item | Details |
|---|---|
| Correlated commits | 83b381078b (fs/locks: always delete_block after waiting.) |
| branch | jlayton/locks-next |
| report | [LKP] [fs/locks] 83b381078b: will-it-scale.per_thread_ops -62.5% regression |
| Test Scenario | nr_task: 16; mode: thread; test: lock1 |
| Test Machine | lkp-bdw-ep3d |
| status | Not merged at v4.20, and the issue has been fixed |

 

  5. Latest Release Performance Comparison

This section describes the performance differences among kernel releases, especially between v4.20 and v4.19. More than 50 performance benchmarks run in 0-Day CI; we selected the 9 benchmarks that historically showed the most regressions or improvements reported by 0-Day CI, and ran them with typical configurations and parameters. For some of the regressions found in this comparison, 0-Day CI did not bisect them successfully, so no related report was sent out during the release development period, but they are still worth checking.

 

In the following figures, the value on the Y-axis is the relative performance number. We used the v4.19 data as the base (performance number is 100).
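As a minimal sketch of this normalization, the helpers below scale a raw benchmark result so that the base kernel scores 100 and derive the percentage change quoted in the summaries. The function names and the raw throughput values in the demo are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from this report.

```python
def relative_score(value, base):
    """Scale a raw benchmark result so that the base kernel scores 100."""
    if base == 0:
        raise ValueError("base result must be non-zero")
    return 100.0 * value / base


def percent_change(value, base):
    """Signed change of a result relative to the base kernel, in percent."""
    return relative_score(value, base) - 100.0


if __name__ == "__main__":
    # Hypothetical raw results for one benchmark on two kernels.
    base_v4_19 = 2000.0
    result_v4_20 = 1853.2
    print(f"relative score: {relative_score(result_v4_20, base_v4_19):.2f}")
    print(f"change: {percent_change(result_v4_20, base_v4_19):+.2f}%")
```

A relative score below 100 on a throughput-style metric corresponds to a regression against v4.19, and a score above 100 to an improvement.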

  5.1 test suite: vm-scalability

vm-scalability exercises functions and regions of the mm subsystem of the Linux kernel. The four tests below show typical test results.

[Figures: vm-scalability Test 1, vm-scalability Test 2, vm-scalability Test 3, vm-scalability Test 4]

Here are the test configuration and performance summary for the above tests:

| Parameter | vm-scalability Test 1 | vm-scalability Test 2 | vm-scalability Test 3 | vm-scalability Test 4 |
|---|---|---|---|---|
| Test machine | model: Skylake; brand: Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz; cpu_number: 104; memory: 64G | model: Brickland Haswell-EX; brand: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz; cpu_number: 144; memory: 512G | model: Broadwell-EX; brand: Intel(R) Xeon(R) CPU E7-8890 v4 @ 2.20GHz; cpu_number: 160; memory: 256G | model: Skylake-4S; cpu_number: 192; memory: 704G |
| transparent_hugepage | default system configuration | thp_enabled: never; thp_defrag: never | | thp_enabled: never; thp_defrag: never |
| runtime | 300s | No requirement | 300s | No requirement |
| test_size | 8T | No requirement | 8T | No requirement |
| nr_task | 1 | 8 | 1 | 1 |
| boot_params | No memmap for the boot | bp1_memmap: 120G!4G; bp2_memmap: 120G!130G; bp3_memmap: 120G!258G; bp4_memmap: 120G!386G | No memmap for the boot | bp1_memmap: 120G!4G; bp2_memmap: 120G!130G; bp3_memmap: 120G!258G; bp4_memmap: 120G!386G |
| disk | No pmem for this test | nr_pmem: 4 | No pmem for this test | nr_pmem: 4 |
| swap | default priority | priority: 1 | default priority | priority: 1 |
| perf-profile | No delay | delay: 20s | No delay | delay: 20s |
| vm-scalability Test parameter | test case: anon-w-seq-hugetlb | test case: swap-w-seq | test case: anon-w-seq-mt | test case: swap-w-seq |
| Performance Summary | vm-scalability.throughput on kernel v4.20 has a -7.34% regression compared to v4.19 | vm-scalability.throughput on kernel v4.20 has a -7.31% regression compared to v4.19 | vm-scalability.throughput on kernel v4.20 has a 4.69% improvement compared to v4.19 | vm-scalability.throughput on kernel v4.20 has a 53.84% improvement compared to v4.19 |
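The bp*_memmap values above appear to follow the kernel's `memmap=nn[KMG]!ss[KMG]` boot parameter syntax, which marks the region from `ss` to `ss+nn` as protected (persistent) memory. Under that assumption, the sketch below converts each value into a start/end address pair and checks that the four emulated pmem regions do not overlap. The helper names are hypothetical.

```python
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Convert a size such as '120G' into a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def parse_memmap(value):
    """Parse 'nn!ss' (size, then start) into a (start, end) byte range."""
    size_text, start_text = value.split("!")
    start = parse_size(start_text)
    return start, start + parse_size(size_text)


def regions_overlap(regions):
    """Return True if any two (start, end) ranges overlap."""
    ordered = sorted(regions)
    return any(
        prev_end > next_start
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )


if __name__ == "__main__":
    # Values copied from the boot_params row of Test 2 and Test 4.
    values = ["120G!4G", "120G!130G", "120G!258G", "120G!386G"]
    regions = [parse_memmap(v) for v in values]
    for value, (start, end) in zip(values, regions):
        print(f"{value}: {start // 2**30}G to {end // 2**30}G")
    print("overlap:", regions_overlap(regions))
```

With these values, each region is 120G long and the four regions start at 4G, 130G, 258G and 386G, which is consistent with `nr_pmem: 4` in the disk row.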

 

  5.2 test suite: will-it-scale

Will-it-scale takes a test case and runs it from 1 through to n parallel copies to see if the test case will scale. It builds both process-based and thread-based tests in order to see any differences between the two.

[Figures: will-it-scale Test 1, will-it-scale Test 2, will-it-scale Test 3, will-it-scale Test 4]

Here are the parameters and performance summary for the above tests:

| Parameter | will-it-scale Test 1 | will-it-scale Test 2 | will-it-scale Test 3 | will-it-scale Test 4 |
|---|---|---|---|---|
| Test machine | model: Broadwell-EP; brand: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz; cpu_number: 88; memory: 64G | model: Broadwell-EP; brand: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz; cpu_number: 88; memory: 64G | model: Ivy Bridge; brand: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz; cpu_number: 8; memory: 16G | model: Broadwell-EP; brand: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz; cpu_number: 88; memory: 64G |
| nr_task | 16 | 100% | 100% | 50% |
| will-it-scale Test parameter | mode: process; test case: page_fault1 | mode: process; test case: page_fault2 | mode: process; test: pthread_mutex1 | mode: thread; test: brk1 |
| Summary | will-it-scale.per_process_ops on kernel v4.20 has a -49.82% regression compared to v4.19 | will-it-scale.per_process_ops on kernel v4.20 has a -40.75% regression compared to v4.19 | will-it-scale.per_process_ops on kernel v4.20 has a 197.21% improvement compared to v4.19 | will-it-scale.per_thread_ops on kernel v4.20 has a 41.67% improvement compared to v4.19 |

 

  5.3 test suite: unixbench

UnixBench is a system benchmark to provide a basic indicator of the performance of a Unix-like system.

[Figures: Unixbench Test 1, Unixbench Test 2, Unixbench Test 3, Unixbench Test 4]

Here are the test configuration and performance summary for the above tests:

| Parameter | Unixbench Test 1 | Unixbench Test 2 | Unixbench Test 3 | Unixbench Test 4 |
|---|---|---|---|---|
| Test machine | model: Haswell-EP; brand: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz; cpu_number: 56; memory: 256G | model: Knights Mill; brand: Intel(R) Xeon Phi(TM) CPU 7255 @ 1.10GHz; cpu_number: 272; memory: 112G | model: Broadwell-EP; brand: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz; cpu_number: 88; memory: 64G | model: Ivy Bridge; brand: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz; cpu_number: 8; memory: 16G |
| runtime | 300 | 300 | 300 | 300 |
| nr_task | 100% | 1 | 1 | 1 |
| unixbench Test parameter | test case: execl | test case: fstime | test case: whetstone-double | test case: dhry2reg |
| Performance Summary | unixbench.score on kernel v4.20 has a -33.72% regression compared to v4.19 | unixbench.score on kernel v4.20 has a -5.91% regression compared to v4.19 | unixbench.score on kernel v4.20 has a 94.96% improvement compared to v4.19 | unixbench.score on kernel v4.20 has a 14.72% improvement compared to v4.19 |

 

  5.4 test suite: reaim

reaim updates and improves the existing Open Source AIM 7 benchmark. aim7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of a multiuser system.

 

[Figures: reaim Test 1, reaim Test 2]


 

 

Here are the test configuration and performance test summary for above tests:                                                                                                                                                                                                                          

 

reaim Test 1

reaim Test 2

Test machine

model: Haswell-EP

brand: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz

cpu_number: 56

memory: 256G

model: Haswell-EP

brand: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz

cpu_number: 56

memory: 256G

runtime

300s

300s

nr_task

5000

1000

reaim Test parameter

test: shared_memory

nr_job: 5000

test: pipe_cpy

Performance  Summary

reaim.jobs_per_min on kernel v4.20 has -3.75% regression when comparing to v4.19

reaim.jobs_per_min on kernel v4.20 has 3.49% improvement when comparing to v4.19

 

  5.5 test suite: pigz

pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.

[Figures: pigz Test 1, pigz Test 2]

Here are the test configuration and performance summary for the above tests:

| Parameter | pigz Test 1 | pigz Test 2 |
|---|---|---|
| Test machine | model: Broadwell-EP; brand: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz; cpu_number: 88; memory: 128G | model: Broadwell-EP; brand: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz; cpu_number: 88; memory: 128G |
| nr_threads | 100% | 100% |
| pigz Test parameter | blocksize: 512K | blocksize: 128K |
| Performance Summary | pigz.throughput on kernel v4.20 has a 0.53% improvement compared to v4.19 | pigz.throughput on kernel v4.20 has a 0.23% improvement compared to v4.19 |

 

  5.6 test suite: netperf

Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput and end-to-end latency.

[Figures: netperf Test 1, netperf Test 2, netperf Test 3, netperf Test 4]

Here are the test configuration and performance summary for the above tests:

| Parameter | netperf Test 1 | netperf Test 2 | netperf Test 3 | netperf Test 4 |
|---|---|---|---|---|
| Test machine | model: Broadwell-DE; brand: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz; cpu_number: 16; memory: 8G | model: Skylake; cpu_number: 104; memory: 192G | model: Haswell-EP; brand: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz; cpu_number: 56; memory: 256G | model: Haswell-EP; brand: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz; cpu_number: 56; memory: 256G |
| disable_latency_stats | 1 | 1 | 1 | 1 |
| set_nic_irq_affinity | 1 | 1 | 1 | 1 |
| runtime | 900s | 300s | 300s | 300s |
| nr_threads | 200% | 200% | 25% | 25% |
| ip | ipv4 | ipv4 | ipv4 | ipv4 |
| netperf Test parameter | send_size: 5K; test: TCP_SENDFILE | test: TCP_CRR | test: UDP_RR | test: SCTP_RR |
| Performance Summary | netperf.Throughput_Mbps on kernel v4.20 has a -47.59% regression compared to v4.19 | netperf.Throughput_tps on kernel v4.20 has a -7.77% regression compared to v4.19 | netperf.Throughput_tps on kernel v4.20 has a 14.52% improvement compared to v4.19 | netperf.Throughput_tps on kernel v4.20 has a 10.56% improvement compared to v4.19 |

 

  5.7 test suite: hackbench

Hackbench is both a benchmark and a stress test for the Linux kernel scheduler. Its main job is to create a specified number of pairs of schedulable entities (either threads or traditional processes) which communicate via either sockets or pipes, and to time how long it takes for each pair to send data back and forth.

[Figures: hackbench Test 1, hackbench Test 2, hackbench Test 3, hackbench Test 4]

Here are the test configuration and performance summary for the above tests:

| Parameter | hackbench Test 1 | hackbench Test 2 | hackbench Test 3 | hackbench Test 4 |
|---|---|---|---|---|
| Test machine | model: Haswell-EP; brand: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz; cpu_number: 72; memory: 128G | model: Ivy Bridge; brand: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz; cpu_number: 8; memory: 16G | model: Ivy Bridge; brand: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz; cpu_number: 8; memory: 16G | model: Skylake; brand: Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz; cpu_number: 104; memory: 64G |
| disable_latency_stats | 1 | 1 | 1 | 1 |
| nr_threads | 50% | 50% | 1600% | 100% |
| hackbench Test parameter | iterations: 12; mode: process; ipc: pipe | mode: process; ipc: socket | mode: threads; ipc: socket | mode: process; ipc: pipe |
| Performance Summary | hackbench.throughput on kernel v4.20 has a 3.19% improvement compared to v4.19 | hackbench.throughput on kernel v4.20 has a 3.0% improvement compared to v4.19 | hackbench.throughput on kernel v4.20 has a 2.79% improvement compared to v4.19 | hackbench.throughput on kernel v4.20 has a 2.26% improvement compared to v4.19 |

 

  5.8 test suite: fio

Fio is an I/O testing tool. It was originally written to avoid the hassle of writing special test case programs when testing a specific workload, either for performance reasons or to find or reproduce a bug.

[Figures: fio Test 1, fio Test 2, fio Test 3]

Here are the test configuration and performance summary for the above tests:

| Parameter | fio Test 1 | fio Test 2 | fio Test 3 | fio Test 4 |
|---|---|---|---|---|
| Test machine | model: Haswell-EP; brand: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz; cpu_number: 56; memory: 256G | model: Haswell-EP; brand: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz; cpu_number: 56; memory: 256G | model: Haswell-EP; brand: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz; cpu_number: 56; memory: 256G | model: Broadwell-DE; brand: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz; cpu_number: 16; memory: 8G |
| boot_params | bp1_memmap: 104G!4G; bp2_memmap: 104G!132G | bp1_memmap: 104G!4G; bp2_memmap: 104G!132G | bp1_memmap: 104G!4G; bp2_memmap: 104G!132G | No memmap for the boot |
| disk | 2pmem | 2pmem | 2pmem | No pmem for this test |
| runtime | 200s | 200s | 300s | 300s |
| disk | No requirement | No requirement | No requirement | 1SSD |
| file system | ext4; mount_option: dax | btrfs | ext4; mount_option: dax | btrfs |
| nr_task | 50% | 50% | 50% | 8 |
| time_based | tb | tb | tb | No requirement |
| fio Test parameter | fio-setup-basic: rw: rw; bs: 4k; ioengine: mmap; test_size: 200G | fio-setup-basic: rw: write; bs: 4k; ioengine: libaio; test_size: 100G | fio-setup-basic: rw: randread; bs: 4k; ioengine: mmap; test_size: 200G | fio-setup-basic: rw: randwrite; bs: 4k; ioengine: sync; test_size: 400g |
| Performance Summary | fio.read_clat_90%_us on kernel v4.20 has a 66.67% regression compared to v4.19 | fio.write_clat_99%_us on kernel v4.20 has a -11.01% improvement compared to v4.19 | fio.read_clat_mean_us on kernel v4.20 has a -10.23% improvement compared to v4.19 | fio.write_bw_MBps on kernel v4.20 has a 33.94% improvement compared to v4.19 |
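In the fio summary, a positive change in a completion-latency metric is labeled a regression while a negative one is labeled an improvement, the opposite of bandwidth metrics. The sketch below shows one way to apply that sign convention. The rule that metric names containing "clat" or "latency" are lower-is-better is an assumption made for illustration, and `classify` is a hypothetical helper, not part of 0-Day CI.

```python
# Assumption: metrics whose names contain these markers are lower-is-better.
LOWER_IS_BETTER_MARKERS = ("clat", "latency")


def classify(metric, percent_change):
    """Label a signed percentage change as a regression or an improvement."""
    if percent_change == 0:
        return "no change"
    lower_is_better = any(marker in metric for marker in LOWER_IS_BETTER_MARKERS)
    if lower_is_better:
        got_better = percent_change < 0
    else:
        got_better = percent_change > 0
    return "improvement" if got_better else "regression"


if __name__ == "__main__":
    # Metric names and changes copied from the fio performance summary.
    results = [
        ("fio.read_clat_90%_us", 66.67),
        ("fio.write_clat_99%_us", -11.01),
        ("fio.read_clat_mean_us", -10.23),
        ("fio.write_bw_MBps", 33.94),
    ]
    for metric, change in results:
        print(f"{metric}: {change:+.2f}% -> {classify(metric, change)}")
```

Applied to the four fio results above, this reproduces the labels used in the summary: one regression (read_clat_90%_us) and three improvements.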

 

  5.9 test suite: ebizzy

ebizzy is designed to generate a workload resembling common web application server workloads. It is highly threaded, has a large in-memory working set, and allocates and deallocates memory frequently.

[Figure: ebizzy Test 1]

Here are the test configuration and performance summary for the above test:

| Parameter | ebizzy Test 1 |
|---|---|
| Test machine | model: Broadwell-EX; brand: Intel(R) Xeon(R) CPU E7-8890 v4 @ 2.20GHz; cpu_number: 160; memory: 256G |
| nr_threads | 200% |
| iterations | 100x |
| ebizzy Test parameter | duration: 10s |
| Performance Summary | ebizzy.throughput on kernel v4.20 has a 0.08% improvement compared to v4.19 |

 

  6. Test Machines

  6.1 IVB Desktop

| model | brand | cpu number | memory |
|---|---|---|---|
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 8 | 16G |
| Ivy Bridge | Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz | 4 | 8G |

  6.2 SKL Desktop

| model | brand | cpu number | memory |
|---|---|---|---|
| Skylake | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 80 | 64G |

  6.3 BDW EP

| model | brand | cpu number | memory |
|---|---|---|---|
| Broadwell-EP | Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz | 88 | 128G |

  6.4 HSW EP

| model | brand | cpu number | memory |
|---|---|---|---|
| Haswell-EP | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz | 72 | 128G |

  6.5 IVB EP

| model | brand | cpu number | memory |
|---|---|---|---|
| Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz | 40 | 384G |
| Ivytown Ivy Bridge-EP | Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz | 48 | 64G |

  6.6 HSX EX

| model | brand | cpu number | memory |
|---|---|---|---|
| Brickland Haswell-EX | Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz | 144 | 512G |