Linux Kernel Performance

Linux development evolves rapidly. The performance and scalability of the OS kernel has been a key part of its success. However, discussions have appeared on LKML (Linux Kernel Mailing List) regarding large performance regressions between kernel versions. These discussions underscore the need for a systematic and disciplined way to characterize, improve, and test Linux kernel performance. Our goal is to work with the Linux community to further enhance the Linux kernel with consistent performance increases (avoiding degradations) across releases. The information on this site gives community members a better view of what 0-Day and LKP (Linux Kernel Performance) are doing to preserve the performance integrity of the kernel.

0-Day CI (Continuous Integration) is an automated Linux kernel regression testing service available to the Linux community. It performs commit-level builds, static semantic checks of the source, and boot testing on various kernel and hardware configurations. The 0-Day performance test adds Linux kernel performance/power and functional regression testing once a kernel passes the 0-Day build/boot tests.

The LKP (Linux Kernel Performance) test tool (https://github.com/01org/lkp-tests) is integrated by 0-Day CI for performance testing. It can also be run as a standalone tool. It integrates ~50 popular industry open source benchmarks and provides a standard interface for installation, execution, and result analysis.
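
As a rough sketch of standalone use (the subcommands and the hackbench job file follow the upstream README and may differ between releases), a typical flow looks like this:

```sh
# Fetch the LKP test tool and install the lkp command plus common dependencies.
git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
make install
lkp install

# Split a job definition into atomic jobs, then run one of the generated files.
lkp split-job jobs/hackbench.yaml
lkp run ./hackbench-*.yaml
```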

Below are the key test suites used by the 0-Day performance test and the LKP test tool.

### pft

**What is it?**

Pft originated as Christoph Lameter's "Page Fault Test" tool. It was posted to LKML and later modified by Lee Schermerhorn for memory policy testing.

It has been modified to allocate a single large region before creating worker threads/tasks, and then carve up the region to give each worker a piece to fault in. This causes the workers to contend for the cache lines holding the in-kernel memory policy structure, the zone locks, and the page lists. In multi-thread mode, the workers also contend for the single test task's mmap semaphore.

**homepage**

https://github.com/gormanm/pft

**parameters**

memory: total size of test region

**results**

pft.faults_per_sec_per_cpu:

### aim9

**What is it?**

The AIM Independent Resource Benchmark independently exercises and times each component of a UNIX computer system. The benchmark uses 58 subtests to generate absolute processing rates, in operations per second, for subsystems, I/O transfers, function calls, and UNIX system calls. It is GPL-licensed, and results can be published under the "nonaudited" clause.

**homepage**

https://sourceforge.net/projects/aimbench/files/aim-suite9/

**parameters**

test: micro test name

testtime:

**results**

aim9.ops_per_sec:

### blogbench

**What is it?**

Blogbench is a portable filesystem benchmark that tries to reproduce the load of a real-world busy file server. It stresses the filesystem with multiple threads performing random reads, writes, and rewrites to get a realistic idea of what scalability and concurrency a system can handle. Blogbench was initially designed to mimic the behavior of the Skyblog.com blog service.

Four different types of threads are started:

  • The writers. They create new blogs (directories) with a random amount of fake articles and fake pictures.
  • The rewriters. They add or modify articles and pictures of existing blogs.
  • The “commenters”. They add fake comments to existing blogs in random order.
  • The readers. They read articles, pictures and comments of random blogs. They sometimes even try to access non-existent files.

New files are written atomically. The content is pushed with 8 KB chunks in a temporary file that gets renamed if everything completes. 8 KB is the default PHP buffer size for writes.

Reads are performed with a 64 KB buffer.

Concurrent writers and rewriters can quickly create fragmentation if preallocation is not optimal, so it is very interesting to check how different filesystems react to fragmentation.

Every blog is a new directory within the same parent directory. Since some filesystems are unable to manage more than 32 KB or 64 KB links to the same directory (UFS, for example), you should not force the test to run for an excessive amount of time on these filesystems.
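
A minimal invocation against a scratch directory might look like the sketch below (the -d flag selects the test directory; the target path is illustrative):

```sh
# Run blogbench against a dedicated directory on the filesystem under test.
mkdir -p /mnt/test/blogbench
blogbench -d /mnt/test/blogbench
```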

**homepage**

https://www.pureftpd.org/project/blogbench

**parameters**

**results**

blogbench.write_score:

blogbench.read_score:

### iperf

**What is it?**

iperf is a tool for active measurements of the maximum achievable bandwidth on IP networks. It supports tuning of various parameters related to timing, protocols, and buffers. For each test it reports the bandwidth, loss, and other parameters.
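
For example, a basic TCP and UDP measurement between two hosts could look like this (classic iperf2 option syntax; the host name is illustrative):

```sh
# On the server node: start iperf in server mode.
iperf -s

# On the client node: 30-second TCP bandwidth test against the server.
iperf -c server.example.com -t 30

# UDP test at a 100 Mbit/s offered load; reports bandwidth, jitter, and loss.
iperf -c server.example.com -u -b 100M -t 30
```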

**homepage**

https://iperf.fr/

**parameters**

runtime:

protocol:

**results**

iperf.tcp.receiver.bps:

iperf.tcp.sender.bps:

iperf.udp.bps:

### will-it-scale

**What is it?**

Will It Scale takes a testcase and runs it from 1 to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based variant of each test in order to see any differences between the two.

We rely on hwloc for a platform independent way of laying out tasks on cores. It can be found at www.open-mpi.org/projects/hwloc/.

Care is taken to try to reduce run-to-run variability. By using hwloc we ensure each task is on its own core and won't get bounced around by the scheduler. The wrapper script (runtest.py) turns off address space randomisation, which can cause huge differences in pagetable-related benchmarks (one run may fit within one pte page, the next may span two). There is a warmup period, after which an average is taken; the averaging period can be changed with the -s option and defaults to 5 seconds.
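
A rough usage sketch, assuming the layout described in the project README (page_fault1 is one of the bundled testcases; verify against the checked-out tree):

```sh
# Build the per-testcase process/thread binaries, then let the wrapper script
# scale a testcase from 1 to N parallel tasks. The -s option would change the
# averaging period from its 5-second default.
make
./runtest.py page_fault1
```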

**homepage**

https://github.com/antonblanchard/will-it-scale

**parameters**

runtime:

test:

mode: ["process", "thread", "both"]

nr_task:

**results**

will-it-scale.scalability:

will-it-scale.per_process_ops:

will-it-scale.per_thread_ops:

### fio-basic

**What is it?**

Fio is a tool that spawns a number of threads or processes doing a particular type of I/O action as specified by the user. The typical use of fio is to write a job file that matches the I/O load one wants to simulate.

Fio was created to allow benchmarking of specific disk I/O workloads. It can issue its I/O requests using one of many synchronous and asynchronous I/O APIs, and can also use various APIs that allow many I/O requests to be issued with a single API call. You can also tune how large the files fio uses are, at what offsets in those files I/O happens, how much delay (if any) there is between issuing I/O requests, and what (if any) filesystem sync calls are issued between each I/O request. The options to fio allow you to issue very precisely defined I/O patterns and see how long it takes your disk subsystem to complete these tasks.
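
A small job can also be described entirely on the command line. The sketch below runs 4 KB random reads with libaio at queue depth 16 (the target file, size, and runtime are illustrative):

```sh
# 4 workers doing direct 4 KB random reads for 60 seconds against one file.
fio --name=randread-test \
    --filename=/mnt/test/fio.dat --size=1G \
    --rw=randread --bs=4k \
    --ioengine=libaio --iodepth=16 --direct=1 \
    --numjobs=4 --runtime=60 --time_based --group_reporting
```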

**homepage**

https://github.com/axboe/fio

**parameters**

runtime: limit run time to runtime seconds

rw: type of I/O pattern

bs: block size for I/O units. Defaults to 4 KB

ioengine: defines how the job issues I/O

iodepth: number of I/O units to keep in flight against the file

direct: a boolean parameter; if true, use non-buffered I/O (usually O_DIRECT). Defaults to false.

test_size: total size of I/O for this job; fio will run until this many bytes have been transferred, unless limited by other options (runtime, for instance)

nr_task: number of clones (processes/threads performing the same workload) of this job. Defaults to 1

fallocate: whether pre-allocation is performed when laying down files

**results**

fio.write_bw_MBps: average write bandwidth

fio.write_iops: average write IOPS

fio.read_bw_MBps: average read bandwidth

fio.read_iops: average read IOPS

### trinity

**What is it?**

The basic idea is fairly simple. As fuzz testing suggests, we call syscalls at random, with random arguments. Not an original idea, and one that has been done many times before on Linux*, and on other operating systems. Where Trinity differs is that the arguments it passes are not purely random.

We found some bugs in the past by just passing random values, but once the really dumb bugs were found, these dumb fuzzers would just run and run. The problem was that if a syscall took, for example, a file descriptor as an argument, one of the first things it would try to do was validate that file descriptor. Since the value was garbage, the kernel would just reject it as -EINVAL, of course. So on startup, Trinity creates a list of file descriptors by opening pipes, scanning sysfs, procfs, and /dev, and creating a bunch of sockets using random network protocols. Then when a syscall needs an fd, it gets passed one of these at random.

File descriptors aren't the only thing Trinity knows about. Every syscall has its arguments annotated, and where possible it tries to provide something at least semi-sensible. For example, length arguments get passed one of a whole bunch of potentially interesting values.

Trinity also shares those file descriptors between multiple processes, which causes havoc sometimes.

If a child process successfully creates an mmap, the pointer is stored, and fed to subsequent syscalls, sometimes with hilarious results.

Trinity supports Alpha, Aarch64, ARM, i386, IA-64, MIPS, PowerPC-32, PowerPC-64, S390, S390x, SPARC-64, x86-64. Adding support for additional architectures is a small amount of work mostly involving just defining the order of the syscall table.

**homepage**

http://codemonkey.org.uk/projects/trinity/

**parameters**

runtime:

seed:

**results**

### piglit

**What is it?**

Piglit is a collection of automated tests for OpenGL* implementations. The goal of Piglit is to help improve the quality of open source OpenGL drivers by providing developers with a simple means to perform regression tests.

The current status is that the framework is working (though rough around the edges). It contains the Glean tests, some tests adapted from Mesa*, as well as some specific regression tests for certain bugs. HTML summaries can be generated, including the ability to compare different test runs.

**homepage**

https://people.freedesktop.org/~nh/piglit/

**parameters**

group:

**results**

### reaim

**What is it?**

REAIM is an updated and improved version of the AIM 7 benchmark. It forks many processes called tasks, each of which concurrently runs, in random order, a set of subtests called jobs. Each job exercises a different aspect of the operating system, such as disk-file operations, process creation, user virtual memory operations, pipe I/O, and compute-bound arithmetic loops.

**homepage**

https://sourceforge.net/projects/re-aim-7/

**parameters**

test: micro test name

nr_task:

nr_job:

iterations:

runtime:

**results**

reaim.jobs_per_min:

### locktorture

**What is it?**

The CONFIG_LOCK_TORTURE_TEST config option provides a kernel module that runs torture tests on core kernel locking primitives. The kernel module, locktorture, may be built after the fact on the running kernel to be tested, if desired. The tests periodically output status messages via printk(), which can be examined via the dmesg command (perhaps grepping for "torture"). The test is started when the module is loaded, and stops when the module is unloaded. This program is based on how RCU is tortured, via rcutorture.

This torture test consists of creating a number of kernel threads that acquire the lock and hold it for a specific amount of time, thus simulating different critical-region behaviors. The amount of contention on the lock can be increased by enlarging this critical-region hold time and/or by creating more kthreads.
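
Because the test is driven entirely by module load/unload, a run can be sketched as follows (module parameter names follow Documentation/locking/locktorture.txt; adjust to the kernel under test):

```sh
# Torture a mutex with 16 writer kthreads; progress is reported via printk().
modprobe locktorture torture_type=mutex_lock nwriters_stress=16
dmesg | grep -i torture

# Unloading the module stops the test and prints a summary to the kernel log.
rmmod locktorture
```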

**homepage**

https://www.kernel.org/doc/Documentation/locking/locktorture.txt

**parameters**

runtime:

**results**

### kernel_selftests

**What is it?**

The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.

On some systems, hot-plug tests could hang forever waiting for CPU and memory to be ready to be offlined. A special hot-plug target is created to run the full range of hot-plug tests. In default mode, hot-plug tests run in safe mode with a limited scope. In limited mode, the cpu-hotplug test is run on a single CPU as opposed to all hotplug-capable CPUs, and the memory hotplug test is run on 2% of hotplug-capable memory instead of 10%.
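
A sketch of running the selftests from a configured kernel source tree (make targets follow Documentation/kselftest.txt):

```sh
# Run all selftests.
make -C tools/testing/selftests run_tests

# Run only selected targets, e.g. the cpu-hotplug tests in default (safe) mode.
make -C tools/testing/selftests TARGETS=cpu-hotplug run_tests

# Run the hot-plug tests in their full range (may offline CPUs and memory).
make -C tools/testing/selftests hotplug
```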

**homepage**

https://www.kernel.org/doc/Documentation/kselftest.txt

**parameters**

**results**

### phoronix-test-suite

**What is it?**

The Phoronix Test Suite* is the most comprehensive testing and benchmarking platform available for Linux, Solaris*, Mac OS X*, and BSD* operating systems. The Phoronix Test Suite allows for carrying out tests in a fully automated manner from test installation to execution and reporting. All tests are meant to be easily reproducible, easy-to-use, and support fully automated execution. The Phoronix Test Suite is open-source under the GNU* GPLv3 license and is developed by Phoronix Media* in cooperation with partners. Version 1.0 of the Phoronix Test Suite was publicly released in 2008.

The Phoronix Test Suite client itself is a test framework that provides seamless execution of test profiles and test suites. There are more than 200 tests available by default, which are transparently available via OpenBenchmarking.org integration. These default test profiles cover a range of subsystems and a range of hardware, from mobile devices to desktops and workstations/servers. New tests can be easily introduced via the Phoronix Test Suite's extensible test architecture, with test profiles consisting of XML files and shell scripts. Test profiles can produce a quantitative result or other qualitative/abstract results like image quality comparisons and pass/fail. Using Phoronix Test Suite modules, other data can also be automatically collected at runtime, such as system power consumption, disk usage, and other software/hardware sensors. Test suites contain references to test profiles to execute as part of a set, or they can reference other test suites. Test suites are defined via an XML schema.

Running the Phoronix Test Suite for the first time can be as simple as issuing a command such as phoronix-test-suite benchmark c-ray, which would proceed to install a simple CPU test, execute the test, and report the results. Along with the results, the system's hardware/software information is collected in a detailed manner, along with relevant system logs and other important system attributes such as compiler flags and system state. Users can optionally upload their results to OpenBenchmarking.org to share their results with others, compare results against other systems, and to carry out further analysis.
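
For instance, the c-ray example mentioned above, plus discovering other profiles, looks like this:

```sh
# Install a single test profile, run it, and report the results.
phoronix-test-suite benchmark c-ray

# List the test profiles available through the OpenBenchmarking.org integration.
phoronix-test-suite list-available-tests
```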

**homepage**

http://www.phoronix-test-suite.com/

**parameters**

test:

need_x:

**results**

### ftq

**What is it?**

FTQ is a simple benchmark that probes interference within a system. It does so as follows.

For a given sampling period (say, 1 ms), we would like to know how many units of work can be achieved by a user application. The unit of work is defined below. We can iteratively sample the system, checking, for a series of 1 ms sample intervals, how much work is actually achieved in each. For a high-interference system, we will see a great deal of variability in the output data. For a low-interference system, we will see less variability. Of course, there is the distinct possibility that a system could experience constant interference, which would be identical to the low-interference case, except with a lower than ideal mean value. A method for investigating whether or not this is the case is discussed below.

Traditionally, a simple benchmark to quantify this effect was what we call the Fixed Work Quantum benchmark. That method basically fixes a work quantum (say, 10000 integer operations), and samples the amount of time it takes to execute over a series of samples. The problem with this method is that although interference is revealed, the lack of a firm time axis (think about it for a second—instead of sampling intervals being defined in terms of time, they're defined in terms of work!) makes sophisticated analysis in things like the frequency domain difficult and at times, impossible.

FTQ fixes this with two subtle tweaks to FWQ. First, instead of performing a single work quantum that takes on the order of the desired sampling period, FTQ uses many work quanta that are significantly smaller than the desired sampling period. Second, because FTQ is a self-sampling system, it is entirely possible (actually, it's almost guaranteed) that a sample may exceed the sampling interval by some amount. FTQ is constructed to compensate for that by shrinking subsequent samples to get the time series back on track.

In the end, FTQ provides data that both reveals interference and is in a form where sophisticated analysis is possible depending on the desires of the user.

**homepage**

https://github.com/rminnich/ftq

**parameters**

nr_task:

samples:

freq:

**results**

ftq.max:

ftq.mean:

ftq.noise:

### aim7

**What is it?**

AIM7 is a program written in C that forks many processes called tasks, each of which concurrently runs a set of subtests called jobs in random order. There are 53 kinds of jobs, each of which exercises a different aspect of the operating system, such as disk-file operations, process creation, user virtual memory operations, pipe I/O, and compute-bound arithmetic loops.

**homepage**

**parameters**

test: micro test name

load:

**results**

aim7.jobs-per-min:

### vm-scalability

**What is it?**

The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel that are of interest to us. There are many more regions that have not been covered by the test cases.

The test suite was developed with the help of the gcov code coverage analysis tool. Gcov can be enabled as a configuration option in Linux kernel 2.6 and upwards, and it gives a per-line execution count for the source files. The supporting tool versions used in the development of the suite are:

  • gcov   - 4.6.3
  • gcc    - 4.6.3
  • Kernel - 3.4

The test kernel needs to be configured with the following options set:

  • CONFIG_LOCK_STAT=y
  • CONFIG_GCOV_KERNEL=y
  • CONFIG_GCOV_PROFILE_ALL=y
  • CONFIG_PERF_EVENTS=y
  • CONFIG_FTRACE_SYSCALLS=y
  • CONFIG_TRANSPARENT_HUGEPAGE=y

Once the test kernel has been compiled and installed, debugfs is mounted at /sys/kernel/debug. Writing to the file /sys/kernel/debug/gcov/reset resets all the counters. The directory /sys/kernel/debug/gcov/ also has a link to the build directory on the test system. For more information about setting up gcov, consult the gcov documentation.
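
In practice, the gcov setup described above amounts to something like the following (mount point and reset file as described in the paragraph above):

```sh
# Expose the gcov data and reset all counters before a measurement run.
mount -t debugfs none /sys/kernel/debug
echo 0 > /sys/kernel/debug/gcov/reset    # writing to this file resets the counters
```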

The cases in the suite call an executable file with options. Most of the cases work on usemem. Some of the cases that call other executables have been written in separate files to modularise the code and have been named based on the kernel functionality they exercise.

Some of the cases merely call trivial system calls and do not do anything else. They can be extended suitably as per need. Some cases, like case-migration, case-mbind, etc., need a NUMA setup. This was achieved using the numa=fake= kernel boot option, whose value is the number of nodes to be emulated. The suite was tested with a value of 2, which is the minimum for inter-node page migration. Cases that require the NUMA setup need to be linked with the -lnuma flag, and libnuma has to be installed on the system. The executables that these cases call have been taken from the numactl documentation and slightly modified. They have been found to work on a two-node, NUMA-emulated machine.

Cases that require sysfs parameters to be set using echo <value> > sysfs_parameter may need tweaking based on the system configuration. The default values used in the case scripts might not scale well when system parameters are scaled. For example, on systems with more memory, /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan may need to be set to a higher value, or scan_sleep_millisecs needs to be reduced, or both. Failure to scale the values may result in disproportionate, or sometimes no, observable coverage in the corresponding functions.

Cases can be run individually using ./case-name with the suite directory as pwd; the scripts work under this assumption. Also, care has to be taken to make sure that the sparse root is mounted. The run_cases script takes care of mounting the sparse partition before running the scripts.

Huge pages are assumed to be 2 MB in size.

**homepage**

https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/

**parameters**

runtime:

nr_task:

test:

size:

unit_size:

fs:

pre_setup: [0, 1]

**results**

vm-scalability.throughput:

vm-scalability.migrate_mbps:

### ebizzy

**What is it?**

ebizzy is designed to generate a workload resembling common web application server workloads. It is highly threaded, has a large in-memory working set, and allocates and deallocates memory frequently.
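
A typical invocation might be the following sketch (flag names are taken from ebizzy's usage output and may differ between versions; verify with ebizzy -h):

```sh
# 16 worker threads for 30 seconds; the records/s figure is the throughput.
ebizzy -t 16 -S 30
```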

**homepage**

http://ebizzy.sourceforge.net/

**parameters**

nr_threads:

duration:

iterations:

**results**

ebizzy.throughput:

### autotest

**What is it?**

Autotest is a framework for fully automated testing. It is designed primarily to test the Linux kernel, although it is useful for many other functions such as qualifying new hardware. It's an open source project under the GPL and is used and developed by a number of organizations, including Google*, IBM*, Red Hat*, and many others.

Autotest is composed of a number of modules that will help you do standalone tests or set up a fully automated test grid, depending on what you are up to. A non-exhaustive list of modules:

  • Autotest client - The engine that executes the tests (dir client). Each autotest test is a directory inside client/tests, and it is represented by a python class that implements a minimum number of methods. The client is what you need if you are a single developer trying out autotest and executing some tests. The Autotest client executes client-side control files, which are regular python programs that leverage the client API.
  • Autotest server - A program that copies the client to remote machines and controls their execution. The Autotest server executes server-side control files, which are also regular python programs, but they leverage a higher-level API, since the autotest server can control test execution on multiple machines. If you want to perform slightly more complex tests involving more than one machine, you might want the autotest server.
  • Autotest database - For test grids, we need a way to store test results, and that is the purpose of the database component. This DB is used by the autotest scheduler and the frontends to store and visualize test results.
  • Autotest scheduler - For test grids, we need a utility that can schedule and trigger job execution on test machines; the autotest scheduler is that utility.
  • Autotest web frontend - For test grids, a web app, whose backend is written in django (http://www.djangoproject.com/) and whose UI is written in gwt (http://code.google.com/webtoolkit/), lets users trigger jobs and visualize test results.
  • Autotest command line interface - Alternatively, users also can use the autotest CLI, written in python

**homepage**

https://github.com/autotest/autotest

**parameters**

test:

**results**

### netperf

**What is it?**

Netperf is a benchmark that can be used to measure various aspects of networking performance. Its primary foci are bulk (aka unidirectional) data transfer and request/response performance using either TCP or UDP and the Berkeley Sockets interface.
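
A basic bulk-transfer and request/response run between two hosts might look like this (the host name is illustrative):

```sh
# On the server node: start the netperf daemon.
netserver

# Bulk transfer: 60-second TCP stream from the client to the server.
netperf -H server.example.com -t TCP_STREAM -l 60

# Request/response (latency-oriented) test over TCP.
netperf -H server.example.com -t TCP_RR -l 60
```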

**homepage**

http://www.netperf.org/netperf/

**parameters**

runtime:

nr_threads:

ip:

test:

send_size:

**results**

netperf.Throughput:

### apachebench

**What is it?**

apachebench is a tool for benchmarking your Apache* Hypertext Transfer Protocol (HTTP) server. It is designed to give you an impression of how your current Apache installation performs. This especially shows you how many requests per second your Apache installation is capable of serving.
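
For example (URL, request count, and concurrency level are illustrative):

```sh
# Issue 50,000 requests with 100 concurrent connections; ab reports
# requests per second and latency percentiles.
ab -n 50000 -c 100 http://localhost/index.html
```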

**homepage**

https://httpd.apache.org/docs/2.4/programs/ab.html

**parameters**

runtime:

concurrency:

**results**

### fsmark

**What is it?**

The fsmark benchmark tests synchronous write workloads. It can vary the number of files, directory depth, etc. It has detailed timings for creates, writes, unlinks, closes, and fsyncs that make it good for simulating mail servers and other setups.
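
A sketch of a direct fs_mark invocation (flag names are taken from the tool's usage output and may differ between versions; the directory and sizes are illustrative):

```sh
# 4 writer threads creating 1024 files of 1 MB each, fsync before close (-S 1).
fs_mark -d /mnt/test/fsmark -t 4 -n 1024 -s 1048576 -S 1
```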

**homepage**

https://sourceforge.net/projects/fsmark/

**parameters**

nr_threads: number of processes to write concurrently

filesize:

nr_directories:

nr_files_per_directory: number of files to write in each subdirectory before moving to the next subdirectory in round-robin mode

test_size: total amount of data to write; the number of files written is test_size / filesize

sync_method: ["Sync", "fsyncBeforeClose", "syncFsync", "PostReverseFsync", "syncPostReverseFsync", "PostFsync", "syncPostFsync"]

iterations:

**results**

fsmark.files_per_sec: number of files written per second

fsmark.app_overhead: time in microseconds spent in the test not doing file writing related system calls

### pigz

**What is it?**

Pigz compresses using threads to make use of multiple processors and cores. The input is broken up into 128 KB chunks, with each chunk compressed in parallel. The individual check value for each chunk is also calculated in parallel. The compressed data is written in order to the output, and a combined check value is calculated from the individual check values.

The compressed data format generated is the gzip, zlib, or single-entry zip format, using the deflate compression method. The compression produces partial raw deflate streams that are concatenated by a single write thread and wrapped with the appropriate header and trailer, where the trailer contains the combined check value.

Each partial raw deflate stream is terminated by an empty stored block (using the Z_SYNC_FLUSH option of zlib) to end that partial bit stream at a byte boundary. That allows the partial streams to be concatenated simply as sequences of bytes. This adds a very small four to five byte overhead to the output for each input chunk.

The default input block size is 128 KB, but can be changed with the -b option. The number of compress threads is set by default to the number of online processors, which can be changed using the -p option. Specifying -p 1 avoids the use of threads entirely.

The input blocks, while compressed independently, have the last 32 KB of the previous block loaded as a preset dictionary to preserve the compression effectiveness of deflating in a single thread. This can be turned off using the -i or --independent option, so that the blocks can be decompressed independently for partial error recovery or for random access.

Decompression can’t be parallelized, at least not without specially prepared deflate streams for that purpose. As a result, pigz uses a single thread (the main thread) for decompression, but will create three other threads for reading, writing, and check calculation, which can speed up decompression under some circumstances. Parallel decompression can be turned off by specifying one process (-dp 1 or -tp 1). Compressed files can be restored to their original form using pigz -d or unpigz.
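
For example, using the options described above (the file name is illustrative):

```sh
# Compress with 512 KB input blocks on 8 threads; produces bigfile.gz.
pigz -b 512 -p 8 bigfile

# Restore the original file; decompression itself runs in a single thread.
pigz -d bigfile.gz          # equivalent to: unpigz bigfile.gz
```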

**homepage**

https://github.com/madler/pigz

**parameters**

runtime:

nr_threads:

blocksize:

**results**

pigz.throughput:

### qperf

**What is it?**

qperf measures bandwidth and latency between two nodes. It can work over TCP/IP as well as the RDMA transports. On one of the nodes, qperf is typically run with no arguments, designating it the server node. You can then run qperf on a client node to obtain measurements such as bandwidth, latency and CPU utilization.

In its most basic form, qperf is run on one node in server mode by invoking it with no arguments. On the other node, it is run with two arguments: the name of the server node followed by the name of the test. A list of tests can be found in the TESTS section of the qperf man page. A variety of options may also be specified.
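
For example (the host name is illustrative; the test names come from qperf's TESTS list):

```sh
# On the server node: run qperf with no arguments.
qperf

# On the client node: measure TCP bandwidth, TCP latency, and UDP bandwidth.
qperf server.example.com tcp_bw tcp_lat udp_bw
```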

**homepage**

https://www.openfabrics.org/downloads/qperf/

**parameters**

runtime:

**results**

qperf.sctp.bw:

qperf.tcp.bw:

qperf.udp.recv_bw:

qperf.udp.send_bw:

### xfstests

**What is it?**

xfstests is a file system regression test suite written by SGI* to test XFS. Just like XFS, it was originally targeted at Irix and later ported to Linux. Currently, it can be used to test all major file systems on Linux, including xfs, ext2/3/4, cifs, btrfs, f2fs, reiserfs, gfs2, jfs, udf, nfs, and tmpfs. It uses a golden-output pass/fail rule, and a group file contains the test grouping information. At present, the test suite contains about 135 tests, and the number is still growing. All tests are numbered.
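
A run is typically driven by the check script after pointing the harness at test and scratch devices (the device and mount-point values below are illustrative and are usually kept in a local.config file):

```sh
# Tell xfstests which devices/mount points it may use, then run a test group
# or an individual test.
export TEST_DEV=/dev/sdb1 TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/sdb2 SCRATCH_MNT=/mnt/scratch
./check -g quick        # all tests in the "quick" group
./check generic/001     # a single test (naming/numbering varies by version)
```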

The test suite may cause the following failures:

  • Not Run, something the test needed is missing
  • Fail, golden output mismatch
  • Filesystem inconsistency
  • Test/machine hang or machine oops

**homepage**

git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git

**parameters**

test:

**results**

### fileio

**What is it?**

SysBench is a modular, cross-platform, multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load.

The idea of this benchmark suite is to quickly get an impression about system performance without setting up complex database benchmarks or even without installing a database at all.
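
A typical fileio run prepares the test files, runs the workload, and cleans up (sysbench 1.x syntax; older releases use --test=fileio and --max-time):

```sh
# Combined random read/write workload over 4 GB of test files for 60 seconds.
sysbench fileio --file-total-size=4G prepare
sysbench fileio --file-total-size=4G --file-test-mode=rndrw --time=60 run
sysbench fileio --file-total-size=4G cleanup
```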

**homepage**

https://github.com/akopytov/sysbench

**parameters**

nr_threads:

period:

rwmode: ["seqwr", "seqrewr", "seqrd", "rndrd", "rndwr", "rndrw"]

iomode: ["sync", "async", "mmap"]

size:

filenum:

**results**

fileio.requests_per_sec:

fileio.transactions:

### dbench

**What is it?**

DBENCH is a tool to generate I/O workloads against either a filesystem or a networked CIFS or NFS server. It can even talk to an iSCSI target. DBENCH can be used to stress a filesystem or a server to see at which workload it becomes saturated, and it can also be used for prediction analysis to determine "How many concurrent clients/applications performing this workload can my server handle before the response starts to lag?"

DBENCH provides benchmarking and client emulation similar to that implemented in SMBTORTURE's BENCH-NBENCH test for CIFS, but DBENCH can play these loadfiles onto a local filesystem instead of a CIFS server. Using a different type of loadfile, DBENCH can also generate and measure latency for NFS.
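
For example, stressing a local filesystem could look like this (directory, runtime, and client count are illustrative):

```sh
# 32 simulated clients replaying the default loadfile for 60 seconds.
dbench -D /mnt/test -t 60 32
```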

**homepage**

https://dbench.samba.org/

**parameters**

nr_threads:

**results**

dbench.throughput-MB/sec:

### rcutorture

**What is it?**

The CONFIG_RCU_TORTURE_TEST config option is available for all RCU implementations. It creates an rcutorture kernel module that can be loaded to run a torture test. The test periodically outputs status messages via printk() that can be examined via the dmesg command (perhaps grepping for "torture"). The test is started when the module is loaded, and stops when the module is unloaded.

It is also possible to specify CONFIG_RCU_TORTURE_TEST=y, which will result in the tests being loaded into the base kernel. In this case, the CONFIG_RCU_TORTURE_TEST_RUNNABLE config option is used to specify whether the RCU torture tests are to be started immediately during boot or whether the /proc/sys/kernel/rcutorture_runnable file is used to enable them. This /proc file can be used to repeatedly pause and restart the tests, regardless of the initial state specified by the CONFIG_RCU_TORTURE_TEST_RUNNABLE config option.
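
With the modular configuration, a run reduces to loading and unloading the module:

```sh
# Start the torture test, watch its status messages, then stop it.
modprobe rcutorture
dmesg | grep -i torture
rmmod rcutorture
```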

**homepage**

https://www.kernel.org/doc/Documentation/RCU/torture.txt

**parameters**

runtime:

**results**

### hackbench

**What is it?**

Hackbench is both a benchmark and a stress test for the Linux kernel scheduler. Its main job is to create a specified number of pairs of schedulable entities (either threads or traditional processes) that communicate via either sockets or pipes and time how long it takes for each pair to send data back and forth.
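
A sketch of a direct invocation, using the classic "hackbench [-pipe] <groups> [process|thread] [loops]" syntax (argument handling can vary between builds):

```sh
# 10 groups of sender/receiver processes communicating over pipes,
# each sending 1000 messages; hackbench reports the elapsed time.
hackbench -pipe 10 process 1000
```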

**homepage**

https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c

**parameters**

runtime:

nr_threads: number of pairs of schedulable entities (either threads or processes)

mode: ["process", "thread"]

ipc: ["pipe"]

iterations:

**results**

hackbench.throughput: unit-KB/s

### netpipe

**What is it?**

NetPIPE uses a simple series of ping-pong tests over a range of message sizes to provide a complete measure of the performance of a network. It bounces messages of increasing size between two processes, whether across a network or within an SMP system. Message sizes are chosen at regular intervals, and with slight perturbations, to provide a complete evaluation of the communication system. Each data point involves many ping-pong tests to provide an accurate timing. Latencies are calculated by dividing the round trip time in half for small messages (less than 64 bytes).

The communication time for small messages is dominated by the overhead in the communication layers, meaning that the transmission is latency bound. For larger messages, the communication rate becomes bandwidth limited by some component in the communication subsystem (for example, the PCI bus, network card link, or network switch).

These measurements can be done at the message-passing layer (MPI, MPI-2, and PVM) or at the native communications layers that they run upon (TCP/IP, GM for Myrinet* cards, InfiniBand*, SHMEM for the Cray T3E* systems, and LAPI for IBM SP* systems). Recent work is being aimed at measuring some internal system properties such as the memcpy module that measures the internal memory copy rates, or a disk module under development that measures the performance to various I/O devices.
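
For the TCP driver, a measurement between two nodes might look like this (the host name is illustrative):

```sh
# On the receiver node: start NPtcp with no arguments.
NPtcp

# On the transmitter node: point NPtcp at the receiver; results are written
# to an output file for later plotting/analysis.
NPtcp -h receiver.example.com
```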

**homepage**

http://bitspjoule.org/netpipe/

**parameters**

test:

**results**

netpipe.less_8K_usec.avg:

netpipe.bigger_5M_Mbps.avg:

### linpack

**What is it?**

The LINPACK Benchmarks are a measure of a system's floating point computing power. Introduced by Jack Dongarra, they measure how fast a computer solves a dense n by n system of linear equations Ax = b, which is a common task in engineering.

The latest version of these benchmarks is used to build the TOP500 list, ranking the world's most powerful supercomputers.

The aim is to approximate how fast a computer will perform when solving real problems. It is a simplification, since no single computational task can reflect the overall performance of a computer system. Nevertheless, the LINPACK benchmark performance can provide a good correction over the peak performance provided by the manufacturer. The peak performance is the maximal theoretical performance a computer can achieve, calculated as the machine's frequency, in cycles per second, times the number of operations per cycle it can perform. The actual performance will always be lower than the peak performance. The performance of a computer is a complex issue that depends on many interconnected variables. The performance measured by the LINPACK benchmark consists of the number of 64-bit floating-point operations, generally additions and multiplications, that a computer can perform per second, also known as FLOPS. However, a computer's performance when running actual applications is likely to be far behind the maximal performance it achieves running the appropriate LINPACK benchmark.

**homepage**

http://registrationcenter.intel.com/irc_nas/3914/l_lpk_p_11.1.2.005.tgz

**parameters**

memory:

**results**

linpack.GFlops:

### unixbench

**What is it?**

UnixBench is the original BYTE UNIX benchmark suite, updated and revised by many people over the years.

The purpose of UnixBench is to provide a basic indicator of the performance of a Unix*-like system. Multiple tests are used to test various aspects of the system's performance. These test results are then compared to the scores from a baseline system to produce an index value, which is generally easier to handle than the raw scores. The entire set of index values is then combined to make an overall index for the system.

Some very simple graphics tests are included to measure the 2D and 3D graphics performance of the system.

Multiple CPU systems are handled. If your system has multiple CPUs, the default behaviour is to run the selected tests twice: once with one copy of each test program running at a time, and once with N copies, where N is the number of CPUs. This is designed to allow you to assess

  • the performance of your system when running a single task
  • the performance of your system when running multiple tasks
  • the gain from your system's implementation of parallel processing

Be aware that this is a system benchmark, not a CPU, RAM, or disk benchmark. The results will depend not only on your hardware, but on your operating system, libraries, and even your compiler.
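
A sketch of running the suite from its source directory (the Run script accepts test names and a repeatable -c option for the number of parallel copies):

```sh
# Run the full index suite (single-copy and N-copy passes on SMP systems).
./Run

# Run only selected tests, once with 1 copy and once with 4 copies.
./Run -c 1 -c 4 dhry2reg whetstone-double
```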

**homepage**

https://github.com/kdlucas/byte-unixbench

**parameters**

runtime:

test:

nr_task:

**results**

unixbench.score:

### ltp

**What is it?**

The Linux Test Project is a joint project started by SGI, OSDL, and Bull that is developed and maintained by IBM, Cisco*, Fujitsu*, SUSE*, Red Hat, Oracle*, and others. The project goal is to deliver tests to the open source community that validate the reliability, robustness, and stability of Linux.

The LTP testsuite contains a collection of tools for testing the Linux kernel and related features. Our goal is to improve the Linux kernel and system libraries by bringing test automation to the testing effort. Interested open source contributors are encouraged to join.
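
A rough sketch of building LTP and running one scenario file (the dio scenario matches the test parameter listed below; the install prefix follows the project's defaults):

```sh
# Build and install LTP, then run the direct-I/O scenario with the runltp wrapper.
make autotools && ./configure && make && make install
cd /opt/ltp
./runltp -f dio
```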

**homepage**

http://linux-test-project.github.io/

**parameters**

test: ["dio", "fs_readonly"]

**results**