The 0-Day Challenge: What is the Pulse of the Internet?

By David Stewart on September 1, 2015


Having a pulse is a pretty basic measure of life. If you don't have one, you probably are not alive (unless you are a plant or single-celled creature).

My resting heart rate is a very convenient 60 beats per minute. This is convenient because if I'm trying to go to sleep, I can easily tell whether my pulse is faster than that by comparing it with the ticking of my alarm clock. If my heart is beating faster than that, I might be tense, sick, or just dehydrated. I can tell a lot about my health and performance by checking my pulse.

What is the pulse rate of the Internet?

Releases of the Linux kernel come out about every 80 days or so (although this has varied over time). The most popular open source compiler, gcc, has a major release about once a year, in the spring. Distributions of Linux such as Ubuntu or the Yocto Project release every six months like clockwork. Other projects and distros tick at different rates.

In reality, though, the core components of the Internet get updated constantly. Every time the source changes, the health and performance can change. A single source code change can fail to build, can break compatibility with existing code, and can change performance by anywhere from a fraction of a percent to 10% or more on major customer workloads.

We're trying to read the pulse of our core components (Python, PHP, HHVM) every day. We call this our "0-day Lab". Here's how it works (a rough sketch in code follows the list):

  • Every night our team in Bucharest downloads the latest source from these language projects and builds them.
  • After the build, we run a number of performance measurements on the latest Intel Xeon hardware.
  • After collecting these results, we email them out to their respective development communities.
  • We also have a "manager dashboard" which I look at constantly to see at a glance what kinds of trends we're seeing over time.
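
For the curious, here's a minimal sketch of what a nightly job along these lines can look like. This is an illustration, not our actual tooling: the repository URL, the benchmark script names, and the run count are all assumptions.

    #!/usr/bin/env python3
    """Minimal sketch of a nightly build-and-measure loop (illustrative only)."""
    import statistics
    import subprocess
    import time

    REPO = "https://github.com/python/cpython"    # assumed source location
    BENCHMARKS = ["bm_django.py", "bm_nbody.py"]  # hypothetical workload scripts
    RUNS = 5  # repeat each workload to estimate run-to-run variance

    def sh(cmd):
        # Run a shell command, raising on failure so a broken build is caught.
        subprocess.run(cmd, shell=True, check=True)

    def nightly():
        sh(f"git clone --depth 1 {REPO} cpython")
        sh("cd cpython && ./configure && make -j")
        results = {}
        for bench in BENCHMARKS:
            times = []
            for _ in range(RUNS):
                start = time.perf_counter()
                sh(f"cpython/python {bench}")
                times.append(time.perf_counter() - start)
            mean = statistics.mean(times)
            rsd = statistics.stdev(times) / mean  # relative standard deviation
            results[bench] = (mean, rsd)
        return results  # a real lab would diff against prior runs and mail a report

    if __name__ == "__main__":
        for name, (mean, rsd) in nightly().items():
            print(f"{name}: {mean:.3f}s  RSD {rsd:.2%}")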

We call it the 0-day Lab for a couple of reasons:

  • If you consider open source software to be buildable and releasable every day, then you should check it out on Day 0 of its release. Every day seems to be about the right tempo for this.
  • We're indebted to another team at Intel which started this idea some years ago on the Linux kernel, and they call their project "0-day" as well. So this is a bit of an homage to them.

Here's an example of the daily email, sent to python-checkins:

Results for project python_default-nightly, build date 2015-09-01 06:02:01
commit: afdfb9d53bec97c2e64166718424b9098ad17350
revision date: 2015-08-31 18:45:11
environment: Haswell-EP
cpu: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz 2x18 cores, stepping 2, LLC 45 MB
mem: 128 GB
os: CentOS 7.1
kernel: Linux 3.10.0-229.4.2.el7.x86_64
Baseline results were generated using release v3.4.3, with hash b4cbecbc0781e89a309d03b60a1f75f8499250e6 from 2015-02-25 12:15:33+00:00

------------------------------------------------------------------------------------------
               benchmark        RSD*      change since      change since  current rev with
                                              last run            v3.4.3      regrtest PGO
------------------------------------------------------------------------------------------
:-)            django_v2    0.18601%          0.09676%          8.23460%         17.20786%
:-(              pybench    0.13592%          0.03344%         -2.42500%          8.25125%
:-(             regex_v8    2.87124%         -0.00005%         -3.17370%          5.04940%
:-|                nbody    0.11478%          0.50369%         -0.82900%          9.08337%
:-|         json_dump_v2    0.26878%         -0.32432%         -1.37670%         12.03586%
:-|       normal_startup    0.75209%          0.02933%         -0.07303%          4.98845%
------------------------------------------------------------------------------------------

Note: Benchmark results are measured in seconds. * Relative Standard Deviation (Standard Deviation/Average)

Our lab does a nightly source pull and build of the Python project and measures performance changes against the previous stable version and the previous nightly measurement. This is provided as a service to the community so that quality issues with current hardware can be identified quickly.

(C) 2015 Intel Corporation.

(This was just today's email; depending on when you see this post, the actual mails could look quite different.)

This shows a number of interesting things:

  • When a build succeeds, we can show performance changes against the previous successful run and against the previous release.
  • Since some of these measurements are not really proper benchmarks per se, we observe a lot of run-to-run variance. So we run each workload multiple times and report the relative standard deviation (RSD). Note in our example how we got a 3.17% regression in regex_v8 from the 3.4.3 release; but since the RSD was measured at 2.87%, I'm not too worried (see the toy calculation after this list).
  • I'm also very excited about the value we get from turning on Profile Guided Optimization (PGO) in our builds, and we submitted a patch for this recently. You can see we get a nice boost from it, and we will encourage the Python community to adopt PGO as the standard way to build Python.
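
To make the RSD point concrete, here's a toy calculation with made-up timings. It isn't our analysis code, just an illustration of why a change smaller than the noise floor isn't conclusive:

    import statistics

    baseline = [1.000, 0.985, 1.030, 0.992, 1.021]  # made-up timings, in seconds
    nightly  = [1.020, 0.998, 1.042, 1.005, 1.033]

    def rsd(xs):
        # Relative standard deviation: standard deviation divided by the mean.
        return statistics.stdev(xs) / statistics.mean(xs)

    change = statistics.mean(nightly) / statistics.mean(baseline) - 1
    print(f"change {change:+.2%}, baseline RSD {rsd(baseline):.2%}")
    # Here the ~1.4% "regression" is smaller than the ~1.9% RSD,
    # so it is within run-to-run noise and not worth alarm by itself.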

This acts as a nice "canary in the coal mine." If someone commits a patch that should improve performance but actually decreases it in our measurements, maybe something should be addressed!
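
As a toy illustration of the kind of check a dashboard can automate (the function, threshold, and numbers below are assumptions, not our actual alerting logic):

    def flag(benchmark, today_mean, prev_mean, rsd, noise_factor=2.0):
        # Benchmarks are measured in seconds, so a positive change means slower.
        change = today_mean / prev_mean - 1
        if abs(change) <= noise_factor * rsd:
            return f":-| {benchmark}: {change:+.2%} (within noise)"
        face = ":-(" if change > 0 else ":-)"
        return f"{face} {benchmark}: {change:+.2%}"

    print(flag("regex_v8", 1.010, 1.042, 0.0287))  # made-up numbers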

And certainly if the performance suddenly degrades badly, we should address it quickly rather than letting it fester. We're doing this for Python 3 (the mail above), Python 2.7 (on which there is a lot of legacy code), PHP and HHVM.

I consider 0-day to be the central engine around which our entire optimization effort turns. As such, we are constantly improving the workloads we run and the way we measure them, to make sure our settings and methodology produce the most meaningful results.

We're really looking for your participation as well. If you have ideas on how to improve these mails and make them a more useful pulse reading, we're really interested. Comment here or on the mailing lists where we submit our results.
