Improving Python* performance for scientific tools and libraries
Python is an interpreted, interactive, object-oriented programming language. It is flexible and easy to learn even for non-programmers, making it a popular choice for developers. These advantages, plus specialized modules like NumPy and SciPy, have increased its popularity in the scientific community. However, compared to C or Fortran, Python does have performance limitations because of its nature as an interpreted language.
This article describes how the Clear Linux* Project for Intel® Architecture achieves significant performance improvements in NumPy and SciPy by using OpenBLAS built with support for Intel® Advanced Vector Extensions 2 (Intel® AVX2).
NumPy and SciPy
Many high-level languages have special tools and libraries for scientific computing purposes. These tools are designed to implement the most common math routines, such as basic linear algebra or statistical operations. Developers can also write their own custom code when dealing with less common problems. One of the most common languages used to solve these uncommon problems is Python, which is easily extensible with both C and Fortran.
The Python community offers a full set of open-source software for mathematics, science, and engineering called SciPy. This ecosystem consists of multiple packages. In this article we take a closer look at two of them: the NumPy and SciPy libraries.
External linear algebra libraries
According to the NumPy manual, "NumPy does not require any external linear algebra libraries to be installed." However, it is possible to use them and gain a speed advantage. Useful libraries include ATLAS, OpenBLAS, and LAPACK. This article focuses only on the improvements made to OpenBLAS in Clear Linux Project for Intel Architecture.
OpenBLAS is an open-source implementation of BLAS (Basic Linear Algebra Subprograms), which provides standard interfaces for linear algebra. Replacing the default linear algebra routines used by NumPy with this library increases speed, especially when computing the dot function for matrix multiplication, an operation used in diverse scientific fields.
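As a minimal sketch of the point above, the following snippet prints the libraries a NumPy installation was built against and then performs a matrix multiplication with the dot function. The matrix sizes and the seed are arbitrary values chosen for illustration, and the output of show_config varies between NumPy versions and builds.

```python
import numpy as np

# Print the build configuration, including the BLAS/LAPACK libraries
# this NumPy build was linked against (format varies by NumPy version).
np.show_config()

# Two small double-precision matrices; the sizes are arbitrary examples.
rng = np.random.default_rng(seed=0)
a = rng.random((200, 300))
b = rng.random((300, 100))

# For 2-D float64 arrays, np.dot computes a matrix product. NumPy can
# delegate this to an optimized BLAS library when it was built with one.
c = np.dot(a, b)

print(c.shape)  # (200, 100)
```

If the configuration output mentions OpenBLAS, the matrix product above is a candidate for the accelerated code path discussed in this article.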
Optimizing OpenBLAS with Intel AVX2
Intel AVX2 technology is an instruction set extension for x86 processors, released with the 4th Generation Intel® Core™ processor family, that provides simultaneous execution over vectors of 256 bits (4 operations of 64 bits). This capability improves the performance of applications in high-performance computing, databases, and video processing, and it is especially well suited to matrix multiplication and therefore to NumPy.
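The arithmetic behind "4 operations of 64 bits" can be written out as a short sketch. The helper name lanes is hypothetical and exists only for this illustration; the 256-bit register width is the figure given above.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


AVX2_REGISTER_BITS = 256  # vector width stated in the text

doubles_per_register = lanes(AVX2_REGISTER_BITS, 64)  # float64 elements
floats_per_register = lanes(AVX2_REGISTER_BITS, 32)   # float32 elements

print(doubles_per_register, floats_per_register)  # 4 8
```

NumPy arrays default to 64-bit floats, so a single 256-bit instruction can operate on four array elements at a time.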
In Clear Linux Project for Intel Architecture, the OpenBLAS package includes files for systems with Intel AVX2 support and files for systems without it. Shipping both versions of OpenBLAS lets each system run the version that matches the capabilities of its processor.
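To see which of the two cases applies to a given machine, a script can check whether the processor reports AVX2. The sketch below is an assumption-laden illustration, not the mechanism Clear Linux uses to choose a library: it assumes a Linux system that exposes processor flags in /proc/cpuinfo, and the function names has_avx2 and read_cpuinfo are hypothetical.

```python
from pathlib import Path


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the Linux cpuinfo file; return an empty string if unavailable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


# On systems without /proc/cpuinfo this reports False (assumption: Linux).
print("AVX2 reported:", has_avx2(read_cpuinfo()))
```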
To measure the improvements in Clear Linux OS for Intel Architecture using Intel AVX2, five different benchmarks were executed: Cholesky, DGEMM, Inversion, QR and SVD. Those benchmarks were obtained from matrix operations containing anywhere from 3500 to 40000 elements.
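The benchmark scripts themselves are not reproduced in this article. As a rough sketch of what timing these five operations in NumPy could look like, the snippet below uses standard NumPy calls; the matrix size, the seed, and the helper names time_call and run_benchmarks are assumptions made for illustration and do not reflect the configuration used for the reported results.

```python
import time

import numpy as np


def time_call(func, *args):
    """Return the wall-clock seconds taken by one call to func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def run_benchmarks(n=300, seed=0):
    """Time five dense linear algebra operations on n x n float64 matrices."""
    rng = np.random.default_rng(seed)
    a = rng.random((n, n))
    b = rng.random((n, n))
    # Cholesky needs a symmetric positive-definite input; this matrix is
    # also guaranteed to be invertible, so it is reused for the inversion.
    spd = a @ a.T + n * np.eye(n)
    return {
        "Cholesky": time_call(np.linalg.cholesky, spd),
        "DGEMM": time_call(np.dot, a, b),
        "Inversion": time_call(np.linalg.inv, spd),
        "QR": time_call(np.linalg.qr, a),
        "SVD": time_call(np.linalg.svd, a),
    }


results = run_benchmarks()
for name, seconds in results.items():
    print(f"{name:10s} {seconds:.4f} s")
```

Running the same script against NumPy builds linked to different BLAS libraries gives a simple way to compare them; a single timed call is noisy, so repeating each measurement would give steadier numbers.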
As we can see in the graph, the benefits gained from enabling OpenBLAS with Intel AVX2 acceleration in NumPy are significant. Some benchmarks show up to a 69 percent improvement with Intel AVX2 support, and even the smallest gain is up to 20 percent. Overall, the experiments using Intel AVX2 technology show a considerable improvement in the execution time of large matrix operations (commonly used in cloud scientific computing).
The cost of using Intel AVX2 technology in OpenBLAS is low. The compilation flags and the configuration options can be seen in the source code of Clear Linux Project for Intel Architecture. These optimizations can be easily applied to other Linux distributions as well.
The benefits of OpenBLAS with Intel AVX2 are not limited to NumPy; many other scientific programming and analytics tools can also benefit. The R programming language (an important tool for researchers in the numerical analysis and machine learning spaces) also uses OpenBLAS as a basic linear algebra library. Thanks to this, the latest R benchmark results shown on the Phoronix website, for example, showed a 3X performance improvement.
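The article does not state how the percentages were calculated. One common definition, assumed here, is the relative reduction in execution time against the baseline. The timings in the example are hypothetical values picked to illustrate the formula, not measurements from these benchmarks.

```python
def percent_improvement(baseline_seconds: float, optimized_seconds: float) -> float:
    """Relative reduction in execution time, expressed as a percentage."""
    if baseline_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return (baseline_seconds - optimized_seconds) / baseline_seconds * 100.0


# Hypothetical timings: 10.0 s without AVX2 support, 3.1 s with it.
print(round(percent_improvement(10.0, 3.1), 1))  # 69.0
```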
Notices and disclaimers
†Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Test and System Configurations: Machine: Processor: 2 x Intel® Xeon® E5-2603 v3 processors @ 1.60GHz (12 Cores), Motherboard: Dell* 0599V5, Memory: 258048 MB, Disk: 16001 GB PERC H730P. OS: Clear Linux OS for Intel Architecture build 7400.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
Intel, the Intel logo, Intel Core, Intel Xeon, Clear Linux Project for Intel Architecture, and Intel Advanced Vector Extensions 2 are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others
© 2016 Intel Corporation.