

Intel® DAAL and Intel® MKL – Complementary High Performance Machine Learning & Analytics Libraries

BY Kent Moffat ON Jun 09, 2016


By Andrey Nikolaev, Intel DAAL Lead Architect

 

Intel® Data Analytics Acceleration Library (Intel® DAAL) is a new developer tool for machine learning and big data analytics. Despite being new, the library has already attracted attention. See, for instance, the presentation by Capital One* at the Intel Developer Forum in 2015, which discussed a 200x performance improvement for their recommendation system over their existing implementation. Recently, an open source version of the library was also released.

Intel® Math Kernel Library (Intel® MKL), originally focused on HPC developer needs, has now been extended to support analytics problems as well, from initial analysis of the datasets through neural network computations.

Below we will clarify similarities and differences between the libraries to help users choose the library that best fits the needs of their specific application.

 

Functionality

Intel MKL is a library of primitives that are used in different areas, including engineering, finance, simulations and analytical problems. The library supports low-level C and Fortran interfaces for generic operations such as matrix multiplication and decompositions, FFT, random number generation, and vector mathematical functions. Interfaces for matrix and linear algebra operations follow the de facto standard Netlib conventions. The library is highly optimized for Intel architecture devices.

Intel DAAL provides high-level C++, Java and Python interfaces to solve analytical problems such as principal component analysis, regression, classification, clustering and market basket analysis. Using this library, you can train a model with the chosen algorithm and then score data sets against that model. Like Intel MKL, Intel DAAL is highly tuned for Intel architecture: behind the scenes it uses primitives available in Intel MKL as well as other optimization techniques.

Some algorithms are available in both Intel MKL and Intel DAAL, such as the matrix decompositions (SVD, QR and Cholesky) that are critical in analytical problems. The Computation modes section highlights the differences between the two libraries' versions of these algorithms, which will help you better understand their features.

Additionally, Intel DAAL introduces the idea of an intermediate data type used in computations. If you store data in single precision but want to train your model in double precision, you specify the respective compile-time parameter to the Intel DAAL algorithm.
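To see why the intermediate type matters, the following minimal sketch keeps the data in single precision and compares accumulating in single versus double precision. It is plain Python rather than Intel DAAL code; the helper names (to_float32, sum_float32, sum_float64) are made up for this illustration, and single-precision arithmetic is emulated by rounding after every addition.

```python
import struct
from array import array

def to_float32(x):
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_float32(values):
    """Accumulate in single precision: the running total is rounded after every add."""
    total = 0.0
    for v in values:
        total = to_float32(total + v)
    return total

def sum_float64(values):
    """Accumulate in double precision while the inputs stay stored as float32."""
    total = 0.0
    for v in values:
        total += v
    return total

# Data stored in single precision (typecode "f" is a 4-byte float).
data = array("f", [0.1] * 1_000_000)

single = sum_float32(data)       # storage and intermediate type both single
double = sum_float64(data)       # single storage, double intermediate type
exact = data[0] * len(data)      # stored value times count, one rounding in double

print("error with single-precision accumulation:", abs(single - exact))
print("error with double-precision accumulation:", abs(double - exact))
```

The stored values are identical in both runs; only the precision of the running total changes, which is the role the intermediate data type plays in an algorithm.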

 

Working with data

Both libraries support homogeneous data represented in either double or single precision, as well as both dense and sparse data. Intel MKL supports a rich collection of sparse layouts, while only one, the compressed sparse row (CSR) format, is available in the current version of Intel DAAL.
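For readers unfamiliar with CSR, here is a minimal sketch of the layout and of a matrix-vector product computed directly on it. It is a generic illustration in plain Python, not the interface of either library; the function names are hypothetical, and zero-based indexing is an assumption of this sketch (libraries differ on the index base).

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)      # nonzero value
                col_idx.append(j)     # its column index
        row_ptr.append(len(values))   # where the next row starts in `values`
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, vec):
    """Multiply a CSR matrix by a dense vector, touching only the nonzeros."""
    result = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        result.append(sum(values[k] * vec[col_idx[k]] for k in range(start, end)))
    return result

dense = [
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 3.0],
    [4.0, 5.0, 0.0],
]
values, col_idx, row_ptr = dense_to_csr(dense)
product = csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0])
```

Three arrays describe the matrix: the nonzero values, the column of each value, and one offset per row marking where that row begins.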

Representation in a homogeneous floating point format, however, is not sufficient to support the broad spectrum of data used in different analytical areas. That’s why Intel DAAL introduced the concept of the Numeric Table. This construct lets you represent and work with heterogeneous data, which mixes data types such as floating point, integer and character. Additionally, Intel DAAL extended the concept of a Numeric Table with a Data Source, a set of interfaces that support work with well-known data storage types such as CSV, Spark*, or SQL*.

The presence of missing points or noisy data in the dataset is typical in real datasets, and both libraries properly process such data.

 

Computation modes

Most algorithms in Intel MKL are called in batch computation mode where the data fits into memory. Intel DAAL additionally introduces online and distributed computation modes.

In online (or streaming) computation mode, Intel DAAL allows you to update previously computed results (the model or statistical estimate) by processing new data blocks. In this mode you do not need to hold the whole dataset processed so far in memory. For example, you can incrementally update an SVD or QR decomposition as each new data block becomes available.
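The idea of updating a result block by block can be sketched with a much simpler statistic than SVD or QR. The example below maintains a running mean and variance; the class OnlineMoments is a hypothetical name for this illustration and is not part of Intel DAAL. It uses the standard pairwise update for merging a block's moments into the running moments.

```python
class OnlineMoments:
    """Running mean and variance, updated one data block at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, block):
        """Fold a new block into the result; earlier blocks are not needed."""
        n_b = len(block)
        if n_b == 0:
            return
        mean_b = sum(block) / n_b
        m2_b = sum((x - mean_b) ** 2 for x in block)
        delta = mean_b - self.mean
        total = self.count + n_b
        self.mean += delta * n_b / total
        self.m2 += m2_b + delta * delta * self.count * n_b / total
        self.count = total

    def variance(self):
        """Sample variance of everything seen so far (needs at least 2 points)."""
        return self.m2 / (self.count - 1)

# Blocks arrive one at a time; only the three running numbers are kept.
stream = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
moments = OnlineMoments()
for block in stream:
    moments.update(block)
```

Memory use is constant regardless of how many blocks have been processed, which is the property that makes the online mode suitable for data that does not fit in memory.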

You can also use Intel DAAL in distributed computation mode, where the dataset is spread across computational nodes. By design, communication technology such as MPI* is out of the scope of the library. Instead, the library provides interfaces that help you compute partial results on the computational nodes and interfaces that combine the partial results into the final result (Intel DAAL QR and SVD are examples of such interfaces). The choice of communication technology for transferring data between the computational nodes is up to the user. Intel MKL also supports distributed computation in some components such as FFT and LAPACK. Unlike Intel DAAL, Intel MKL relies on MPI* technology internally.
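The partial-result pattern described above can be sketched with a simple linear regression, where each node reduces its local block to a few additive sums. The functions partial_result, combine and finalize are hypothetical names for this illustration, not Intel DAAL interfaces, and a plain Python list stands in for whatever transport moves the partial results to the combining node.

```python
KEYS = ("n", "sx", "sy", "sxx", "sxy")

def partial_result(xs, ys):
    """Runs on one node: summarize the local block as additive sums."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }

def combine(partials):
    """Runs on the combining node: add up the partial results."""
    total = {k: 0.0 for k in KEYS}
    for p in partials:
        for k in KEYS:
            total[k] += p[k]
    return total

def finalize(total):
    """Turn the combined sums into the slope and intercept of y = a*x + b."""
    n = total["n"]
    slope = (n * total["sxy"] - total["sx"] * total["sy"]) / (
        n * total["sxx"] - total["sx"] ** 2
    )
    intercept = (total["sy"] - slope * total["sx"]) / n
    return slope, intercept

# Three "nodes", each holding a different block of points on y = 2x + 1.
node_blocks = [
    ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),
    ([3.0, 4.0], [7.0, 9.0]),
    ([5.0], [11.0]),
]
partials = [partial_result(xs, ys) for xs, ys in node_blocks]  # computed locally
slope, intercept = finalize(combine(partials))                 # computed once
```

Only the five sums per node cross the network, not the raw data, and the code is indifferent to whether they travel over MPI, Spark, or anything else.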

 

The analytical flow

Data analysis is a multi-stage iterative flow that involves acquiring data from the data source, initial pre-processing and preparation for the computations, and running the algorithms. In this flow Intel MKL addresses the computational component only, and the rest of the steps are out of scope.

Intel DAAL provides features that address all of these stages, including compression/decompression and serialization routines to optimize data transfer, interfaces to represent heterogeneous data structures, and algorithms for data processing.
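As a rough illustration of why serialization and compression help with data transfer, the sketch below packs a block of rows into bytes and compresses it before it would be sent. It uses only the Python standard library (struct and zlib); the byte layout and the function names are assumptions of this example and do not reflect Intel DAAL's own formats or routines.

```python
import struct
import zlib

def serialize_block(rows):
    """Pack a block of equal-length float rows into compressed bytes."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    header = struct.pack("<II", n_rows, n_cols)  # shape, little-endian
    body = b"".join(struct.pack(f"<{n_cols}d", *row) for row in rows)
    return zlib.compress(header + body)

def deserialize_block(payload):
    """Inverse of serialize_block: decompress and rebuild the rows."""
    raw = zlib.decompress(payload)
    n_rows, n_cols = struct.unpack_from("<II", raw, 0)
    rows, offset = [], 8  # the header occupies the first 8 bytes
    for _ in range(n_rows):
        rows.append(list(struct.unpack_from(f"<{n_cols}d", raw, offset)))
        offset += 8 * n_cols
    return rows

# A block with repetitive values, which compresses well.
block = [[float(i % 4), 0.0, 1.0] for i in range(1000)]
payload = serialize_block(block)
raw_size = 8 + 8 * 3 * len(block)   # header plus 3 doubles per row
restored = deserialize_block(payload)
print(f"raw: {raw_size} bytes, compressed: {len(payload)} bytes")
```

How much is saved depends entirely on the data; highly repetitive blocks such as this one shrink substantially, while random floating point data may barely compress at all.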

If your application spends a significant amount of time in the computation of the covariance matrix or general matrix multiplication for single or double precision matrices, Intel MKL is the right option.

If your application relies heavily on processing data that is stored, for example, in HDFS and represented with continuous, discrete and categorical attributes, and it involves analytical algorithms, use Intel DAAL.

The two libraries can be used separately or together at different stages of data analysis, depending on usage scenarios and application requirements.

For additional information about the libraries:

· Intel MKL at https://software.intel.com/en-us/intel-mkl

· Intel DAAL at https://software.intel.com/en-us/intel-daal

 

 

 

 

*Other brands and names may be claimed as the property of others.