The Intel® C for Metal development package is a software development tool for Intel® Graphics Technology. For many years, it has been used internally to develop best-in-class media-processing technology on Intel Processor Graphics. It is now open source and available to external users. The development package includes the Intel® C for Metal Compiler, the Intel® C for Metal Runtime, the Intel® Media Driver for VAAPI, and reference examples, which can be used to develop applications accelerated by Intel® Graphics Media Accelerator. A typical application contains two kinds of source code: kernel and host. The kernel is written in the Intel® C for Metal language, compiled to a GPU ISA binary by the Intel® C for Metal Compiler, and executed on the GPU. The host manages workloads through the Intel® C for Metal Runtime and the user-mode media driver.
C for Metal is a general-purpose GPU programming language that allows users to achieve close-to-assembly performance on Intel Processor Graphics. It is NOT limited to media processing.
The C for Metal programming model is explicit SIMD, which differs from that of other GPU programming languages such as CUDA or OpenCL. In those languages, the programmer defines a kernel that operates on a single unit of the data space. Each kernel instance operating on an individual data unit is called a thread in CUDA or a work-item in OpenCL. The model relies on the GPU compiler to pack those work-items into SIMD instructions at the hardware-supported SIMD width, with each SIMD lane corresponding to one thread or work-item. This model is simple, relatively easy to program for purely data-parallel programs, and satisfies most use cases. However, when a program mixes serial and parallel segments or needs irregular memory access, it becomes very difficult to optimize. Because the compiler performs the vectorization, performance bottlenecks are often obscured and hard to find and analyze, which makes it difficult for developers to write code that uses the full potential of the GPU.
C for Metal instead exposes the hardware Execution Unit (EU) to users so that they can directly harness the full power of the underlying Intel Graphics architecture. For developers who want that level of control, such as authors of performance-critical libraries or applications, C for Metal is the right programming tool. The following list summarizes what C for Metal offers in its explicit SIMD programming model:
- One C for Metal thread is equivalent to one Gen hardware thread, which is similar to one CUDA warp or one OpenCL subgroup
- Predefined vector and matrix types. The C for Metal compiler maps vectors and matrices to the Gen General Register File (GRF). Users directly manage when and how data is moved to or cached in the EU.
- Parallelism is expressed through vector and matrix operations
- Cross SIMD-lane operations are expressed via vector/matrix operations
- Mixed SIMD widths are allowed in C for Metal kernels
- Data layout (e.g., transpose) in GRF is controlled by users
To learn C for Metal programming, start with the tutorials and examples included in the download package. The tutorials cover the following topics, ordered by increasing complexity.
- Tutorial 1. Basic Host Programming
- Tutorial 2. Basic Kernel Programming
- Tutorial 3. Enqueuing Multiple Kernels
- Tutorial 4. Using Media Walker with Thread Dependence
- Tutorial 5. Builtin Matrix and Vector Operations
- Tutorial 6. Shared Local Memory and Thread Group
- Tutorial 7. Using Printf in Kernel
- Tutorial 8. Using CmBuffer
- Tutorial 9. Zero-Copy with User-Provided Surfaces
- Tutorial 10. Event-Driven Synchronization
- Tutorial 11. Kernel Programming: Register Usage
- Tutorial 12. Kernel Deep-Dive: BitonicSort
- Tutorial 13. Kernel Deep-Dive: RadixSort
- Tutorial 14. Kernel Example - PrefixSum
- Tutorial 15. Kernel Example - Graph-Cut
You can also download the tutorials from "CM Programming reference".