C for Metal Development Package

The Intel® C for Metal development package is a software development package for Intel® Graphics Technology. It includes the Intel® C for Metal Compiler, the Intel® C for Metal Runtime, Intel® Media Driver for VAAPI, and reference examples, which can be used to develop applications accelerated by Intel® Graphics Media Accelerator. A typical application contains two kinds of source code: kernel and host. The kernel is written in Intel® C for Media language, compiled to GPU ISA binary by the Intel® C for Metal Compiler, and executed on the GPU. The host manages workloads through the Intel® C for Metal Runtime and the user-mode media driver.

Using CmBuffer

BY Li Huang ON Jun 13, 2019

Tutorial 8. Using CmBuffer

In the previous examples, we used CmSurface to store image data. In this tutorial, we show how to use CmBuffer to store generic data, and how to access that data with oword-block reads and writes. The following is what we do in the nbody example.

Host Program: Set up CmBuffers before enqueue

  // CmBuffer represents a 1D surface in video memory.
  // CreateBuffer creates a CmBuffer with linear layout; the size is in bytes.
  CmBuffer *surf1 = nullptr;
  cm_result_check(
      device->CreateBuffer(num_bodies * ELEMS_BODY * sizeof(float), surf1));
  // Copies the input data from system memory (h_pos) into the buffer.
  cm_result_check(surf1->WriteSurface((unsigned char *)h_pos, nullptr));

  // Gets the input surface index.
  SurfaceIndex *input_surface_idx1 = nullptr;
  cm_result_check(surf1->GetIndex(input_surface_idx1));

  CmBuffer *surf2 = nullptr;
  cm_result_check(
      device->CreateBuffer(num_bodies * ELEMS_BODY * sizeof(float), surf2));
  // Copies the input data from system memory (h_vel) into the buffer.
  cm_result_check(surf2->WriteSurface((unsigned char *)h_vel, nullptr));

  // Gets the input surface index.
  SurfaceIndex *input_surface_idx2 = nullptr;
  cm_result_check(surf2->GetIndex(input_surface_idx2));

  // Output buffers are created but not written by the host.
  CmBuffer *surf3 = nullptr;
  cm_result_check(
      device->CreateBuffer(num_bodies * ELEMS_BODY * sizeof(float), surf3));

  // Gets the output surface index.
  SurfaceIndex *output_surface_idx1 = nullptr;
  cm_result_check(surf3->GetIndex(output_surface_idx1));

  CmBuffer *surf4 = nullptr;
  cm_result_check(
      device->CreateBuffer(num_bodies * ELEMS_BODY * sizeof(float), surf4));

  // Gets the output surface index.
  SurfaceIndex *output_surface_idx2 = nullptr;
  cm_result_check(surf4->GetIndex(output_surface_idx2));
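
The size passed to CreateBuffer above is a byte count for a flat array of floats. The following sketch is plain host-side C++ with no dependency on the CM runtime; it only illustrates how that byte count and the position of one body in the linear layout might be computed. The value ELEMS_BODY = 4 is an assumption for illustration (the tutorial does not state it), and buffer_bytes and body_offset_bytes are hypothetical helper names, not part of the CM API.

```cpp
#include <cassert>
#include <cstddef>

// Assumed for illustration: each body stores 4 floats.
constexpr std::size_t ELEMS_BODY = 4;

// Hypothetical helper: byte size of a linear buffer holding num_bodies bodies,
// mirroring the expression passed to CreateBuffer.
inline std::size_t buffer_bytes(std::size_t num_bodies) {
    return num_bodies * ELEMS_BODY * sizeof(float);
}

// Hypothetical helper: byte offset of body `index` inside that buffer.
inline std::size_t body_offset_bytes(std::size_t index, std::size_t num_bodies) {
    assert(index < num_bodies);  // the body must lie inside the buffer
    return index * ELEMS_BODY * sizeof(float);
}
```

Under these assumptions, every body starts at a multiple of ELEMS_BODY * sizeof(float) bytes, which is the same stride the kernel uses when it computes its read and write offsets.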

Host Program: Read output buffers after enqueue

  // Reads the output surface content to the system memory using the CPU.
  // The size of data copied is the size of data in Surface.
  // It is a blocking call. The function will not return until the copy
  // operation is completed.
  // The dependent event "sync_event" ensures that the reading of the surface
  // will not happen until its state becomes CM_STATUS_FINISHED.
  cm_result_check(surf3->ReadSurface((unsigned char *)new_pos, sync_event));
  cm_result_check(surf4->ReadSurface((unsigned char *)new_vel, sync_event));

Kernel Program: buffer reads and writes

Here we only show block reads and block writes, each of which starts from a single address. CM also provides various scattered reads and writes that take a vector of addresses.
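
To make the difference between the two access patterns concrete, the sketch below emulates them on the host in plain C++. It is not CM kernel code and does not use the CM API: block_read and scattered_read are hypothetical helpers, and a std::vector<float> stands in for the buffer. A block access takes one starting offset and moves a contiguous run of elements; a scattered access takes one offset per element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation only; a std::vector<float> stands in for a CmBuffer.

// Hypothetical helper: copy `count` contiguous elements starting at `start`.
inline std::vector<float> block_read(const std::vector<float> &buf,
                                     std::size_t start, std::size_t count) {
    assert(start + count <= buf.size());
    return std::vector<float>(buf.begin() + start,
                              buf.begin() + start + count);
}

// Hypothetical helper: gather one element for each entry in `offsets`.
inline std::vector<float> scattered_read(
    const std::vector<float> &buf, const std::vector<std::size_t> &offsets) {
    std::vector<float> out;
    out.reserve(offsets.size());
    for (std::size_t off : offsets) {
        assert(off < buf.size());
        out.push_back(buf[off]);
    }
    return out;
}
```

The nbody kernel only needs the block form, because each thread works on a contiguous range of bodies.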

Read example

    for (int i = 0; i < BODIES_CHUNK; i += BODIES_PER_RW) {
        read(INPOS, (thisMB_ID * BODIES_CHUNK + i) * BODY_SIZE,
             chunk.select<ELEMS_RW, 1>(ELEMS_BODY * i));
    }

Write example

    for (int i = 0; i < BODIES_CHUNK; i += BODIES_PER_RW) {
        write(OUTPOS, (thisMB_ID * BODIES_CHUNK + i) * BODY_SIZE,
              chunk.select<ELEMS_RW, 1>(ELEMS_BODY * i));
    }
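
Both loops use the same offset arithmetic: the offset passed to read or write is a byte offset into the buffer, while the offset passed to select is an element offset into the thread's local chunk. The sketch below emulates the read loop on the host in plain C++ so the arithmetic can be checked without a GPU. It is not CM kernel code; load_chunk is a hypothetical helper, std::memcpy stands in for the block read, and all constant values are assumptions for illustration because the tutorial does not state them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed values for illustration; the tutorial does not state them.
constexpr std::size_t ELEMS_BODY = 4;                         // floats per body
constexpr std::size_t BODY_SIZE = ELEMS_BODY * sizeof(float); // bytes per body
constexpr std::size_t BODIES_PER_RW = 8;                      // bodies per block access
constexpr std::size_t ELEMS_RW = BODIES_PER_RW * ELEMS_BODY;  // floats per block access
constexpr std::size_t BODIES_CHUNK = 32;                      // bodies per thread

// Hypothetical host-side stand-in for one thread's read loop: copies that
// thread's bodies from a byte buffer into a local chunk of floats.
inline std::vector<float> load_chunk(const std::vector<unsigned char> &inpos,
                                     std::size_t thisMB_ID) {
    std::vector<float> chunk(BODIES_CHUNK * ELEMS_BODY);
    for (std::size_t i = 0; i < BODIES_CHUNK; i += BODIES_PER_RW) {
        // Byte offset into the buffer, as in the kernel's read() call.
        std::size_t byte_offset = (thisMB_ID * BODIES_CHUNK + i) * BODY_SIZE;
        assert(byte_offset + ELEMS_RW * sizeof(float) <= inpos.size());
        // Element offset into the chunk, as in chunk.select<ELEMS_RW, 1>(...).
        std::memcpy(&chunk[ELEMS_BODY * i], &inpos[byte_offset],
                    ELEMS_RW * sizeof(float));
    }
    return chunk;
}
```

With these assumed values and a 4-byte float, each iteration moves ELEMS_RW * sizeof(float) = 128 bytes, that is, eight 16-byte owords, and thread 1 starts reading at byte 512. The write loop mirrors this with source and destination swapped.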