C for Metal Development Package

The Intel® C for Metal development package is a software development package for Intel® Graphics Technology. It includes the Intel® C for Metal Compiler, the Intel® C for Metal Runtime, the Intel® Media Driver for VAAPI, and reference examples, which can be used to develop applications accelerated by Intel® Graphics Media Accelerator. A typical application contains two kinds of source code: kernel and host. The kernel is written in the Intel® C for Media language, compiled to a GPU ISA binary by the Intel® C for Metal Compiler, and executed on the GPU. The host manages workloads through the Intel® C for Metal Runtime and the user-mode media driver.

Zero-Copy with User-Provided Surfaces

BY Li Huang ON Jun 13, 2019

Tutorial 9. Zero-Copy with User-Provided Surfaces

CM also provides a way for the user to create a surface in system memory. With such a surface, the CPU and the GPU share the same physical memory, so no copy is needed: the CPU accesses the memory through a pointer, and the GPU accesses it through a surface handle. It is the user's responsibility to avoid data races between the GPU and the CPU.

Also be aware that media-block reads and writes on a user-provided surface can be slower because, unlike regular 2D surfaces, which have a tiled layout, a user-provided surface has a linear layout.
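To make the layout point concrete, here is a minimal sketch that counts how many cache lines a small 2D block touches under a linear layout and under a tiled one. The tile geometry (16 bytes by 4 rows), the 64-byte cache line, and all function names are assumptions chosen for illustration; they are not the actual tile format used by Intel hardware or by CM.

```cpp
#include <cstddef>
#include <set>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes
constexpr std::size_t kTileW = 16;      // hypothetical tile width in bytes
constexpr std::size_t kTileH = 4;       // hypothetical tile height in rows

// Linear layout: rows are stored one after another, `pitch` bytes apart.
std::size_t linear_offset(std::size_t x, std::size_t y, std::size_t pitch) {
  return y * pitch + x;
}

// Hypothetical tiled layout: the surface is split into kTileW x kTileH tiles,
// and the bytes of each tile are stored contiguously (row-major in the tile).
std::size_t tiled_offset(std::size_t x, std::size_t y, std::size_t pitch) {
  const std::size_t tiles_per_row = pitch / kTileW;
  const std::size_t tile = (y / kTileH) * tiles_per_row + x / kTileW;
  return tile * kTileW * kTileH + (y % kTileH) * kTileW + x % kTileW;
}

// Counts the distinct cache lines covered by a w-byte by h-row block whose
// top-left corner is at byte column x0, row y0.
template <typename OffsetFn>
std::size_t cache_lines_touched(OffsetFn offset, std::size_t x0, std::size_t y0,
                                std::size_t w, std::size_t h,
                                std::size_t pitch) {
  std::set<std::size_t> lines;
  for (std::size_t y = y0; y < y0 + h; ++y) {
    for (std::size_t x = x0; x < x0 + w; ++x) {
      lines.insert(offset(x, y, pitch) / kCacheLine);
    }
  }
  return lines.size();
}
```

With these assumed parameters, a block 16 bytes wide and 8 rows high at the origin of a surface with a 256-byte pitch touches 8 cache lines in the linear layout but only 2 in the tiled one, which is the kind of locality difference that can make block access on a linear surface slower.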

CreateSurface2DUP – 2D user provided memory

linear_up_walker is an example that uses 2D user provided memory.

    // Gets the information needed to create and use a CmSurface2DUP as the
    // input surface later.
    // To create a CmSurface2DUP, the user must allocate system memory whose
    // size is equal to or larger than the physical size returned here.
    // When accessing the system memory, the user must be aware of the pitch,
    // which is equal to pixel_width * byte_per_pixel + necessary_padding.
    unsigned int input_surface_pitch = 0;
    unsigned int input_surface_size = 0;
    cm_result_check(device->GetSurface2DInfo(width * 3 / 4,
                                             height,
                                             CM_SURFACE_FORMAT_A8R8G8B8,
                                             input_surface_pitch,
                                             input_surface_size));

    // Creates a CmSurface2DUP as input surface in UP (User Provided) system memory
    // with given surface width and height in pixel, and format. The UP system memory
    // must be page (4K Bytes) aligned. The size of the system memory must be
    // larger than or equal to the size returned by GetSurface2DInfo.
    // Application can either access the memory through the memory pointer from
    // the CPU, or access the 2D surface created upon the same memory from the GPU.
    CmSurface2DUP *input_surface = nullptr;
    void *sysmem_src = CM_ALIGNED_MALLOC(input_surface_size, 0x1000);
    cm_result_check(device->CreateSurface2DUP(width * 3 / 4,
                                              height,
                                              CM_SURFACE_FORMAT_A8R8G8B8,
                                              sysmem_src,
                                              input_surface));

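    // Copies the input image data to the system memory provided to create
    // CmSurface2DUP using the CPU. The surface is width * 3 / 4 pixels of
    // 4 bytes each, i.e. width * 3 bytes per row when width is divisible by 4.
    // This single memcpy assumes that input_surface_pitch equals width * 3
    // (no row padding); if the pitch is larger, copy row by row using the
    // pitch instead.
    memcpy(sysmem_src, input_image.getData(), width * height * 3);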

    // Gets necessary information in order to create and use CmSurface2DUP
    // as output surface later.
    unsigned int output_surface_pitch = 0;
    unsigned int output_surface_size = 0;
    cm_result_check(device->GetSurface2DInfo(width * 3 / 4,
                                             height,
                                             CM_SURFACE_FORMAT_A8R8G8B8,
                                             output_surface_pitch,
                                             output_surface_size));

    // Creates a CmSurface2DUP in UP (User Provided) system memory to serve
    // as the output surface.
    CmSurface2DUP *output_surface = nullptr;
    void *sysmem_dst = CM_ALIGNED_MALLOC(output_surface_size, 0x1000);
    cm_result_check(device->CreateSurface2DUP(width * 3 / 4,
                                              height,
                                              CM_SURFACE_FORMAT_A8R8G8B8,
                                              sysmem_dst,
                                              output_surface));
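The comments in the sample note that the pitch may include padding, yet the input image is copied with a single memcpy, which is only correct when the pitch equals the packed row size. A pitch-aware copy can be sketched as follows; the helper name copy_rows_to_pitched and its parameters are hypothetical and not part of the CM runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `rows` rows of `row_bytes` bytes each from a tightly packed source
// image into a destination whose rows start `dst_pitch` bytes apart.
// Padding bytes at the end of each destination row are left untouched.
void copy_rows_to_pitched(void *dst, std::size_t dst_pitch, const void *src,
                          std::size_t row_bytes, std::size_t rows) {
  assert(dst_pitch >= row_bytes);  // the pitch can only add padding
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t y = 0; y < rows; ++y) {
    std::memcpy(d + y * dst_pitch, s + y * row_bytes, row_bytes);
  }
}
```

In the sample above, the call would look like copy_rows_to_pitched(sysmem_src, input_surface_pitch, input_image.getData(), width * 3, height), assuming getData() returns a pointer to packed 3-byte pixels. Reading results back from sysmem_dst would need the mirror-image loop, stepping the source by output_surface_pitch.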

CreateBufferUP – 1D user provided memory

The vector matching example uses 1D user-provided memory.

  // Creates a 1D input surface for the feature vectors in the user provided
  // system memory. Application can either access the memory through the
  // memory pointer from the CPU, or access the buffer created upon the same
  // memory from the GPU.
  CmBufferUP *feature_vect_surf = nullptr;
  cm_result_check(device->CreateBufferUP(feature_vect_num*VECTOR_LENGTH,
                                         feature_vect,
                                         feature_vect_surf));

On the kernel side, user-provided surfaces are used no differently from regular surfaces.