Tutorial 9. Zero-Copy with User-Provided Surfaces
CM also provides a way for the user to create a surface in system memory. This way, the CPU and GPU share the same physical memory: the CPU accesses the memory through a pointer, and the GPU accesses it through a surface handle. It is the user’s responsibility to avoid data races between the GPU and the CPU.
Also be aware that media-block reads and writes on a user-provided surface can be slower because, unlike regular 2D surfaces, which have a tiled layout, a user-provided surface has a linear layout.
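To make the linear layout concrete, the sketch below computes where a pixel lives in a linear surface: rows are stored one after another, each pitch bytes long. The function name linear_offset and its parameters are illustrative only and are not part of the CM API. The tiled layout of regular surfaces is hardware-specific and is not modeled here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a CM API call): byte offset of pixel (x, y)
// in a linear, row-major surface.
// pitch_bytes is the distance in bytes from the start of one row to the
// start of the next, i.e. the packed row size plus any padding.
std::size_t linear_offset(std::size_t x, std::size_t y,
                          std::size_t bytes_per_pixel,
                          std::size_t pitch_bytes) {
    // The whole pixel must fit inside one row of the surface.
    assert((x + 1) * bytes_per_pixel <= pitch_bytes);
    return y * pitch_bytes + x * bytes_per_pixel;
}
```

For example, with 4 bytes per pixel and a 256-byte pitch, pixel (3, 2) starts at byte 2 * 256 + 3 * 4 = 524.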
CreateSurface2DUP – 2D user provided memory
linear_up_walker is an example that uses 2D user provided memory.
// Gets necessary information in order to create and use CmSurface2DUP as
// input surface later.
// To create CmSurface2DUP, user needs to allocate such amount of system
// memory which is equal to or larger than physical size returned here.
// When accessing the system memory, user needs to be aware of the pitch,
// which is equal to pixel_width * byte_per_pixel + necessary_padding.
unsigned int input_surface_pitch = 0;
unsigned int input_surface_size = 0;
cm_result_check(device->GetSurface2DInfo(width * 3 / 4,
                                         height,
                                         CM_SURFACE_FORMAT_A8R8G8B8,
                                         input_surface_pitch,
                                         input_surface_size));

// Creates a CmSurface2DUP as input surface in UP (User Provided) system memory
// with given surface width and height in pixel, and format. The UP system memory
// must be page (4K Bytes) aligned. The size of the system memory must be
// larger than or equal to the size returned by GetSurface2DInfo.
// Application can either access the memory through the memory pointer from
// the CPU, or access the 2D surface created upon the same memory from the GPU.
CmSurface2DUP *input_surface = nullptr;
void *sysmem_src = CM_ALIGNED_MALLOC(input_surface_size, 0x1000);
cm_result_check(device->CreateSurface2DUP(width * 3 / 4,
                                          height,
                                          CM_SURFACE_FORMAT_A8R8G8B8,
                                          sysmem_src,
                                          input_surface));

// Copies the input image data to the system memory provided to create
// CmSurface2DUP using the CPU.
memcpy(sysmem_src, input_image.getData(), width * height * 3);

// Gets necessary information in order to create and use CmSurface2DUP
// as output surface later.
unsigned int output_surface_pitch = 0;
unsigned int output_surface_size = 0;
cm_result_check(device->GetSurface2DInfo(width * 3 / 4,
                                         height,
                                         CM_SURFACE_FORMAT_A8R8G8B8,
                                         output_surface_pitch,
                                         output_surface_size));

// Creates a CmSurface2DUP in UP (User Provided) system memory to serve
// as the output surface.
CmSurface2DUP *output_surface = nullptr;
void *sysmem_dst = CM_ALIGNED_MALLOC(output_surface_size, 0x1000);
cm_result_check(device->CreateSurface2DUP(width * 3 / 4,
                                          height,
                                          CM_SURFACE_FORMAT_A8R8G8B8,
                                          sysmem_dst,
                                          output_surface));
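The comments in the listing note that the pitch may include padding. If the reported pitch is larger than the packed row size, a single memcpy of the whole image would misplace every row after the first. The sketch below shows one way to check the page-alignment requirement and to copy row by row instead. Both helpers, is_page_aligned and copy_rows_with_pitch, are hypothetical names introduced for illustration, not CM API calls, and the sketch assumes the source image is tightly packed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if ptr sits on a 4 KB page boundary, which is
// what the user-provided system memory is required to satisfy.
bool is_page_aligned(const void *ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr) % 0x1000 == 0;
}

// Hypothetical helper: copies `rows` tightly packed rows of `row_bytes`
// each into a destination whose rows start every `dst_pitch` bytes.
// Padding bytes at the end of each destination row are left untouched.
void copy_rows_with_pitch(void *dst, std::size_t dst_pitch,
                          const void *src, std::size_t row_bytes,
                          std::size_t rows) {
    assert(dst_pitch >= row_bytes);  // a row must fit inside the pitch
    auto *d = static_cast<std::uint8_t *>(dst);
    const auto *s = static_cast<const std::uint8_t *>(src);
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(d + y * dst_pitch, s + y * row_bytes, row_bytes);
    }
}
```

With the variables from the listing, the copy would look like copy_rows_with_pitch(sysmem_src, input_surface_pitch, input_image.getData(), width * 3, height), assuming the image data holds 3 bytes per pixel with no row padding.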
CreateBufferUP – 1D user provided memory
The vector matching example uses 1D user provided memory.

// Creates a 1D input surface for the feature vectors in the user provided
// system memory. Application can either access the memory through the
// memory pointer from the CPU, or access the buffer created upon the same
// memory from the GPU.
CmBufferUP *feature_vect_surf = nullptr;
cm_result_check(device->CreateBufferUP(feature_vect_num * VECTOR_LENGTH,
                                       feature_vect,
                                       feature_vect_surf));
On the kernel side, user-provided surfaces are used in exactly the same way as regular surfaces.