DRM Memory Management

Modern Linux systems require large amounts of graphics memory to store frame buffers, textures, vertices and other graphics-related data. Given the very dynamic nature of much of that data, managing graphics memory efficiently is crucial for the graphics stack and plays a central role in the DRM infrastructure.

The DRM core includes two memory managers, namely the Translation Table Manager (TTM) and the Graphics Execution Manager (GEM). TTM was the first DRM memory manager to be developed and tried to be a one-size-fits-all solution. It provides a single userspace API to accommodate the needs of all hardware, supporting both Unified Memory Architecture (UMA) devices and devices with dedicated video RAM (i.e. most discrete video cards). This resulted in a large, complex piece of code that turned out to be hard to use for driver development.

GEM started as an Intel-sponsored project in reaction to TTM’s complexity. Its design philosophy is completely different: instead of providing a solution to every graphics memory-related problem, GEM identified common code between drivers and created a support library to share it. GEM has simpler initialization and execution requirements than TTM, but has no video RAM management capabilities and is thus limited to UMA devices.

The Translation Table Manager (TTM)

TTM design background and information belongs here.

TTM initialization

Warning: This section is outdated.

Drivers wishing to support TTM must pass a filled ttm_bo_driver structure to ttm_bo_device_init, together with an initialized global reference to the memory manager. The ttm_bo_driver structure contains several fields with function pointers for initializing the TTM, allocating and freeing memory, waiting for command completion and fence synchronization, and memory migration.

The struct drm_global_reference is made up of several fields:

struct drm_global_reference {
        enum ttm_global_types global_type;
        size_t size;
        void *object;
        int (*init) (struct drm_global_reference *);
        void (*release) (struct drm_global_reference *);
};

There should be one global reference structure for your memory manager as a whole, and there will be others for each object created by the memory manager at runtime. Your global TTM should have a type of TTM_GLOBAL_TTM_MEM. The size field for the global object should be sizeof(struct ttm_mem_global), and the init and release hooks should point at your driver-specific init and release routines, which probably eventually call ttm_mem_global_init and ttm_mem_global_release, respectively.

Once your global TTM accounting structure is set up and initialized by calling drm_global_item_ref() on it, you need to create a buffer object TTM to provide a pool for buffer object allocation by clients and the kernel itself. The type of this object should be TTM_GLOBAL_TTM_BO, and its size should be sizeof(struct ttm_bo_global). Again, driver-specific init and release functions may be provided, likely eventually calling ttm_bo_global_init() and ttm_bo_global_release(), respectively. Also, like the previous object, drm_global_item_ref() is used to create an initial reference count for the TTM, which will call your initialization function.

See the radeon_ttm.c file for an example of usage.

int drm_global_item_ref(struct drm_global_reference * ref)

Initialize and acquire reference to memory object

Parameters

struct drm_global_reference * ref
Object for initialization

Description

This initializes a memory object, allocating memory and calling the init() hook. Further calls will increase the reference count for that item.

Return

Zero on success, non-zero otherwise.

void drm_global_item_unref(struct drm_global_reference * ref)

Drop reference to memory object

Parameters

struct drm_global_reference * ref
Object being removed

Description

Drop a reference to the memory object and eventually call the release() hook. The allocated object should be dropped in the release() hook or before calling this function.
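The reference-counting behaviour described for drm_global_item_ref() and drm_global_item_unref() can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: every model_* name is hypothetical, calloc() and free() stand in for kernel allocation, and the locking a kernel implementation would need is left out. Following the description above, the model assumes the release() hook is where the allocated object is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct drm_global_reference. */
struct model_global_ref {
	size_t size;
	void *object;
	int (*init)(struct model_global_ref *);
	void (*release)(struct model_global_ref *);
};

/* Shared state for one global item: the object and its reference count. */
static void *model_item_object;
static int model_item_refcount;

/* Counters so the behaviour of the hooks can be observed. */
static int model_init_calls;
static int model_release_calls;

static int model_init(struct model_global_ref *ref)
{
	(void)ref;
	model_init_calls++;
	return 0;
}

/* As described above, release() is where the allocated object is dropped. */
static void model_release(struct model_global_ref *ref)
{
	model_release_calls++;
	free(ref->object);
	ref->object = NULL;
}

/* The first call allocates the object and runs init(); later calls only count. */
static int model_item_ref(struct model_global_ref *ref)
{
	if (model_item_refcount == 0) {
		int ret;

		model_item_object = calloc(1, ref->size);
		if (!model_item_object)
			return -1;
		ref->object = model_item_object;
		ret = ref->init(ref);
		if (ret != 0) {
			free(model_item_object);
			model_item_object = NULL;
			ref->object = NULL;
			return ret;
		}
	}
	ref->object = model_item_object;
	model_item_refcount++;
	return 0;
}

/* Dropping the last reference calls release(). */
static void model_item_unref(struct model_global_ref *ref)
{
	assert(model_item_refcount > 0);
	if (--model_item_refcount == 0) {
		ref->release(ref);
		model_item_object = NULL;
	}
}
```

Two users that each call model_item_ref() share one object; init() runs once, and release() runs only when the second user drops its reference.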

The Graphics Execution Manager (GEM)

The GEM design approach has resulted in a memory manager that doesn’t provide full coverage of all (or even all common) use cases in its userspace or kernel API. GEM exposes a set of standard memory-related operations to userspace and a set of helper functions to drivers, and lets drivers implement hardware-specific operations with their own private API.

The GEM userspace API is described in the GEM - the Graphics Execution Manager article on LWN. While slightly outdated, the document provides a good overview of the GEM API principles. Buffer allocation and read and write operations, described as part of the common GEM API, are currently implemented using driver-specific ioctls.

GEM is data-agnostic. It manages abstract buffer objects without knowing what individual buffers contain. APIs that require knowledge of buffer contents or purpose, such as buffer allocation or synchronization primitives, are thus outside of the scope of GEM and must be implemented using driver-specific ioctls.

On a fundamental level, GEM involves several operations:

  • Memory allocation and freeing
  • Command execution
  • Aperture management at command execution time

Buffer object allocation is relatively straightforward and largely provided by Linux’s shmem layer, which provides memory to back each object.

Device-specific operations, such as command execution, pinning, buffer read & write, mapping, and domain ownership transfers are left to driver-specific ioctls.

GEM Initialization

Drivers that use GEM must set the DRIVER_GEM bit in the struct drm_driver driver_features field. The DRM core will then automatically initialize the GEM core before calling the load operation. Behind the scenes, this will create a DRM Memory Manager object which provides an address space pool for object allocation.

In a KMS configuration, drivers need to allocate and initialize a command ring buffer following core GEM initialization if required by the hardware. UMA devices usually have what is called a “stolen” memory region, which provides space for the initial framebuffer and large, contiguous memory regions required by the device. This space is typically not managed by GEM, and must be initialized separately into its own DRM MM object.

GEM Objects Creation

GEM splits creation of GEM objects and allocation of the memory that backs them into two distinct operations.

GEM objects are represented by an instance of struct drm_gem_object. Drivers usually need to extend GEM objects with private information and thus create a driver-specific GEM object structure type that embeds an instance of struct drm_gem_object.

To create a GEM object, a driver allocates memory for an instance of its specific GEM object type and initializes the embedded struct drm_gem_object with a call to drm_gem_object_init(). The function takes a pointer to the DRM device, a pointer to the GEM object and the buffer object size in bytes.

GEM uses shmem to allocate anonymous pageable memory. drm_gem_object_init() will create an shmfs file of the requested size and store it into the struct drm_gem_object filp field. The memory is used as either main storage for the object when the graphics hardware uses system memory directly or as a backing store otherwise.

Drivers are responsible for allocating the actual physical pages by calling shmem_read_mapping_page_gfp() for each page. Note that they can decide to allocate pages when initializing the GEM object, or to delay allocation until the memory is needed (for instance when a page fault occurs as a result of a userspace memory access or when the driver needs to start a DMA transfer involving the memory).

Anonymous pageable memory allocation is not always desired, for instance when the hardware requires physically contiguous system memory as is often the case in embedded devices. Drivers can create GEM objects with no shmfs backing (called private GEM objects) by initializing them with a call to drm_gem_private_object_init() instead of drm_gem_object_init(). Storage for private GEM objects must be managed by drivers.
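The embedding pattern described in this section can be sketched in plain userspace C. This is a hedged illustration rather than driver code: struct model_gem_object, struct model_driver_bo and all model_* functions are hypothetical stand-ins, the has_shmem_backing flag models whether filp would be set, and the 4096-byte page size and the rounding of the size to a page multiple are assumptions of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical stand-in for struct drm_gem_object. */
struct model_gem_object {
	size_t size;
	int has_shmem_backing; /* models a non-NULL filp */
};

/* Driver-specific object type that embeds the generic object. */
struct model_driver_bo {
	unsigned int tiling_mode; /* example of driver-private state */
	struct model_gem_object base;
};

/* Recover the driver object from a pointer to the embedded base. */
static struct model_driver_bo *to_model_driver_bo(struct model_gem_object *obj)
{
	return (struct model_driver_bo *)((char *)obj -
			offsetof(struct model_driver_bo, base));
}

/* Models drm_gem_object_init(): the object gets shmem backing. */
static int model_gem_object_init(struct model_gem_object *obj, size_t size)
{
	obj->size = size;
	obj->has_shmem_backing = 1;
	return 0;
}

/* Models drm_gem_private_object_init(): the driver manages the storage. */
static void model_gem_private_object_init(struct model_gem_object *obj,
					  size_t size)
{
	obj->size = size;
	obj->has_shmem_backing = 0;
}

/* Allocate the driver object, then initialize the embedded GEM object. */
static struct model_driver_bo *model_driver_bo_create(size_t size,
						      int private_backing)
{
	struct model_driver_bo *bo = calloc(1, sizeof(*bo));

	if (!bo)
		return NULL;
	/* Round up to a whole number of pages (an assumption of this model). */
	size = (size + MODEL_PAGE_SIZE - 1) & ~(size_t)(MODEL_PAGE_SIZE - 1);
	if (private_backing) {
		model_gem_private_object_init(&bo->base, size);
	} else if (model_gem_object_init(&bo->base, size) != 0) {
		free(bo);
		return NULL;
	}
	return bo;
}
```

Generic code only ever sees the embedded base object; the driver converts back to its own type with a helper such as to_model_driver_bo() whenever it needs its private fields.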

GEM Objects Lifetime

All GEM objects are reference-counted by the GEM core. References can be acquired and released by calling drm_gem_object_get() and drm_gem_object_put() respectively. The caller must hold the struct drm_device struct_mutex lock when calling drm_gem_object_put(). As a convenience, GEM provides the drm_gem_object_put_unlocked() function, which can be called without holding the lock.

When the last reference to a GEM object is released the GEM core calls the struct drm_driver gem_free_object_unlocked operation. That operation is mandatory for GEM-enabled drivers and must free the GEM object and all associated resources.

void (*gem_free_object) (struct drm_gem_object *obj);

Drivers are responsible for freeing all GEM object resources. This includes the resources created by the GEM core, which need to be released with drm_gem_object_release().
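The lifetime rules above amount to a reference count with a driver-supplied free callback. A minimal userspace sketch of that idea follows; it is not the kernel implementation, all model_* names are hypothetical, and locking is deliberately omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted GEM object. */
struct model_gem_object {
	int refcount;
	void (*free_object)(struct model_gem_object *obj);
};

static int model_objects_freed;

/* Driver free callback: must free the object and everything attached. */
static void model_driver_free_object(struct model_gem_object *obj)
{
	/* A real driver would release the core GEM resources here first. */
	model_objects_freed++;
	free(obj);
}

/* Creation hands the caller the initial reference. */
static struct model_gem_object *model_driver_create(void)
{
	struct model_gem_object *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->refcount = 1;
	obj->free_object = model_driver_free_object;
	return obj;
}

/* Acquire an additional reference; the caller must already hold one. */
static void model_gem_object_get(struct model_gem_object *obj)
{
	assert(obj->refcount > 0);
	obj->refcount++;
}

/* Release a reference; the last release invokes the free callback. */
static void model_gem_object_put(struct model_gem_object *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0)
		obj->free_object(obj);
}
```

After the final model_gem_object_put() the pointer must not be used again, which mirrors the rule that the free operation releases the object and all associated resources.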

GEM Objects Naming

Communication between userspace and the kernel refers to GEM objects using local handles, global names or, more recently, file descriptors. All of those are 32-bit integer values; the usual Linux kernel limits apply to the file descriptors.

GEM handles are local to a DRM file. Applications get a handle to a GEM object through a driver-specific ioctl, and can use that handle to refer to the GEM object in other standard or driver-specific ioctls. Closing a DRM file handle frees all its GEM handles and dereferences the associated GEM objects.

To create a handle for a GEM object drivers call drm_gem_handle_create(). The function takes a pointer to the DRM file and the GEM object and returns a locally unique handle. When the handle is no longer needed drivers delete it with a call to drm_gem_handle_delete(). Finally the GEM object associated with a handle can be retrieved by a call to drm_gem_object_lookup().

Handles don’t take ownership of GEM objects; they only take a reference to the object that will be dropped when the handle is destroyed. To avoid leaking GEM objects, drivers must make sure they drop the reference(s) they own (such as the initial reference taken at object creation time) as appropriate, without any special consideration for the handle. For example, in the particular case of combined GEM object and handle creation in the implementation of the dumb_create operation, drivers must drop the initial reference to the GEM object before returning the handle.
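The interaction between handles and references, including the dumb_create case mentioned above, can be modelled in userspace as follows. This is a sketch under assumptions, not the GEM core: the fixed-size per-file table, the model_* names and the choice of 0 as an invalid handle are all part of the model.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_HANDLES 8

struct model_gem_object {
	int refcount;
};

/* Per-file handle table; index 0 is unused so that 0 is never a handle. */
struct model_file {
	struct model_gem_object *handles[MODEL_MAX_HANDLES + 1];
};

static int model_objects_freed;

static void model_object_put(struct model_gem_object *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0) {
		model_objects_freed++;
		free(obj);
	}
}

/* A handle takes its own reference; it does not take ownership. */
static int model_handle_create(struct model_file *file,
			       struct model_gem_object *obj, uint32_t *handlep)
{
	uint32_t h;

	for (h = 1; h <= MODEL_MAX_HANDLES; h++) {
		if (!file->handles[h]) {
			file->handles[h] = obj;
			obj->refcount++;
			*handlep = h;
			return 0;
		}
	}
	return -1;
}

/* Lookup returns a new reference that the caller must put. */
static struct model_gem_object *model_object_lookup(struct model_file *file,
						    uint32_t handle)
{
	struct model_gem_object *obj;

	if (handle == 0 || handle > MODEL_MAX_HANDLES)
		return NULL;
	obj = file->handles[handle];
	if (obj)
		obj->refcount++;
	return obj;
}

/* Deleting the handle drops the reference the handle held. */
static int model_handle_delete(struct model_file *file, uint32_t handle)
{
	struct model_gem_object *obj;

	if (handle == 0 || handle > MODEL_MAX_HANDLES || !file->handles[handle])
		return -1;
	obj = file->handles[handle];
	file->handles[handle] = NULL;
	model_object_put(obj);
	return 0;
}

/* Combined creation: drop the initial reference before returning. */
static int model_dumb_create(struct model_file *file, uint32_t *handlep)
{
	struct model_gem_object *obj = calloc(1, sizeof(*obj));
	int ret;

	if (!obj)
		return -1;
	obj->refcount = 1; /* initial reference from creation */
	ret = model_handle_create(file, obj, handlep);
	/* On success the handle keeps the object alive; on failure this frees it. */
	model_object_put(obj);
	return ret;
}
```

With this ordering the handle holds the only remaining reference, so deleting the handle frees the object and nothing leaks.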

GEM names are similar in purpose to handles but are not local to DRM files. They can be passed between processes to reference a GEM object globally. Names can’t be used directly to refer to objects in the DRM API; applications must convert handles to names and names to handles using the DRM_IOCTL_GEM_FLINK and DRM_IOCTL_GEM_OPEN ioctls respectively. The conversion is handled by the DRM core without any driver-specific support.
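The handle-to-name and name-to-handle conversions can be pictured with a userspace model of the two namespaces. The sketch below is illustrative only: model_flink() and model_open() are hypothetical functions that mimic the roles of the FLINK and OPEN ioctls, the table sizes are arbitrary, and the model assumes that a name starts at 1, that 0 means unnamed, and that the name is cleared once the last handle goes away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX 8

struct model_gem_object {
	int handle_count; /* handles across all files */
	int name;         /* global name, 0 means unnamed */
};

/* Device-wide name table; index 0 is unused. */
struct model_device {
	struct model_gem_object *names[MODEL_MAX + 1];
};

/* Per-file handle table; index 0 is unused. */
struct model_file {
	struct model_gem_object *handles[MODEL_MAX + 1];
};

static int model_handle_create(struct model_file *file,
			       struct model_gem_object *obj, uint32_t *handlep)
{
	uint32_t h;

	for (h = 1; h <= MODEL_MAX; h++) {
		if (!file->handles[h]) {
			file->handles[h] = obj;
			obj->handle_count++;
			*handlep = h;
			return 0;
		}
	}
	return -1;
}

/* Deleting the last handle also clears the global name. */
static int model_handle_delete(struct model_device *dev,
			       struct model_file *file, uint32_t handle)
{
	struct model_gem_object *obj;

	if (handle == 0 || handle > MODEL_MAX || !file->handles[handle])
		return -1;
	obj = file->handles[handle];
	file->handles[handle] = NULL;
	if (--obj->handle_count == 0 && obj->name != 0) {
		dev->names[obj->name] = NULL;
		obj->name = 0;
	}
	return 0;
}

/* Handle to name: assign a name on first use, then reuse it. */
static int model_flink(struct model_device *dev, struct model_file *file,
		       uint32_t handle, int *namep)
{
	struct model_gem_object *obj;
	int n;

	if (handle == 0 || handle > MODEL_MAX || !file->handles[handle])
		return -1;
	obj = file->handles[handle];
	for (n = 1; obj->name == 0 && n <= MODEL_MAX; n++) {
		if (!dev->names[n]) {
			dev->names[n] = obj;
			obj->name = n;
		}
	}
	if (obj->name == 0)
		return -1;
	*namep = obj->name;
	return 0;
}

/* Name to handle: create a handle in the opening file. */
static int model_open(struct model_device *dev, struct model_file *file,
		      int name, uint32_t *handlep)
{
	if (name <= 0 || name > MODEL_MAX || !dev->names[name])
		return -1;
	return model_handle_create(file, dev->names[name], handlep);
}
```

Because names live in one device-wide table, any file that learns or guesses a name can open the object, which is the weakness that motivates file-descriptor-based sharing.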

GEM also supports buffer sharing with dma-buf file descriptors through PRIME. GEM-based drivers must use the provided helper functions to implement the exporting and importing correctly. Since sharing file descriptors is inherently more secure than the easily guessable and global GEM names, it is the preferred buffer sharing mechanism. Sharing buffers through GEM names is only supported for legacy userspace. Furthermore, PRIME also allows cross-device buffer sharing since it is based on dma-bufs.

GEM Objects Mapping

Because mapping operations are fairly heavyweight, GEM favours read/write-like access to buffers, implemented through driver-specific ioctls, over mapping buffers to userspace. However, when random access to the buffer is needed (to perform software rendering for instance), direct access to the object can be more efficient.

The mmap system call can’t be used directly to map GEM objects, as they don’t have their own file handle. Two alternative methods currently co-exist to map GEM objects to userspace. The first method uses a driver-specific ioctl to perform the mapping operation, calling do_mmap() under the hood. This is often considered dubious, seems to be discouraged for new GEM-enabled drivers, and will thus not be described here.

The second method uses the mmap system call on the DRM file handle.

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

DRM identifies the GEM object to be mapped by a fake offset passed through the mmap offset argument. Prior to being mapped, a GEM object must thus be associated with a fake offset. To do so, drivers must call drm_gem_create_mmap_offset() on the object.

Once allocated, the fake offset value must be passed to the application in a driver-specific way and can then be used as the mmap offset argument.

The GEM core provides a helper method drm_gem_mmap() to handle object mapping. The method can be set directly as the mmap file operation handler. It will look up the GEM object based on the offset value and set the VMA operations to the struct drm_driver gem_vm_ops field. Note that drm_gem_mmap() doesn’t map memory to userspace, but relies on the driver-provided fault handler to map pages individually.
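The fake-offset scheme described above can be modelled as a small allocator that hands out page-aligned offsets and later maps an offset back to its object. The following userspace sketch is an assumption-laden illustration, not the DRM offset manager: all model_* names, the 4096-byte page size, the starting page of the fake range and the fixed-size table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE (1ULL << MODEL_PAGE_SHIFT)
#define MODEL_FIRST_FAKE_PAGE 0x100000ULL /* hypothetical start of the range */
#define MODEL_MAX_OBJECTS 16

struct model_gem_object {
	size_t size;
	uint64_t fake_offset; /* byte offset for mmap, 0 means none allocated */
};

struct model_offset_manager {
	struct model_gem_object *objects[MODEL_MAX_OBJECTS];
	size_t count;
	uint64_t next_page;
};

static void model_offset_manager_init(struct model_offset_manager *mgr)
{
	mgr->count = 0;
	mgr->next_page = MODEL_FIRST_FAKE_PAGE;
}

/* Allocate a fake offset; calling it again on the same object is a no-op. */
static int model_create_mmap_offset(struct model_offset_manager *mgr,
				    struct model_gem_object *obj)
{
	uint64_t pages;

	if (obj->fake_offset != 0)
		return 0;
	if (mgr->count == MODEL_MAX_OBJECTS)
		return -1;
	pages = ((uint64_t)obj->size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
	obj->fake_offset = mgr->next_page << MODEL_PAGE_SHIFT;
	mgr->next_page += pages;
	mgr->objects[mgr->count++] = obj;
	return 0;
}

/* What an mmap handler does first: find the object for the given offset. */
static struct model_gem_object *
model_lookup_by_offset(struct model_offset_manager *mgr, uint64_t offset)
{
	size_t i;

	for (i = 0; i < mgr->count; i++) {
		if (mgr->objects[i]->fake_offset == offset)
			return mgr->objects[i];
	}
	return NULL;
}
```

The offset carries no meaning beyond identifying the object: userspace receives it in a driver-specific way and passes it back unchanged as the mmap offset argument.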

To use drm_gem_mmap(), drivers must fill the struct drm_driver gem_vm_ops field with a pointer to VM operations.

The VM operations are described by a struct vm_operations_struct made up of several fields, the more interesting ones being:

struct vm_operations_struct {
        void (*open)(struct vm_area_struct * area);
        void (*close)(struct vm_area_struct * area);
        int (*fault)(struct vm_fault *vmf);
};

The open and close operations must update the GEM object reference count. Drivers can use the drm_gem_vm_open() and drm_gem_vm_close() helper functions directly as open and close handlers.

The fault operation handler is responsible for mapping individual pages to userspace when a page fault occurs. Depending on the memory allocation scheme, drivers can allocate pages at fault time, or can decide to allocate memory for the GEM object at the time the object is created.

Drivers that want to map the GEM object upfront instead of handling page faults can implement their own mmap file operation handler.

For platforms without MMU the GEM core provides a helper method drm_gem_cma_get_unmapped_area(). The mmap() routines will call this to get a proposed address for the mapping.

To use drm_gem_cma_get_unmapped_area(), drivers must fill the struct file_operations get_unmapped_area field with a pointer to drm_gem_cma_get_unmapped_area().

More detailed information about get_unmapped_area can be found in Documentation/nommu-mmap.txt.

Memory Coherency

When mapped to the device or used in a command buffer, backing pages for an object are flushed to memory and marked write combined so as to be coherent with the GPU. Likewise, if the CPU accesses an object after the GPU has finished rendering to the object, then the object must be made coherent with the CPU’s view of memory, usually involving GPU cache flushing of various kinds. This core CPU<->GPU coherency management is provided by a device-specific ioctl, which evaluates an object’s current domain and performs any necessary flushing or synchronization to put the object into the desired coherency domain (note that the object may be busy, i.e. an active render target; in that case, setting the domain blocks the client and waits for rendering to complete before performing any necessary flushing operations).
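The domain transition described above can be reduced to a two-step rule: wait if the object is busy, then flush only if the domain actually changes. The userspace sketch below models that rule under simplifying assumptions. The two-domain enum, the counters and model_set_domain() are hypothetical and do not correspond to any particular driver's ioctl.

```c
#include <assert.h>

/* Hypothetical coherency domains for this model. */
enum model_domain {
	MODEL_DOMAIN_CPU,
	MODEL_DOMAIN_GPU
};

struct model_bo {
	enum model_domain domain; /* current coherency domain */
	int busy;                 /* non-zero while rendering is in flight */
	int waits;                /* times the caller had to block */
	int flushes;              /* times a flush was performed */
};

/* Move the object into the target domain. */
static void model_set_domain(struct model_bo *bo, enum model_domain target)
{
	assert(target == MODEL_DOMAIN_CPU || target == MODEL_DOMAIN_GPU);

	/* A busy object blocks the caller until rendering completes. */
	if (bo->busy) {
		bo->waits++;
		bo->busy = 0;
	}
	/* Flushing is only needed when the domain changes. */
	if (bo->domain != target) {
		bo->flushes++;
		bo->domain = target;
	}
}
```

Requesting the domain an idle object is already in costs nothing in this model, which is why evaluating the current domain first is worthwhile.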

Command Execution

Perhaps the most important GEM function for GPU devices is providing a command execution interface to clients. Client programs construct command buffers containing references to previously allocated memory objects, and then submit them to GEM. At that point, GEM takes care to bind all the objects into the GTT, execute the buffer, and provide necessary synchronization between clients accessing the same buffers. This often involves evicting some objects from the GTT and re-binding others (a fairly expensive operation), and providing relocation support which hides fixed GTT offsets from clients. Clients must take care not to submit command buffers that reference more objects than can fit in the GTT; otherwise, GEM will reject them and no rendering will occur. Similarly, if several objects in the buffer require fence registers to be allocated for correct rendering (e.g. 2D blits on pre-965 chips), care must be taken not to require more fence registers than are available to the client. Such resource management should be abstracted from the client in libdrm.
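The two client-side constraints mentioned above (total object size versus GTT size, and fence registers required versus available) lend themselves to a simple pre-submission check. The sketch below shows one way a client library might perform it. It is purely illustrative: the structure, the function name and the generic -1 return value are assumptions, and it ignores alignment and fragmentation, which real hardware would also impose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one object referenced by a command buffer. */
struct model_exec_object {
	size_t size;
	int needs_fence; /* non-zero if a fence register is required */
};

/*
 * Return 0 if every referenced object fits in the GTT at once and no more
 * fence registers are needed than are available, -1 otherwise.
 */
static int model_validate_submission(const struct model_exec_object *objs,
				     size_t count, size_t gtt_size,
				     unsigned int fence_regs)
{
	size_t total = 0;
	unsigned int fences = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		/* total never exceeds gtt_size, so the subtraction cannot wrap. */
		if (objs[i].size > gtt_size - total)
			return -1;
		total += objs[i].size;
		if (objs[i].needs_fence && ++fences > fence_regs)
			return -1;
	}
	return 0;
}
```

Comparing each size against the remaining space, rather than summing first and comparing afterwards, keeps the check safe from integer overflow on large submissions.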

GEM Function Reference

struct drm_gem_object

GEM buffer object

Definition

struct drm_gem_object {
  struct kref refcount;
  unsigned handle_count;
  struct drm_device *dev;
  struct file *filp;
  struct drm_vma_offset_node vma_node;
  size_t size;
  int name;
  struct dma_buf *dma_buf;
  struct dma_buf_attachment *import_attach;
};

Members

refcount

Reference count of this object

Please use drm_gem_object_get() to acquire and drm_gem_object_put() or drm_gem_object_put_unlocked() to release a reference to a GEM buffer object.

handle_count

This is the GEM file_priv handle count of this object.

Each handle also holds a reference. Note that when the handle_count drops to 0 any global names (e.g. the id in the flink namespace) will be cleared.

Protected by drm_device.object_name_lock.

dev
DRM dev this object belongs to.
filp
SHMEM file node used as backing storage for swappable buffer objects. GEM also supports driver private objects with driver-specific backing storage (contiguous CMA memory, special reserved blocks). In this case filp is NULL.
vma_node

Mapping info for this object to support mmap. Drivers are supposed to allocate the mmap offset using drm_gem_create_mmap_offset(). The offset itself can be retrieved using drm_vma_node_offset_addr().

Memory mapping itself is handled by drm_gem_mmap(), which also checks that userspace is allowed to access the object.

size
Size of the object, in bytes. Immutable over the object’s lifetime.
name
Global name for this object, starts at 1. 0 means unnamed. Access is covered by drm_device.object_name_lock. This is used by the GEM_FLINK and GEM_OPEN ioctls.
dma_buf

dma-buf associated with this GEM object.

Pointer to the dma-buf associated with this gem object (either through importing or exporting). We break the resulting reference loop when the last gem handle for this object is released.

Protected by drm_device.object_name_lock.

import_attach

dma-buf attachment backing this object.

Any foreign dma_buf imported as a gem object has this set to the attachment point for the device. This is invariant over the lifetime of a gem object.

The drm_driver.gem_free_object callback is responsible for cleaning up the dma_buf attachment and references acquired at import time.

Note that the drm gem/prime core does not depend upon drivers setting this field any more. So for drivers where this doesn’t make sense (e.g. virtual devices or a DisplayLink device behind a USB bus) they can simply leave it as NULL.

Description

This structure defines the generic parts for GEM buffer objects, which are mostly around handling mmap and userspace handles.

Buffer objects are often abbreviated to BO.

DEFINE_DRM_GEM_FOPS(name)

macro to generate file operations for GEM drivers

Parameters

name
name for the generated structure

Description

This macro autogenerates a suitable struct file_operations for GEM based drivers, which can be assigned to drm_driver.fops. Note that this structure cannot be shared between drivers, because it contains a reference to the current module using THIS_MODULE.

Note that the declaration is already marked as static - if you need a non-static version of this you’re probably doing it wrong and will break the THIS_MODULE reference by accident.

void drm_gem_object_get(struct drm_gem_object * obj)

acquire a GEM buffer object reference

Parameters

struct drm_gem_object * obj
GEM buffer object

Description

This function acquires an additional reference to obj. It is illegal to call this without already holding a reference. No locks required.

void __drm_gem_object_put(struct drm_gem_object * obj)

raw function to release a GEM buffer object reference

Parameters

struct drm_gem_object * obj
GEM buffer object

Description

This function is meant to be used by drivers which are not encumbered with drm_device.struct_mutex legacy locking and which are using the gem_free_object_unlocked callback. It avoids all the locking checks and locking overhead of drm_gem_object_put() and drm_gem_object_put_unlocked().

Drivers should never call this directly in their code. Instead they should wrap it up into a driver_gem_object_put(struct driver_gem_object *obj) wrapper function, and use that. Shared code should never call this, to avoid breaking drivers by accident which still depend upon drm_device.struct_mutex locking.

void drm_gem_object_reference(struct drm_gem_object * obj)

acquire a GEM buffer object reference

Parameters

struct drm_gem_object * obj
GEM buffer object

Description

This is a compatibility alias for drm_gem_object_get() and should not be used by new code.

void __drm_gem_object_unreference(struct drm_gem_object * obj)

raw function to release a GEM buffer object reference

Parameters

struct drm_gem_object * obj
GEM buffer object

Description

This is a compatibility alias for __drm_gem_object_put() and should not be used by new code.

void drm_gem_object_unreference_unlocked(struct drm_gem_object * obj)

release a GEM buffer object reference

Parameters

struct drm_gem_object * obj
GEM buffer object

Description

This is a compatibility alias for drm_gem_object_put_unlocked() and should not be used by new code.

void drm_gem_object_unreference(struct drm_gem_object * obj)

release a GEM buffer object reference

Parameters

struct drm_gem_object * obj
GEM buffer object

Description

This is a compatibility alias for drm_gem_object_put() and should not be used by new code.

int drm_gem_object_init(struct drm_device * dev, struct drm_gem_object * obj, size_t size)

initialize an allocated shmem-backed GEM object

Parameters

struct drm_device * dev
drm_device the object should be initialized for
struct drm_gem_object * obj
drm_gem_object to initialize
size_t size
object size

Description

Initialize an already allocated GEM object of the specified size with shmfs backing store.

void drm_gem_private_object_init(struct drm_device * dev, struct drm_gem_object * obj, size_t size)

initialize an allocated private GEM object

Parameters

struct drm_device * dev
drm_device the object should be initialized for
struct drm_gem_object * obj
drm_gem_object to initialize
size_t size
object size

Description

Initialize an already allocated GEM object of the specified size with no GEM provided backing store. Instead the caller is responsible for backing the object and handling it.

int drm_gem_handle_delete(struct drm_file * filp, u32 handle)

deletes the given file-private handle

Parameters

struct drm_file * filp
drm file-private structure to use for the handle look up
u32 handle
userspace handle to delete

Description

Removes the GEM handle from the filp lookup table which has been added with drm_gem_handle_create(). If this is the last handle also cleans up linked resources like GEM names.

int drm_gem_dumb_map_offset(struct drm_file * file, struct drm_device * dev, u32 handle, u64 * offset)

return the fake mmap offset for a gem object

Parameters

struct drm_file * file
drm file-private structure containing the gem object
struct drm_device * dev
corresponding drm_device
u32 handle
gem object handle
u64 * offset
return location for the fake mmap offset

Description

This implements the drm_driver.dumb_map_offset kms driver callback for drivers which use gem to manage their backing storage.

Return

0 on success or a negative error code on failure.

int drm_gem_dumb_destroy(struct drm_file * file, struct drm_device * dev, uint32_t handle)

dumb fb callback helper for gem based drivers

Parameters

struct drm_file * file
drm file-private structure to remove the dumb handle from
struct drm_device * dev
corresponding drm_device
uint32_t handle
the dumb handle to remove

Description

This implements the drm_driver.dumb_destroy kms driver callback for drivers which use gem to manage their backing storage.

int drm_gem_handle_create(struct drm_file * file_priv, struct drm_gem_object * obj, u32 * handlep)

create a gem handle for an object

Parameters

struct drm_file * file_priv
drm file-private structure to register the handle for
struct drm_gem_object * obj
object to register
u32 * handlep
pointer to return the created handle to the caller

Description

Create a handle for this object. This adds a handle reference to the object, which includes a regular reference count. Callers will likely want to drop their own reference to the object afterwards.

Since this publishes obj to userspace, it must be fully set up by this point; drivers must call this last in their buffer object creation callbacks.

void drm_gem_free_mmap_offset(struct drm_gem_object * obj)

release a fake mmap offset for an object

Parameters

struct drm_gem_object * obj
obj in question

Description

This routine frees fake offsets allocated by drm_gem_create_mmap_offset().

Note that drm_gem_object_release() already calls this function, so drivers don’t have to take care of releasing the mmap offset themselves when freeing the GEM object.

int drm_gem_create_mmap_offset_size(struct drm_gem_object * obj, size_t size)

create a fake mmap offset for an object

Parameters

struct drm_gem_object * obj
obj in question
size_t size
the virtual size

Description

GEM memory mapping works by handing back to userspace a fake mmap offset it can use in a subsequent mmap(2) call. The DRM core code then looks up the object based on the offset and sets up the various memory mapping structures.

This routine allocates and attaches a fake offset for obj, in cases where the virtual size differs from the physical size (i.e. drm_gem_object.size). Otherwise just use drm_gem_create_mmap_offset().

This function is idempotent and handles an already allocated mmap offset transparently. Drivers do not need to check for this case.

int drm_gem_create_mmap_offset(struct drm_gem_object * obj)

create a fake mmap offset for an object

Parameters

struct drm_gem_object * obj
obj in question

Description

GEM memory mapping works by handing back to userspace a fake mmap offset it can use in a subsequent mmap(2) call. The DRM core code then looks up the object based on the offset and sets up the various memory mapping structures.

This routine allocates and attaches a fake offset for obj.

Drivers can call drm_gem_free_mmap_offset() before freeing obj to release the fake offset again.

struct page ** drm_gem_get_pages(struct drm_gem_object * obj)

helper to allocate backing pages for a GEM object from shmem

Parameters

struct drm_gem_object * obj
obj in question

Description

This reads the page-array of the shmem-backing storage of the given gem object. An array of pages is returned. If a page is not allocated or swapped-out, this will allocate/swap-in the required pages. Note that the whole object is covered by the page-array and pinned in memory.

Use drm_gem_put_pages() to release the array and unpin all pages.

This uses the GFP-mask set on the shmem-mapping (see mapping_set_gfp_mask()). If you require other GFP-masks, you have to do those allocations yourself.

Note that you are not allowed to change gfp-zones during runtime. That is, shmem_read_mapping_page_gfp() must be called with the same gfp_zone(gfp) as set during initialization. If you have special zone constraints, set them after drm_gem_object_init() via mapping_set_gfp_mask(). shmem-core takes care to keep pages in the required zone during swap-in.

void drm_gem_put_pages(struct drm_gem_object * obj, struct page ** pages, bool dirty, bool accessed)

helper to free backing pages for a GEM object

Parameters

struct drm_gem_object * obj
obj in question
struct page ** pages
pages to free
bool dirty
if true, pages will be marked as dirty
bool accessed
if true, the pages will be marked as accessed

struct drm_gem_object * drm_gem_object_lookup(struct drm_file * filp, u32 handle)

look up a GEM object from its handle

Parameters

struct drm_file * filp
DRM file private data
u32 handle
userspace handle

Return

A reference to the object named by the handle if such exists on filp, NULL otherwise.

void drm_gem_object_release(struct drm_gem_object * obj)

release GEM buffer object resources

Parameters

struct drm_gem_object * obj
GEM buffer object

Description

This releases any structures and resources used by obj and is the inverse of drm_gem_object_init().

void drm_gem_object_free(struct kref * kref)

free a GEM object

Parameters

struct kref * kref
kref of the object to free

Description

Called after the last reference to the object has been lost. Must be called holding drm_device.struct_mutex.

Frees the object.

void drm_gem_object_put_unlocked(struct drm_gem_object * obj)

drop a GEM buffer object reference

Parameters

struct drm_gem_object * obj
GEM buffer object

Description

This releases a reference to obj. Callers must not hold the drm_device.struct_mutex lock when calling this function.

See also __drm_gem_object_put().

void drm_gem_object_put(struct drm_gem_object * obj)

release a GEM buffer object reference

Parameters

struct drm_gem_object * obj
GEM buffer object

Description

This releases a reference to obj. Callers must hold the drm_device.struct_mutex lock when calling this function, even when the driver doesn’t use drm_device.struct_mutex for anything.

For drivers not encumbered with legacy locking use drm_gem_object_put_unlocked() instead.

void drm_gem_vm_open(struct vm_area_struct * vma)

vma->ops->open implementation for GEM

Parameters

struct vm_area_struct * vma
VM area structure

Description

This function implements the struct vm_operations_struct open() callback for GEM drivers. This must be used together with drm_gem_vm_close().

void drm_gem_vm_close(struct vm_area_struct * vma)

vma->ops->close implementation for GEM

Parameters

struct vm_area_struct * vma
VM area structure

Description

This function implements the struct vm_operations_struct close() callback for GEM drivers. This must be used together with drm_gem_vm_open().

int drm_gem_mmap_obj(struct drm_gem_object * obj, unsigned long obj_size, struct vm_area_struct * vma)

memory map a GEM object

Parameters

struct drm_gem_object * obj
the GEM object to map
unsigned long obj_size
the object size to be mapped, in bytes
struct vm_area_struct * vma
VMA for the area to be mapped

Description

Set up the VMA to prepare mapping of the GEM object using the gem_vm_ops provided by the driver. Depending on their requirements, drivers can either provide a fault handler in their gem_vm_ops (in which case any accesses to the object will be trapped, to perform migration, GTT binding, surface register allocation, or performance monitoring), or mmap the buffer memory synchronously after calling drm_gem_mmap_obj.

This function is mainly intended to implement the DMABUF mmap operation, when the GEM object is not looked up based on its fake offset. To implement the DRM mmap operation, drivers should use the drm_gem_mmap() function.

drm_gem_mmap_obj() assumes the user is granted access to the buffer while drm_gem_mmap() prevents unprivileged users from mapping random objects. So callers must verify access restrictions before calling this helper.

Returns 0 on success or -EINVAL if the object size is smaller than the VMA size, or if no gem_vm_ops are provided.
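The two failure conditions listed above can be expressed as a pair of checks performed before the VMA is set up. The following userspace sketch models only that validation logic. struct model_vma and model_mmap_obj() are hypothetical, and storing the object pointer in vm_private_data is an assumption of the model rather than something stated in this reference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	const void *vm_ops;
	void *vm_private_data;
};

/*
 * Validate the request, then attach the driver's VM operations.
 * Returns 0 on success or -EINVAL if the object is smaller than the
 * requested mapping or if no VM operations were provided.
 */
static int model_mmap_obj(const void *gem_vm_ops, void *obj,
			  unsigned long obj_size, struct model_vma *vma)
{
	/* The mapping must not extend past the end of the object. */
	if (obj_size < vma->vm_end - vma->vm_start)
		return -EINVAL;
	/* Without VM operations there is nothing to service faults. */
	if (!gem_vm_ops)
		return -EINVAL;

	vma->vm_ops = gem_vm_ops;
	vma->vm_private_data = obj; /* assumption: lets handlers find obj */
	return 0;
}
```

Note that the model performs no access check, matching the statement above that callers must verify access restrictions themselves before using this helper.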

int drm_gem_mmap(struct file * filp, struct vm_area_struct * vma)

memory map routine for GEM objects

Parameters

struct file * filp
DRM file pointer
struct vm_area_struct * vma
VMA for the area to be mapped

Description

If a driver supports GEM object mapping, mmap calls on the DRM file descriptor will end up here.

Look up the GEM object based on the offset passed in (vma->vm_pgoff will contain the fake offset we created when the GTT map ioctl was called on the object) and map it with a call to drm_gem_mmap_obj().

If the caller is not granted access to the buffer object, the mmap will fail with -EACCES. Please see the vma manager for more information.

GEM CMA Helper Functions Reference

The Contiguous Memory Allocator reserves a pool of memory at early boot that is used to service requests for large blocks of contiguous memory.

The DRM GEM/CMA helpers use this allocator as a means to provide buffer objects that are physically contiguous in memory. This is useful for display drivers that are unable to map scattered buffers via an IOMMU.

struct drm_gem_cma_object

GEM object backed by CMA memory allocations

Definition

struct drm_gem_cma_object {
  struct drm_gem_object base;
  dma_addr_t paddr;
  struct sg_table *sgt;
  void *vaddr;
};

Members

base
base GEM object
paddr
physical address of the backing memory
sgt
scatter/gather table for imported PRIME buffers. The table can have more than one entry but they are guaranteed to have contiguous DMA addresses.
vaddr
kernel virtual address of the backing memory

DEFINE_DRM_GEM_CMA_FOPS(name)

macro to generate file operations for CMA drivers

Parameters

name
name for the generated structure

Description

This macro autogenerates a suitable struct file_operations for CMA based drivers, which can be assigned to drm_driver.fops. Note that this structure cannot be shared between drivers, because it contains a reference to the current module using THIS_MODULE.

Note that the declaration is already marked as static - if you need a non-static version of this you’re probably doing it wrong and will break the THIS_MODULE reference by accident.

struct drm_gem_cma_object * drm_gem_cma_create(struct drm_device * drm, size_t size)

allocate an object with the given size

Parameters

struct drm_device * drm
DRM device
size_t size
size of the object to allocate

Description

This function creates a CMA GEM object and allocates a contiguous chunk of memory as backing store. The backing memory has the writecombine attribute set.

Return

A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative error code on failure.

void drm_gem_cma_free_object(struct drm_gem_object * gem_obj)

free resources associated with a CMA GEM object

Parameters

struct drm_gem_object * gem_obj
GEM object to free

Description

This function frees the backing memory of the CMA GEM object, cleans up the GEM object state and frees the memory used to store the object itself. Drivers using the CMA helpers should set this as their drm_driver.gem_free_object_unlocked callback.

int drm_gem_cma_dumb_create_internal(struct drm_file * file_priv, struct drm_device * drm, struct drm_mode_create_dumb * args)

create a dumb buffer object

Parameters

struct drm_file * file_priv
DRM file-private structure to create the dumb buffer for
struct drm_device * drm
DRM device
struct drm_mode_create_dumb * args
IOCTL data

Description

This aligns the pitch and size arguments to the minimum required. This is an internal helper that can be wrapped by a driver to account for hardware with more specific alignment requirements. It should not be used directly as the drm_driver.dumb_create callback.

Return

0 on success or a negative error code on failure.

int drm_gem_cma_dumb_create(struct drm_file * file_priv, struct drm_device * drm, struct drm_mode_create_dumb * args)

create a dumb buffer object

Parameters

struct drm_file * file_priv
DRM file-private structure to create the dumb buffer for
struct drm_device * drm
DRM device
struct drm_mode_create_dumb * args
IOCTL data

Description

This function computes the pitch of the dumb buffer and rounds it up to an integer number of bytes per pixel. Drivers for hardware that doesn’t have any additional restrictions on the pitch can directly use this function as their drm_driver.dumb_create callback.

For hardware with additional restrictions, drivers can adjust the fields set up by userspace and pass the IOCTL data along to the drm_gem_cma_dumb_create_internal() function.

Return

0 on success or a negative error code on failure.
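The pitch and size computation described above can be modeled in plain userspace C. This is only a sketch: struct dumb_args is a hypothetical stand-in for the fields of struct drm_mode_create_dumb, the rounding is assumed to be to whole bytes per line, and dumb_compute_pitch_aligned() is an invented example of a driver wrapper for hardware that needs the pitch aligned to a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the fields used from the IOCTL data. */
struct dumb_args {
    uint32_t width;   /* pixels per line (input) */
    uint32_t height;  /* number of lines (input) */
    uint32_t bpp;     /* bits per pixel (input) */
    uint32_t pitch;   /* bytes per line (output) */
    uint64_t size;    /* total bytes (output) */
};

/* Round the line length up to whole bytes, then derive the buffer size. */
static void dumb_compute_pitch(struct dumb_args *args)
{
    args->pitch = (args->width * args->bpp + 7) / 8;
    args->size = (uint64_t)args->pitch * args->height;
}

/*
 * Hypothetical driver wrapper: the hardware is assumed to need a pitch
 * that is a multiple of align bytes, where align is a power of two.
 */
static void dumb_compute_pitch_aligned(struct dumb_args *args, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    dumb_compute_pitch(args);
    args->pitch = (args->pitch + align - 1) & ~(align - 1);
    args->size = (uint64_t)args->pitch * args->height;
}
```

In a real driver the wrapper would adjust args->pitch and then hand the IOCTL data to drm_gem_cma_dumb_create_internal() rather than computing the size itself.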

int drm_gem_cma_mmap(struct file * filp, struct vm_area_struct * vma)

memory-map a CMA GEM object

Parameters

struct file * filp
file object
struct vm_area_struct * vma
VMA for the area to be mapped

Description

This function implements an augmented version of the GEM DRM file mmap operation for CMA objects: in addition to the usual GEM VMA setup, it immediately faults in the entire object instead of using on-demand faulting. Drivers which employ the CMA helpers should use this function as their mmap() handler in the DRM device file’s file_operations structure.

Instead of directly referencing this function, drivers should use the DEFINE_DRM_GEM_CMA_FOPS() macro.

Return

0 on success or a negative error code on failure.

unsigned long drm_gem_cma_get_unmapped_area(struct file * filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)

propose address for mapping in noMMU cases

Parameters

struct file * filp
file object
unsigned long addr
memory address
unsigned long len
buffer size
unsigned long pgoff
page offset
unsigned long flags
memory flags

Description

This function is used in noMMU platforms to propose address mapping for a given buffer. It’s intended to be used as a direct handler for the struct file_operations.get_unmapped_area operation.

Return

mapping address on success or a negative error code on failure.

void drm_gem_cma_print_info(struct drm_printer * p, unsigned int indent, const struct drm_gem_object * obj)

Print drm_gem_cma_object info for debugfs

Parameters

struct drm_printer * p
DRM printer
unsigned int indent
Tab indentation level
const struct drm_gem_object * obj
GEM object

Description

This function can be used as the drm_driver->gem_print_info callback. It prints paddr and vaddr for use in e.g. debugfs output.

struct sg_table * drm_gem_cma_prime_get_sg_table(struct drm_gem_object * obj)

provide a scatter/gather table of pinned pages for a CMA GEM object

Parameters

struct drm_gem_object * obj
GEM object

Description

This function exports a scatter/gather table suitable for PRIME usage by calling the standard DMA mapping API. Drivers using the CMA helpers should set this as their drm_driver.gem_prime_get_sg_table callback.

Return

A pointer to the scatter/gather table of pinned pages or NULL on failure.

struct drm_gem_object * drm_gem_cma_prime_import_sg_table(struct drm_device * dev, struct dma_buf_attachment * attach, struct sg_table * sgt)

produce a CMA GEM object from another driver’s scatter/gather table of pinned pages

Parameters

struct drm_device * dev
device to import into
struct dma_buf_attachment * attach
DMA-BUF attachment
struct sg_table * sgt
scatter/gather table of pinned pages

Description

This function imports a scatter/gather table exported via DMA-BUF by another driver. Imported buffers must be physically contiguous in memory (i.e. the scatter/gather table must contain a single entry). Drivers that use the CMA helpers should set this as their drm_driver.gem_prime_import_sg_table callback.

Return

A pointer to a newly created GEM object or an ERR_PTR-encoded negative error code on failure.
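The contiguity requirement can be illustrated with a small userspace model. The sketch below assumes the looser condition given for the sgt member of struct drm_gem_cma_object (several entries are acceptable as long as their DMA addresses are contiguous); struct sg_entry and sg_check_contiguous() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened scatter/gather entry. */
struct sg_entry {
    uint64_t dma_addr;  /* DMA address of the first byte */
    uint32_t length;    /* length of the entry in bytes */
};

/* Return 0 if the entries form one contiguous DMA range, -EINVAL otherwise. */
static int sg_check_contiguous(const struct sg_entry *sg, size_t nents)
{
    uint64_t next_addr;
    size_t i;

    assert(sg != NULL);

    if (nents == 0)
        return -EINVAL;

    next_addr = sg[0].dma_addr;
    for (i = 0; i < nents; i++) {
        /* Each entry must start exactly where the previous one ended. */
        if (sg[i].dma_addr != next_addr)
            return -EINVAL;
        next_addr = sg[i].dma_addr + sg[i].length;
    }

    return 0;
}
```

A table with a single entry trivially passes this check, which matches the stricter wording in the description above.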

int drm_gem_cma_prime_mmap(struct drm_gem_object * obj, struct vm_area_struct * vma)

memory-map an exported CMA GEM object

Parameters

struct drm_gem_object * obj
GEM object
struct vm_area_struct * vma
VMA for the area to be mapped

Description

This function maps a buffer imported via DRM PRIME into a userspace process’s address space. Drivers that use the CMA helpers should set this as their drm_driver.gem_prime_mmap callback.

Return

0 on success or a negative error code on failure.

void * drm_gem_cma_prime_vmap(struct drm_gem_object * obj)

map a CMA GEM object into the kernel’s virtual address space

Parameters

struct drm_gem_object * obj
GEM object

Description

This function maps a buffer exported via DRM PRIME into the kernel’s virtual address space. Since the CMA buffers are already mapped into the kernel virtual address space this simply returns the cached virtual address. Drivers using the CMA helpers should set this as their DRM driver’s drm_driver.gem_prime_vmap callback.

Return

The kernel virtual address of the CMA GEM object’s backing store.

void drm_gem_cma_prime_vunmap(struct drm_gem_object * obj, void * vaddr)

unmap a CMA GEM object from the kernel’s virtual address space

Parameters

struct drm_gem_object * obj
GEM object
void * vaddr
kernel virtual address where the CMA GEM object was mapped

Description

This function removes a buffer exported via DRM PRIME from the kernel’s virtual address space. This is a no-op because CMA buffers cannot be unmapped from kernel space. Drivers using the CMA helpers should set this as their drm_driver.gem_prime_vunmap callback.

VMA Offset Manager

The vma-manager is responsible for mapping arbitrary driver-dependent memory regions into the linear user address-space. It provides offsets to the caller which can then be used on the address_space of the drm-device. It takes care not to overlap regions, to size them appropriately and not to confuse mm-core with inconsistent fake vm_pgoff fields. Drivers shouldn’t use this for object placement in VMEM. This manager should only be used to manage mappings into linear user-space VMs.

We use drm_mm as backend to manage object allocations. But it is highly optimized for alloc/free calls, not lookups. Hence, we use an rb-tree to speed up offset lookups.

You must not use multiple offset managers on a single address_space. Otherwise, mm-core will be unable to tear down memory mappings as the VM will no longer be linear.

This offset manager works on page-based addresses. That is, every argument and return code (with the exception of drm_vma_node_offset_addr()) is given in number of pages, not number of bytes. That means, object sizes and offsets must always be page-aligned (as usual). If you want to get a valid byte-based user-space address for a given offset, please see drm_vma_node_offset_addr().

In addition to offset management, the vma offset manager also handles access management. For every open-file context that is allowed to access a given node, you must call drm_vma_node_allow(). Otherwise, an mmap() call on this open-file with the offset of the node will fail with -EACCES. To revoke access again, use drm_vma_node_revoke(). However, the caller is responsible for destroying already existing mappings, if required.

struct drm_vma_offset_node * drm_vma_offset_exact_lookup_locked(struct drm_vma_offset_manager * mgr, unsigned long start, unsigned long pages)

Look up node by exact address

Parameters

struct drm_vma_offset_manager * mgr
Manager object
unsigned long start
Start address (page-based, not byte-based)
unsigned long pages
Size of object (page-based)

Description

Same as drm_vma_offset_lookup_locked() but does not allow any offset into the node. It only returns the exact object with the given start address.

Return

Node at exact start address start.

void drm_vma_offset_lock_lookup(struct drm_vma_offset_manager * mgr)

Lock lookup for extended private use

Parameters

struct drm_vma_offset_manager * mgr
Manager object

Description

Lock VMA manager for extended lookups. Only locked VMA function calls are allowed while holding this lock. All other contexts are blocked from VMA until the lock is released via drm_vma_offset_unlock_lookup().

Use this if you need to take a reference to the objects returned by drm_vma_offset_lookup_locked() before releasing this lock again.

This lock must not be used for anything other than extended lookups. You must not call any other VMA helpers while holding this lock.

Note

You’re in atomic-context while holding this lock!

void drm_vma_offset_unlock_lookup(struct drm_vma_offset_manager * mgr)

Unlock lookup for extended private use

Parameters

struct drm_vma_offset_manager * mgr
Manager object

Description

Release lookup-lock. See drm_vma_offset_lock_lookup() for more information.

void drm_vma_node_reset(struct drm_vma_offset_node * node)

Initialize or reset node object

Parameters

struct drm_vma_offset_node * node
Node to initialize or reset

Description

Reset a node to its initial state. This must be called before using it with any VMA offset manager.

This must not be called on an already allocated node, or you will leak memory.

unsigned long drm_vma_node_start(const struct drm_vma_offset_node * node)

Return start address for page-based addressing

Parameters

const struct drm_vma_offset_node * node
Node to inspect

Description

Return the start address of the given node. This can be used as offset into the linear VM space that is provided by the VMA offset manager. Note that this can only be used for page-based addressing. If you need a proper offset for user-space mappings, you must apply “<< PAGE_SHIFT” or use the drm_vma_node_offset_addr() helper instead.

Return

Start address of node for page-based addressing. 0 if the node does not have an offset allocated.

unsigned long drm_vma_node_size(struct drm_vma_offset_node * node)

Return size (page-based)

Parameters

struct drm_vma_offset_node * node
Node to inspect

Description

Return the size as number of pages for the given node. This is the same size that was passed to drm_vma_offset_add(). If no offset is allocated for the node, this is 0.

Return

Size of node as number of pages. 0 if the node does not have an offset allocated.

__u64 drm_vma_node_offset_addr(struct drm_vma_offset_node * node)

Return sanitized offset for user-space mmaps

Parameters

struct drm_vma_offset_node * node
Linked offset node

Description

Same as drm_vma_node_start() but returns the address as a valid offset that can be used for user-space mappings during mmap(). This must not be called on unlinked nodes.

Return

Offset of node for byte-based addressing. 0 if the node does not have an object allocated.
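The relationship between the page-based start address and the byte-based mmap offset can be sketched in userspace C. The names below are hypothetical and MODEL_PAGE_SHIFT assumes 4 KiB pages; the point of the sketch is that the value is widened to 64 bits before shifting, so large page numbers do not overflow on systems where unsigned long is 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Convert a page-based start address into a byte-based mmap offset. */
static uint64_t node_offset_addr(unsigned long start_page)
{
    /* Widen first, then shift, so the result cannot be truncated. */
    return (uint64_t)start_page << MODEL_PAGE_SHIFT;
}

/* Convert a byte-based mmap offset back into a page number. */
static unsigned long offset_to_page(uint64_t byte_offset)
{
    /* Offsets handled by the manager are always page-aligned. */
    assert((byte_offset & ((1ULL << MODEL_PAGE_SHIFT) - 1)) == 0);

    return (unsigned long)(byte_offset >> MODEL_PAGE_SHIFT);
}
```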

void drm_vma_node_unmap(struct drm_vma_offset_node * node, struct address_space * file_mapping)

Unmap offset node

Parameters

struct drm_vma_offset_node * node
Offset node
struct address_space * file_mapping
Address space to unmap node from

Description

Unmap all userspace mappings for a given offset node. The mappings must be associated with the file_mapping address-space. If no offset exists nothing is done.

This call is unlocked. The caller must guarantee that drm_vma_offset_remove() is not called on this node concurrently.

int drm_vma_node_verify_access(struct drm_vma_offset_node * node, struct drm_file * tag)

Access verification helper for TTM

Parameters

struct drm_vma_offset_node * node
Offset node
struct drm_file * tag
Tag of file to check

Description

This checks whether tag is granted access to node. It is the same as drm_vma_node_is_allowed() but suitable as drop-in helper for TTM verify_access() callbacks.

Return

0 if access is granted, -EACCES otherwise.

void drm_vma_offset_manager_init(struct drm_vma_offset_manager * mgr, unsigned long page_offset, unsigned long size)

Initialize new offset-manager

Parameters

struct drm_vma_offset_manager * mgr
Manager object
unsigned long page_offset
Offset of available memory area (page-based)
unsigned long size
Size of available address space range (page-based)

Description

Initialize a new offset-manager. The offset and area size available for the manager are given as page_offset and size. Both are interpreted as page-numbers, not bytes.

Adding/removing nodes from the manager is locked internally and protected against concurrent access. However, node allocation and destruction is left for the caller. While calling into the vma-manager, a given node must always be guaranteed to be referenced.

void drm_vma_offset_manager_destroy(struct drm_vma_offset_manager * mgr)

Destroy offset manager

Parameters

struct drm_vma_offset_manager * mgr
Manager object

Description

Destroy an object manager which was previously created via drm_vma_offset_manager_init(). The caller must remove all allocated nodes before destroying the manager. Otherwise, drm_mm will refuse to free the requested resources.

The manager must not be accessed after this function is called.

struct drm_vma_offset_node * drm_vma_offset_lookup_locked(struct drm_vma_offset_manager * mgr, unsigned long start, unsigned long pages)

Find node in offset space

Parameters

struct drm_vma_offset_manager * mgr
Manager object
unsigned long start
Start address for object (page-based)
unsigned long pages
Size of object (page-based)

Description

Find a node given a start address and object size. This returns the _best_ match for the given node. That is, start may point somewhere into a valid region and the given node will be returned, as long as the node spans the whole requested area (given the size in number of pages as pages).

Note that before lookup the vma offset manager lookup lock must be acquired with drm_vma_offset_lock_lookup(). See there for an example. This can then be used to implement weakly referenced lookups using kref_get_unless_zero().

Example

drm_vma_offset_lock_lookup(mgr);
node = drm_vma_offset_lookup_locked(mgr, start, pages);
if (node)
    kref_get_unless_zero(container_of(node, sth, entr));
drm_vma_offset_unlock_lookup(mgr);

Return

Returns NULL if no suitable node can be found. Otherwise, the best match is returned. It’s the caller’s responsibility to make sure the node doesn’t get destroyed before the caller can access it.
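The difference between the best-match lookup and the exact lookup can be modeled in userspace C. This sketch uses a linear scan over a hypothetical array of nodes instead of the rb-tree, and it ignores locking entirely; struct range_node, range_lookup() and range_exact_lookup() are invented names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: start and size are both given in pages. */
struct range_node {
    unsigned long start;
    unsigned long size;
};

/* Return the node that spans the whole range [start, start + pages). */
static const struct range_node *range_lookup(const struct range_node *nodes,
                                             size_t count,
                                             unsigned long start,
                                             unsigned long pages)
{
    size_t i;

    assert(nodes != NULL || count == 0);

    for (i = 0; i < count; i++) {
        const struct range_node *n = &nodes[i];

        if (n->start <= start && start + pages <= n->start + n->size)
            return n;
    }

    return NULL;
}

/* Same as range_lookup(), but reject any offset into the node. */
static const struct range_node *range_exact_lookup(const struct range_node *nodes,
                                                   size_t count,
                                                   unsigned long start,
                                                   unsigned long pages)
{
    const struct range_node *n = range_lookup(nodes, count, start, pages);

    return (n && n->start == start) ? n : NULL;
}
```

With nodes at pages 0x100 (4 pages) and 0x200 (16 pages), a lookup of 4 pages at 0x202 matches the second node, while the exact variant rejects it because 0x202 is not the node's start address.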

int drm_vma_offset_add(struct drm_vma_offset_manager * mgr, struct drm_vma_offset_node * node, unsigned long pages)

Add offset node to manager

Parameters

struct drm_vma_offset_manager * mgr
Manager object
struct drm_vma_offset_node * node
Node to be added
unsigned long pages
Allocation size visible to user-space (in number of pages)

Description

Add a node to the offset-manager. If the node was already added, this does nothing and returns 0. pages is the size of the object given in number of pages. After this call succeeds, you can access the offset of the node until it is removed again.

If this call fails, it is safe to retry the operation or call drm_vma_offset_remove(), anyway. However, no cleanup is required in that case.

pages is not required to be the same size as the underlying memory object that you want to map. It only limits the size that user-space can map into their address space.

Return

0 on success, negative error code on failure.

void drm_vma_offset_remove(struct drm_vma_offset_manager * mgr, struct drm_vma_offset_node * node)

Remove offset node from manager

Parameters

struct drm_vma_offset_manager * mgr
Manager object
struct drm_vma_offset_node * node
Node to be removed

Description

Remove a node from the offset manager. If the node wasn’t added before, this does nothing. After this call returns, the offset and size will be 0 until a new offset is allocated via drm_vma_offset_add() again. Helper functions like drm_vma_node_start() and drm_vma_node_offset_addr() will return 0 if no offset is allocated.

int drm_vma_node_allow(struct drm_vma_offset_node * node, struct drm_file * tag)

Add open-file to list of allowed users

Parameters

struct drm_vma_offset_node * node
Node to modify
struct drm_file * tag
Tag of file to allow

Description

Add tag to the list of allowed open-files for this node. If tag is already on this list, the ref-count is incremented.

The list of allowed-users is preserved across drm_vma_offset_add() and drm_vma_offset_remove() calls. You may even call it if the node is currently not added to any offset-manager.

You must remove all open-files the same number of times as you added them before destroying the node. Otherwise, you will leak memory.

This is locked against concurrent access internally.

Return

0 on success, negative error code on internal failure (out-of-mem)

void drm_vma_node_revoke(struct drm_vma_offset_node * node, struct drm_file * tag)

Remove open-file from list of allowed users

Parameters

struct drm_vma_offset_node * node
Node to modify
struct drm_file * tag
Tag of file to remove

Description

Decrement the ref-count of tag in the list of allowed open-files on node. If the ref-count drops to zero, remove tag from the list. You must call this once for every drm_vma_node_allow() on tag.

This is locked against concurrent access internally.

If tag is not on the list, nothing is done.

bool drm_vma_node_is_allowed(struct drm_vma_offset_node * node, struct drm_file * tag)

Check whether an open-file is granted access

Parameters

struct drm_vma_offset_node * node
Node to check
struct drm_file * tag
Tag of file to check

Description

Search the list in node whether tag is currently on the list of allowed open-files (see drm_vma_node_allow()).

This is locked against concurrent access internally.

Return

true iff tag is on the list
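The ref-counted allow list behind drm_vma_node_allow(), drm_vma_node_revoke() and drm_vma_node_is_allowed() can be modeled in userspace C. This is a sketch under stated assumptions: it uses a singly linked list, omits the internal locking, and all names (struct acl_node, acl_allow() and so on) are hypothetical rather than the kernel's own data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* One entry per open-file tag, with a count of outstanding allow calls. */
struct acl_entry {
    const void *tag;
    unsigned long count;
    struct acl_entry *next;
};

struct acl_node {
    struct acl_entry *allowed;
};

/* Return the link that points at the entry for tag, or the tail link. */
static struct acl_entry **acl_find(struct acl_node *node, const void *tag)
{
    struct acl_entry **p = &node->allowed;

    while (*p && (*p)->tag != tag)
        p = &(*p)->next;
    return p;
}

/* Add tag to the list, or bump its count if it is already present. */
static int acl_allow(struct acl_node *node, const void *tag)
{
    struct acl_entry **slot = acl_find(node, tag);

    if (*slot) {
        (*slot)->count++;
        return 0;
    }

    *slot = calloc(1, sizeof(**slot));
    if (!*slot)
        return -ENOMEM;
    (*slot)->tag = tag;
    (*slot)->count = 1;
    return 0;
}

/* Drop one reference; unlink and free the entry when the count hits zero. */
static void acl_revoke(struct acl_node *node, const void *tag)
{
    struct acl_entry **slot = acl_find(node, tag);
    struct acl_entry *entry = *slot;

    if (!entry)
        return; /* tag is not on the list: nothing to do */

    assert(entry->count > 0);
    if (--entry->count == 0) {
        *slot = entry->next;
        free(entry);
    }
}

static bool acl_is_allowed(struct acl_node *node, const void *tag)
{
    return *acl_find(node, tag) != NULL;
}
```

The model makes the pairing rule visible: a tag that was allowed twice stays on the list after one revoke, and its entry is only freed after the second.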

PRIME Buffer Sharing

PRIME is the cross-device buffer sharing framework in DRM, originally created for the OPTIMUS range of multi-GPU platforms. To userspace, PRIME buffers are dma-buf based file descriptors.

Overview and Driver Interface

Similar to GEM global names, PRIME file descriptors are also used to share buffer objects across processes. They offer additional security: as file descriptors must be explicitly sent over UNIX domain sockets to be shared between applications, they can’t be guessed like the globally unique GEM names.

Drivers that support the PRIME API must set the DRIVER_PRIME bit in the struct drm_driver driver_features field, and implement the prime_handle_to_fd and prime_fd_to_handle operations.

int (*prime_handle_to_fd)(struct drm_device *dev, struct drm_file *file_priv, uint32_t handle, uint32_t flags, int *prime_fd);
int (*prime_fd_to_handle)(struct drm_device *dev, struct drm_file *file_priv, int prime_fd, uint32_t *handle);

Those two operations convert a handle to a PRIME file descriptor and vice versa. Drivers must use the kernel dma-buf buffer sharing framework to manage the PRIME file descriptors. Similar to the mode setting API, PRIME is agnostic to the underlying buffer object manager, as long as handles are 32-bit unsigned integers.

While non-GEM drivers must implement the operations themselves, GEM drivers must use the drm_gem_prime_handle_to_fd() and drm_gem_prime_fd_to_handle() helper functions. Those helpers rely on the driver gem_prime_export and gem_prime_import operations to create a dma-buf instance from a GEM object (dma-buf exporter role) and to create a GEM object from a dma-buf instance (dma-buf importer role).

struct dma_buf * (*gem_prime_export)(struct drm_device *dev, struct drm_gem_object *obj, int flags);
struct drm_gem_object * (*gem_prime_import)(struct drm_device *dev, struct dma_buf *dma_buf);

These two operations are mandatory for GEM drivers that support PRIME.

PRIME Helper Functions

Drivers can implement gem_prime_export and gem_prime_import in terms of simpler APIs by using the helper functions drm_gem_prime_export and drm_gem_prime_import. These functions implement dma-buf support in terms of six lower-level driver callbacks:

Export callbacks:

  • gem_prime_pin (optional): prepare a GEM object for exporting
  • gem_prime_get_sg_table: provide a scatter/gather table of pinned pages
  • gem_prime_vmap: vmap a buffer exported by your driver
  • gem_prime_vunmap: vunmap a buffer exported by your driver
  • gem_prime_mmap (optional): mmap a buffer exported by your driver

Import callback:

  • gem_prime_import_sg_table (import): produce a GEM object from another driver’s scatter/gather table

PRIME Function References

struct drm_prime_file_private

per-file tracking for PRIME

Definition

struct drm_prime_file_private {
};

Members

Description

This just contains the internal struct dma_buf and handle caches for each struct drm_file used by the PRIME core code.

int drm_gem_map_attach(struct dma_buf * dma_buf, struct dma_buf_attachment * attach)

dma_buf attach implementation for GEM

Parameters

struct dma_buf * dma_buf
buffer to attach device to
struct dma_buf_attachment * attach
buffer attachment data

Description

Allocates drm_prime_attachment and calls drm_driver.gem_prime_pin for device specific attachment. This can be used as the dma_buf_ops.attach callback.

Returns 0 on success, negative error code on failure.

void drm_gem_map_detach(struct dma_buf * dma_buf, struct dma_buf_attachment * attach)

dma_buf detach implementation for GEM

Parameters

struct dma_buf * dma_buf
buffer to detach from
struct dma_buf_attachment * attach
attachment to be detached

Description

Cleans up dma_buf_attachment. This can be used as the dma_buf_ops.detach callback.

struct sg_table * drm_gem_map_dma_buf(struct dma_buf_attachment * attach, enum dma_data_direction dir)

map_dma_buf implementation for GEM

Parameters

struct dma_buf_attachment * attach
attachment whose scatterlist is to be returned
enum dma_data_direction dir
direction of DMA transfer

Description

Calls drm_driver.gem_prime_get_sg_table and then maps the scatterlist. This can be used as the dma_buf_ops.map_dma_buf callback.

Returns sg_table containing the scatterlist to be returned; returns ERR_PTR on error. May return -EINTR if it is interrupted by a signal.

void drm_gem_unmap_dma_buf(struct dma_buf_attachment * attach, struct sg_table * sgt, enum dma_data_direction dir)

unmap_dma_buf implementation for GEM

Parameters

struct dma_buf_attachment * attach
attachment to unmap buffer from
struct sg_table * sgt
scatterlist info of the buffer to unmap
enum dma_data_direction dir
direction of DMA transfer

Description

Not implemented. The unmap is done at drm_gem_map_detach(). This can be used as the dma_buf_ops.unmap_dma_buf callback.

struct dma_buf * drm_gem_dmabuf_export(struct drm_device * dev, struct dma_buf_export_info * exp_info)

dma_buf export implementation for GEM

Parameters

struct drm_device * dev
parent device for the exported dmabuf
struct dma_buf_export_info * exp_info
the export information used by dma_buf_export()

Description

This wraps dma_buf_export() for use by generic GEM drivers that are using drm_gem_dmabuf_release(). In addition to calling dma_buf_export(), we take a reference to the drm_device and the exported drm_gem_object (stored in dma_buf_export_info.priv) which is released by drm_gem_dmabuf_release().

Returns the new dmabuf.

void drm_gem_dmabuf_release(struct dma_buf * dma_buf)

dma_buf release implementation for GEM

Parameters

struct dma_buf * dma_buf
buffer to be released

Description

Generic release function for dma_bufs exported as PRIME buffers. GEM drivers must use this in their dma_buf ops structure as the release callback. drm_gem_dmabuf_release() should be used in conjunction with drm_gem_dmabuf_export().

void * drm_gem_dmabuf_vmap(struct dma_buf * dma_buf)

dma_buf vmap implementation for GEM

Parameters

struct dma_buf * dma_buf
buffer to be mapped

Description

Sets up a kernel virtual mapping. This can be used as the dma_buf_ops.vmap callback.

Returns the kernel virtual address.

void drm_gem_dmabuf_vunmap(struct dma_buf * dma_buf, void * vaddr)

dma_buf vunmap implementation for GEM

Parameters

struct dma_buf * dma_buf
buffer to be unmapped
void * vaddr
the virtual address of the buffer

Description

Releases a kernel virtual mapping. This can be used as the dma_buf_ops.vunmap callback.

void * drm_gem_dmabuf_kmap(struct dma_buf * dma_buf, unsigned long page_num)

map implementation for GEM

Parameters

struct dma_buf * dma_buf
buffer to be mapped
unsigned long page_num
page number within the buffer

Description

Not implemented. This can be used as the dma_buf_ops.map callback.

void drm_gem_dmabuf_kunmap(struct dma_buf * dma_buf, unsigned long page_num, void * addr)

unmap implementation for GEM

Parameters

struct dma_buf * dma_buf
buffer to be unmapped
unsigned long page_num
page number within the buffer
void * addr
virtual address of the buffer

Description

Not implemented. This can be used as the dma_buf_ops.unmap callback.

int drm_gem_dmabuf_mmap(struct dma_buf * dma_buf, struct vm_area_struct * vma)

dma_buf mmap implementation for GEM

Parameters

struct dma_buf * dma_buf
buffer to be mapped
struct vm_area_struct * vma
virtual address range

Description

Provides memory mapping for the buffer. This can be used as the dma_buf_ops.mmap callback.

Returns 0 on success or a negative error code on failure.

struct dma_buf * drm_gem_prime_export(struct drm_device * dev, struct drm_gem_object * obj, int flags)

helper library implementation of the export callback

Parameters

struct drm_device * dev
drm_device to export from
struct drm_gem_object * obj
GEM object to export
int flags
flags like DRM_CLOEXEC and DRM_RDWR

Description

This is the implementation of the gem_prime_export functions for GEM drivers using the PRIME helpers.

int drm_gem_prime_handle_to_fd(struct drm_device * dev, struct drm_file * file_priv, uint32_t handle, uint32_t flags, int * prime_fd)

PRIME export function for GEM drivers

Parameters

struct drm_device * dev
dev to export the buffer from
struct drm_file * file_priv
drm file-private structure
uint32_t handle
buffer handle to export
uint32_t flags
flags like DRM_CLOEXEC
int * prime_fd
pointer to storage for the fd id of the created dma-buf

Description

This is the PRIME export function which GEM drivers must use to ensure correct lifetime management of the underlying GEM object. The actual exporting from GEM object to a dma-buf is done through the gem_prime_export driver callback.

struct drm_gem_object * drm_gem_prime_import_dev(struct drm_device * dev, struct dma_buf * dma_buf, struct device * attach_dev)

core implementation of the import callback

Parameters

struct drm_device * dev
drm_device to import into
struct dma_buf * dma_buf
dma-buf object to import
struct device * attach_dev
struct device to dma_buf attach

Description

This is the core of drm_gem_prime_import. It’s designed to be called by drivers who want to use a different device structure than dev->dev for attaching via dma_buf.

struct drm_gem_object * drm_gem_prime_import(struct drm_device * dev, struct dma_buf * dma_buf)

helper library implementation of the import callback

Parameters

struct drm_device * dev
drm_device to import into
struct dma_buf * dma_buf
dma-buf object to import

Description

This is the implementation of the gem_prime_import functions for GEM drivers using the PRIME helpers.

int drm_gem_prime_fd_to_handle(struct drm_device * dev, struct drm_file * file_priv, int prime_fd, uint32_t * handle)

PRIME import function for GEM drivers

Parameters

struct drm_device * dev
dev to import the buffer into
struct drm_file * file_priv
drm file-private structure
int prime_fd
fd id of the dma-buf which should be imported
uint32_t * handle
pointer to storage for the handle of the imported buffer object

Description

This is the PRIME import function which GEM drivers must use to ensure correct lifetime management of the underlying GEM object. The actual importing of the GEM object from the dma-buf is done through the gem_prime_import driver callback.

struct sg_table * drm_prime_pages_to_sg(struct page ** pages, unsigned int nr_pages)

converts a page array into an sg list

Parameters

struct page ** pages
pointer to the array of page pointers to convert
unsigned int nr_pages
length of the page vector

Description

This helper creates an sg table object from a set of pages. The driver is responsible for mapping the pages into the importer’s address space for use with dma_buf itself.

int drm_prime_sg_to_page_addr_arrays(struct sg_table * sgt, struct page ** pages, dma_addr_t * addrs, int max_entries)

convert an sg table into a page array

Parameters

struct sg_table * sgt
scatter-gather table to convert
struct page ** pages
optional array of page pointers to store the page array in
dma_addr_t * addrs
optional array to store the dma bus address of each page
int max_entries
size of both the passed-in arrays

Description

Exports an sg table into an array of pages and addresses. This is currently required by the TTM driver in order to do correct fault handling.
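
The conversion described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: struct mock_seg, MOCK_PAGE_SIZE and segs_to_addrs() are hypothetical names that stand in for a DMA-mapped sg table and are not part of the DRM API. It shows how each segment is expanded into one address per page and how max_entries bounds the output array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical stand-in for one DMA-mapped scatter-gather segment. */
struct mock_seg {
	uint64_t dma_addr; /* bus address of the first byte */
	uint64_t len;      /* length in bytes, assumed page-aligned */
};

/*
 * Write one bus address per page into addrs[].
 * Returns the number of entries written, or -1 if addrs[] is too small.
 * (The return convention is a choice made for this sketch.)
 */
static int segs_to_addrs(const struct mock_seg *segs, size_t nsegs,
			 uint64_t *addrs, size_t max_entries)
{
	size_t idx = 0;

	assert(segs && addrs);
	for (size_t i = 0; i < nsegs; i++) {
		/* Step through the segment one page at a time. */
		for (uint64_t off = 0; off < segs[i].len; off += MOCK_PAGE_SIZE) {
			if (idx >= max_entries)
				return -1; /* caller's array cannot hold every page */
			addrs[idx++] = segs[i].dma_addr + off;
		}
	}
	return (int)idx;
}
```

A two-page segment followed by a one-page segment yields three entries. The same input with max_entries set to 2 is rejected rather than silently truncated.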

void drm_prime_gem_destroy(struct drm_gem_object * obj, struct sg_table * sg)

helper to clean up a PRIME-imported GEM object

Parameters

struct drm_gem_object * obj
GEM object which was created from a dma-buf
struct sg_table * sg
the sg-table which was pinned at import time

Description

This is the cleanup function which GEM drivers need to call when they use drm_gem_prime_import to import dma-bufs.

DRM MM Range Allocator

Overview

drm_mm provides a simple range allocator. Drivers are free to use the resource allocator from the Linux core if it suits them; the upside of drm_mm is that it’s in the DRM core, which means that it’s easier to extend for some of the crazier special-purpose needs of GPUs.

The main data structure is drm_mm; allocations are tracked in drm_mm_node. Drivers are free to embed either of them into their own suitable data structures. drm_mm itself will not do any memory allocations of its own, so if drivers choose not to embed nodes they still need to allocate them themselves.

The range allocator also supports reservation of preallocated blocks. This is useful for taking over initial mode setting configurations from the firmware, where an object needs to be created which exactly matches the firmware’s scanout target. As long as the range is still free it can be inserted anytime after the allocator is initialized, which helps with avoiding looped dependencies in the driver load sequence.

drm_mm maintains a stack of most recently freed holes, which of all simplistic data structures seems to be a fairly decent approach to clustering allocations and avoiding too much fragmentation. This means free space searches are O(num_holes). Given all the fancy features drm_mm supports, something better would be fairly complex, and since gfx thrashing is a fairly steep cliff this is not a real concern. Removing a node again is O(1).

drm_mm supports a few features: Alignment and range restrictions can be supplied. Furthermore every drm_mm_node has a color value (which is just an opaque unsigned long) which in conjunction with a driver callback can be used to implement sophisticated placement restrictions. The i915 DRM driver uses this to implement guard pages between incompatible caching domains in the graphics TT.

Two behaviors are supported for searching and allocating: bottom-up and top-down. The default is bottom-up. Top-down allocation can be used if the memory area has different restrictions, or just to reduce fragmentation.
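
The difference between the two behaviors can be illustrated with a small userspace sketch. It is not the drm_mm implementation: place_in_hole() and its parameters are hypothetical names chosen for illustration. Given a hole [hole_start, hole_end), it computes where an aligned block of the requested size would start when placed bottom-up or top-down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the start of a block of the given size inside the hole
 * [hole_start, hole_end). Bottom-up rounds the hole start up to the
 * alignment; top-down rounds the highest possible start down.
 * Returns false if no aligned position fits.
 */
static bool place_in_hole(uint64_t hole_start, uint64_t hole_end,
			  uint64_t size, uint64_t align, bool top_down,
			  uint64_t *out)
{
	uint64_t start;

	assert(out);
	if (size == 0 || hole_end < hole_start || hole_end - hole_start < size)
		return false;
	if (align == 0)
		align = 1; /* treat "no alignment" as byte alignment */

	if (top_down) {
		start = hole_end - size;
		start -= start % align;
	} else {
		uint64_t rem = hole_start % align;

		start = rem ? hole_start + (align - rem) : hole_start;
	}

	/* Alignment may have pushed the block out of the hole. */
	if (start < hole_start || start > hole_end - size)
		return false;

	*out = start;
	return true;
}
```

For the hole [0x1000, 0x5000) with size 0x1000 and alignment 0x2000, bottom-up placement yields 0x2000 and top-down placement yields 0x4000. In both cases the alignment applies to the start of the block.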

Finally iteration helpers to walk all nodes and all holes are provided as are some basic allocator dumpers for debugging.

Note that this range allocator is not thread-safe, drivers need to protect modifications with their own locking. The idea behind this is that for a full memory manager additional data needs to be protected anyway, hence internal locking would be fully redundant.

LRU Scan/Eviction Support

Very often GPUs need to have contiguous allocations for a given object. When evicting objects to make space for a new one, it is therefore not very efficient to simply select objects from the tail of an LRU until there’s a suitable hole: especially for big objects or nodes that otherwise have special allocation constraints, there’s a good chance we evict lots of (smaller) objects unnecessarily.

The DRM range allocator supports this use-case through the scanning interfaces. First a scan operation needs to be initialized with drm_mm_scan_init() or drm_mm_scan_init_with_range(). The driver adds objects to the roster, probably by walking an LRU list, but this can be freely implemented. Eviction candidates are added using drm_mm_scan_add_block() until a suitable hole is found or there are no further evictable objects. Eviction roster metadata is tracked in struct drm_mm_scan.

The driver must walk through all objects again in exactly the reverse order to restore the allocator state. Note that while the allocator is used in the scan mode no other operation is allowed.

Finally the driver evicts all objects selected (drm_mm_scan_remove_block() reported true) in the scan, and any overlapping nodes after color adjustment (drm_mm_scan_color_evict()). Adding and removing an object is O(1), and since freeing a node is also O(1) the overall complexity is O(scanned_objects). So like the free stack which needs to be walked before a scan operation even begins this is linear in the number of objects. It doesn’t seem to hurt too badly.
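
The add, reverse-remove and evict sequence can be modelled in userspace as follows. This is a deliberately simplified sketch and not the drm_mm code: struct blk, struct scan and all function names are hypothetical, the address space is a fixed array of blocks sorted by address, and alignment, ranges and colors are ignored. It illustrates why scanning only evicts the blocks that overlap the hole that was found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One block of the address space; free blocks model holes. */
struct blk { uint64_t start, size; bool used, scanned; };

struct scan {
	struct blk *blks; size_t n;
	uint64_t want;               /* size of the desired hole */
	bool hit; uint64_t hit_start, hit_end;
};

static bool is_gap(const struct blk *b) { return !b->used || b->scanned; }

static void scan_init(struct scan *s, struct blk *blks, size_t n, uint64_t want)
{
	s->blks = blks; s->n = n; s->want = want;
	s->hit = false; s->hit_start = s->hit_end = 0;
}

/* Treat block i as freed; report whether a large enough hole now exists. */
static bool scan_add_block(struct scan *s, size_t i)
{
	size_t lo = i, hi = i;

	assert(i < s->n);
	s->blks[i].scanned = true;
	while (lo > 0 && is_gap(&s->blks[lo - 1]))
		lo--;
	while (hi + 1 < s->n && is_gap(&s->blks[hi + 1]))
		hi++;
	uint64_t start = s->blks[lo].start;
	uint64_t end = s->blks[hi].start + s->blks[hi].size;
	if (end - start < s->want)
		return false;
	s->hit = true; s->hit_start = start; s->hit_end = start + s->want;
	return true;
}

/* Undo the scan mark; true means block i overlaps the hole found. */
static bool scan_remove_block(struct scan *s, size_t i)
{
	const struct blk *b = &s->blks[i];

	s->blks[i].scanned = false;
	return s->hit && b->start < s->hit_end && b->start + b->size > s->hit_start;
}

/* Returns a bitmask of the block indices selected for eviction. */
static unsigned demo_scan(void)
{
	struct blk blks[] = {
		{ 0, 4, true, false }, { 4, 2, false, false }, { 6, 4, true, false },
		{ 10, 2, true, false }, { 12, 4, true, false },
	};
	size_t lru[] = { 4, 0, 2, 3 }; /* least recently used first */
	struct scan s;
	size_t added = 0;
	unsigned evict = 0;

	scan_init(&s, blks, 5, 6);
	while (added < 4)
		if (scan_add_block(&s, lru[added++]))
			break;
	while (added > 0) { /* walk back in exactly the reverse order */
		size_t i = lru[--added];

		if (scan_remove_block(&s, i))
			evict |= 1u << i;
	}
	return evict;
}
```

In the demo the block at [12, 16) is scanned first but does not form a hole of size 6. Adding the block at [0, 4) next to the free range [4, 6) does, so only block 0 is selected and the older block 4 is left alone. In this model the removal order does not affect the result; the reverse walk is kept only to mirror the requirement of the real allocator.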

DRM MM Range Allocator Function References

enum drm_mm_insert_mode

control search and allocation behaviour

Constants

DRM_MM_INSERT_BEST

Search for the smallest hole (within the search range) that fits the desired node.

Allocates the node from the bottom of the found hole.

DRM_MM_INSERT_LOW

Search for the lowest hole (address closest to 0, within the search range) that fits the desired node.

Allocates the node from the bottom of the found hole.

DRM_MM_INSERT_HIGH

Search for the highest hole (address closest to U64_MAX, within the search range) that fits the desired node.

Allocates the node from the top of the found hole. The specified alignment for the node is applied to the base of the node (drm_mm_node.start).

DRM_MM_INSERT_EVICT

Search for the most recently evicted hole (within the search range) that fits the desired node. This is appropriate for use immediately after performing an eviction scan (see drm_mm_scan_init()) and removing the selected nodes to form a hole.

Allocates the node from the bottom of the found hole.

DRM_MM_INSERT_ONCE
Only check the first hole for suitability and report -ENOSPC immediately otherwise, rather than check every hole until a suitable one is found. Can only be used in conjunction with another search method such as DRM_MM_INSERT_HIGH or DRM_MM_INSERT_LOW.
DRM_MM_INSERT_HIGHEST

Only check the highest hole (the hole with the largest address) and insert the node at the top of the hole or report -ENOSPC if unsuitable.

Does not search all holes.

DRM_MM_INSERT_LOWEST

Only check the lowest hole (the hole with the smallest address) and insert the node at the bottom of the hole or report -ENOSPC if unsuitable.

Does not search all holes.

Description

The struct drm_mm range manager supports finding a suitable hole using a number of search trees. These trees are organised by size, by address and in most recent eviction order. This allows the user to find either the smallest hole to reuse, the lowest or highest address to reuse, or simply reuse the most recent eviction that fits. When allocating the drm_mm_node from within the hole, the drm_mm_insert_mode also dictates whether to allocate the lowest matching address or the highest.

struct drm_mm_node

allocated block in the DRM allocator

Definition

struct drm_mm_node {
  unsigned long color;
  u64 start;
  u64 size;
};

Members

color
Opaque driver-private tag.
start
Start address of the allocated block.
size
Size of the allocated block.

Description

This represents an allocated block in a drm_mm allocator. Except for pre-reserved nodes inserted using drm_mm_reserve_node() the structure is entirely opaque and should only be accessed through the provided functions. Since allocation of these nodes is entirely handled by the driver they can be embedded.

struct drm_mm

DRM allocator

Definition

struct drm_mm {
  void (*color_adjust)(const struct drm_mm_node *node,unsigned long color, u64 *start, u64 *end);
};

Members

color_adjust
Optional driver callback to further apply restrictions on a hole. The node argument points at the node containing the hole from which the block would be allocated (see drm_mm_hole_follows() and friends). The other arguments are the size of the block to be allocated. The driver can adjust the start and end as needed to e.g. insert guard pages.
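
A guard-page policy of the kind mentioned above could look roughly like the following userspace sketch. The types and names (struct mock_node, GUARD_SIZE, guard_color_adjust()) are hypothetical stand-ins and not the i915 or drm_mm code; in particular the explicit next pointer is an assumption made so the example is self-contained. The idea is that the usable part of a hole shrinks by one guard page on each side where the neighbouring node has a different color.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_SIZE 4096u /* assumed guard page size */

/* Hypothetical stand-in for a node and the node after its hole. */
struct mock_node {
	unsigned long color;
	bool allocated;
	const struct mock_node *next; /* node following the hole, or NULL */
};

/*
 * node is the node that precedes the hole; [*start, *end) is the hole.
 * Shrink the hole wherever a neighbour has a different color.
 */
static void guard_color_adjust(const struct mock_node *node,
			       unsigned long color,
			       uint64_t *start, uint64_t *end)
{
	assert(node && start && end);
	if (node->allocated && node->color != color)
		*start += GUARD_SIZE;
	if (node->next && node->next->allocated && node->next->color != color)
		*end -= GUARD_SIZE;
}
```

After such an adjustment the caller still has to check that the remaining range is large enough for the block, since a small hole can shrink to nothing.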

Description

DRM range allocator with a few special functions and features geared towards managing GPU memory. Except for the color_adjust callback the structure is entirely opaque and should only be accessed through the provided functions and macros. This structure can be embedded into larger driver structures.

struct drm_mm_scan

DRM allocator eviction roster data

Definition

struct drm_mm_scan {
};

Members

Description

This structure tracks data needed for the eviction roster set up using drm_mm_scan_init(), and used with drm_mm_scan_add_block() and drm_mm_scan_remove_block(). The structure is entirely opaque and should only be accessed through the provided functions and macros. It is meant to be allocated temporarily by the driver on the stack.

bool drm_mm_node_allocated(const struct drm_mm_node * node)

checks whether a node is allocated

Parameters

const struct drm_mm_node * node
drm_mm_node to check

Description

Drivers are required to clear a node prior to using it with the drm_mm range manager.

Drivers should use this helper for proper encapsulation of drm_mm internals.

Return

True if the node is allocated.

bool drm_mm_initialized(const struct drm_mm * mm)

checks whether an allocator is initialized

Parameters

const struct drm_mm * mm
drm_mm to check

Description

Drivers should clear the struct drm_mm prior to initialisation if they want to use this function.

Drivers should use this helper for proper encapsulation of drm_mm internals.

Return

True if the mm is initialized.

bool drm_mm_hole_follows(const struct drm_mm_node * node)

checks whether a hole follows this node

Parameters

const struct drm_mm_node * node
drm_mm_node to check

Description

Holes are embedded into the drm_mm using the tail of a drm_mm_node. If you wish to know whether a hole follows this particular node, query this function. See also drm_mm_hole_node_start() and drm_mm_hole_node_end().

Return

True if a hole follows the node.

u64 drm_mm_hole_node_start(const struct drm_mm_node * hole_node)

computes the start of the hole following node

Parameters

const struct drm_mm_node * hole_node
drm_mm_node which implicitly tracks the following hole

Description

This is useful for driver-specific debug dumpers. Otherwise drivers should not inspect holes themselves. Drivers must check first whether a hole indeed follows by looking at drm_mm_hole_follows()

Return

Start of the subsequent hole.

u64 drm_mm_hole_node_end(const struct drm_mm_node * hole_node)

computes the end of the hole following node

Parameters

const struct drm_mm_node * hole_node
drm_mm_node which implicitly tracks the following hole

Description

This is useful for driver-specific debug dumpers. Otherwise drivers should not inspect holes themselves. Drivers must check first whether a hole indeed follows by looking at drm_mm_hole_follows().

Return

End of the subsequent hole.
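
The way holes are derived from the nodes around them can be sketched in userspace as follows. The structure and helpers are hypothetical simplifications: the real allocator keeps its nodes on an internal list, whereas this model uses a singly linked list sorted by address, with a zero-sized head node so that the hole before the first allocation is also covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: an allocated range plus a link to the next node. */
struct mock_node {
	uint64_t start;
	uint64_t size;
	const struct mock_node *next; /* next node by address, or NULL */
};

/* The hole begins where the node's allocation ends. */
static uint64_t hole_start(const struct mock_node *n)
{
	return n->start + n->size;
}

/* The hole ends where the next node begins, or at the end of the range. */
static uint64_t hole_end(const struct mock_node *n, uint64_t range_end)
{
	return n->next ? n->next->start : range_end;
}

static bool hole_follows(const struct mock_node *n, uint64_t range_end)
{
	return hole_end(n, range_end) > hole_start(n);
}

/* Debug-dumper style walk: sum up all free space in the range. */
static uint64_t total_free(const struct mock_node *head, uint64_t range_end)
{
	uint64_t sum = 0;

	assert(head);
	for (const struct mock_node *n = head; n; n = n->next)
		if (hole_follows(n, range_end))
			sum += hole_end(n, range_end) - hole_start(n);
	return sum;
}
```

As in the functions documented above, the start and end of a hole are only meaningful after checking that a hole actually follows the node.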

drm_mm_nodes(mm)

list of nodes under the drm_mm range manager

Parameters

mm
the struct drm_mm range manager

Description

As the drm_mm range manager hides its node_list deep within its structure, extracting it looks painful and repetitive. This is not expected to be used outside of the drm_mm_for_each_node() macros and similar internal functions.

Return

The node list, may be empty.

drm_mm_for_each_node(entry, mm)

iterator to walk over all allocated nodes

Parameters

entry
struct drm_mm_node to assign to in each iteration step
mm
drm_mm allocator to walk

Description

This iterator walks over all nodes in the range allocator. It is implemented with list_for_each(), so it is not safe against removal of elements.

drm_mm_for_each_node_safe(entry, next, mm)

iterator to walk over all allocated nodes

Parameters

entry
struct drm_mm_node to assign to in each iteration step
next
struct drm_mm_node to store the next step
mm
drm_mm allocator to walk

Description

This iterator walks over all nodes in the range allocator. It is implemented with list_for_each_safe(), so it is safe against removal of elements.

drm_mm_for_each_hole(pos, mm, hole_start, hole_end)

iterator to walk over all holes

Parameters

pos
drm_mm_node used internally to track progress
mm
drm_mm allocator to walk
hole_start
ulong variable to assign the hole start to on each iteration
hole_end
ulong variable to assign the hole end to on each iteration

Description

This iterator walks over all holes in the range allocator. It is implemented with list_for_each(), so it is not safe against removal of elements. pos is used internally and will not reflect a real drm_mm_node for the very first hole. Hence users of this iterator may not access it.

Implementation Note: We need to inline list_for_each_entry in order to be able to set hole_start and hole_end on each iteration while keeping the macro sane.

int drm_mm_insert_node_generic(struct drm_mm * mm, struct drm_mm_node * node, u64 size, u64 alignment, unsigned long color, enum drm_mm_insert_mode mode)

search for space and insert node

Parameters

struct drm_mm * mm
drm_mm to allocate from
struct drm_mm_node * node
preallocated node to insert
u64 size
size of the allocation
u64 alignment
alignment of the allocation
unsigned long color
opaque tag value to use for this node
enum drm_mm_insert_mode mode
fine-tune the allocation search and placement

Description

This is a simplified version of drm_mm_insert_node_in_range() with no range restrictions applied.

The preallocated node must be cleared to 0.

Return

0 on success, -ENOSPC if there’s no suitable hole.

int drm_mm_insert_node(struct drm_mm * mm, struct drm_mm_node * node, u64 size)

search for space and insert node

Parameters

struct drm_mm * mm
drm_mm to allocate from
struct drm_mm_node * node
preallocated node to insert
u64 size
size of the allocation

Description

This is a simplified version of drm_mm_insert_node_generic() with color set to 0.

The preallocated node must be cleared to 0.

Return

0 on success, -ENOSPC if there’s no suitable hole.

bool drm_mm_clean(const struct drm_mm * mm)

checks whether an allocator is clean

Parameters

const struct drm_mm * mm
drm_mm allocator to check

Return

True if the allocator is completely free, false if there’s still a node allocated in it.

drm_mm_for_each_node_in_range(node__, mm__, start__, end__)

iterator to walk over a range of allocated nodes

Parameters

node__
drm_mm_node structure to assign to in each iteration step
mm__
drm_mm allocator to walk
start__
starting offset, the first node will overlap this
end__
ending offset, the last node will start before this (but may overlap)

Description

This iterator walks over all nodes in the range allocator that lie between start and end. It is implemented similarly to list_for_each(), but using the internal interval tree to accelerate the search for the starting node, and so not safe against removal of elements. It assumes that end is within (or is the upper limit of) the drm_mm allocator. If [start, end] are beyond the range of the drm_mm, the iterator may walk over the special _unallocated_ drm_mm.head_node, and may even continue indefinitely.

void drm_mm_scan_init(struct drm_mm_scan * scan, struct drm_mm * mm, u64 size, u64 alignment, unsigned long color, enum drm_mm_insert_mode mode)

initialize lru scanning

Parameters

struct drm_mm_scan * scan
scan state
struct drm_mm * mm
drm_mm to scan
u64 size
size of the allocation
u64 alignment
alignment of the allocation
unsigned long color
opaque tag value to use for the allocation
enum drm_mm_insert_mode mode
fine-tune the allocation search and placement

Description

This is a simplified version of drm_mm_scan_init_with_range() with no range restrictions applied.

This simply sets up the scanning routines with the parameters for the desired hole.

Warning: As long as the scan list is non-empty, no other operations than adding/removing nodes to/from the scan list are allowed.

int drm_mm_reserve_node(struct drm_mm * mm, struct drm_mm_node * node)

insert a pre-initialized node

Parameters

struct drm_mm * mm
drm_mm allocator to insert node into
struct drm_mm_node * node
drm_mm_node to insert

Description

This function inserts an already set-up drm_mm_node into the allocator, meaning that start, size and color must be set by the caller. All other fields must be cleared to 0. This is useful to initialize the allocator with preallocated objects which must be set up before the range allocator can be set up, e.g. when taking over a firmware framebuffer.

Return

0 on success, -ENOSPC if there’s no hole where node is.

int drm_mm_insert_node_in_range(struct drm_mm *const mm, struct drm_mm_node *const node, u64 size, u64 alignment, unsigned long color, u64 range_start, u64 range_end, enum drm_mm_insert_mode mode)

ranged search for space and insert node

Parameters

struct drm_mm *const mm
drm_mm to allocate from
struct drm_mm_node *const node
preallocated node to insert
u64 size
size of the allocation
u64 alignment
alignment of the allocation
unsigned long color
opaque tag value to use for this node
u64 range_start
start of the allowed range for this node
u64 range_end
end of the allowed range for this node
enum drm_mm_insert_mode mode
fine-tune the allocation search and placement

Description

The preallocated node must be cleared to 0.

Return

0 on success, -ENOSPC if there’s no suitable hole.

void drm_mm_remove_node(struct drm_mm_node * node)

Remove a memory node from the allocator.

Parameters

struct drm_mm_node * node
drm_mm_node to remove

Description

This just removes a node from its drm_mm allocator. The node does not need to be cleared again before it can be re-inserted into this or any other drm_mm allocator. It is a bug to call this function on an unallocated node.

void drm_mm_replace_node(struct drm_mm_node * old, struct drm_mm_node * new)

move an allocation from old to new

Parameters

struct drm_mm_node * old
drm_mm_node to remove from the allocator
struct drm_mm_node * new
drm_mm_node which should inherit old’s allocation

Description

This is useful for when drivers embed the drm_mm_node structure and hence can’t move allocations by reassigning pointers. It’s a combination of remove and insert with the guarantee that the allocation start will match.

void drm_mm_scan_init_with_range(struct drm_mm_scan * scan, struct drm_mm * mm, u64 size, u64 alignment, unsigned long color, u64 start, u64 end, enum drm_mm_insert_mode mode)

initialize range-restricted lru scanning

Parameters

struct drm_mm_scan * scan
scan state
struct drm_mm * mm
drm_mm to scan
u64 size
size of the allocation
u64 alignment
alignment of the allocation
unsigned long color
opaque tag value to use for the allocation
u64 start
start of the allowed range for the allocation
u64 end
end of the allowed range for the allocation
enum drm_mm_insert_mode mode
fine-tune the allocation search and placement

Description

This simply sets up the scanning routines with the parameters for the desired hole.

Warning: As long as the scan list is non-empty, no other operations than adding/removing nodes to/from the scan list are allowed.

bool drm_mm_scan_add_block(struct drm_mm_scan * scan, struct drm_mm_node * node)

add a node to the scan list

Parameters

struct drm_mm_scan * scan
the active drm_mm scanner
struct drm_mm_node * node
drm_mm_node to add

Description

Add a node to the scan list that might be freed to make space for the desired hole.

Return

True if a hole has been found, false otherwise.

bool drm_mm_scan_remove_block(struct drm_mm_scan * scan, struct drm_mm_node * node)

remove a node from the scan list

Parameters

struct drm_mm_scan * scan
the active drm_mm scanner
struct drm_mm_node * node
drm_mm_node to remove

Description

Nodes must be removed in exactly the reverse order from the scan list as they have been added (e.g. using list_add() as they are added and then list_for_each() over that eviction list to remove), otherwise the internal state of the memory manager will be corrupted.

When the scan list is empty, the selected memory nodes can be freed. An immediately following drm_mm_insert_node_in_range() or one of the simpler versions of that function with DRM_MM_INSERT_EVICT will then return the just freed block (because it is the most recently evicted hole).

Return

True if this block should be evicted, false otherwise. Will always return false when no hole has been found.

struct drm_mm_node * drm_mm_scan_color_evict(struct drm_mm_scan * scan)

evict overlapping nodes on either side of hole

Parameters

struct drm_mm_scan * scan
drm_mm scan with target hole

Description

After completing an eviction scan and removing the selected nodes, we may need to remove a few more nodes from either side of the target hole if mm.color_adjust is being used.

Return

A node to evict, or NULL if there are no overlapping nodes.

void drm_mm_init(struct drm_mm * mm, u64 start, u64 size)

initialize a drm-mm allocator

Parameters

struct drm_mm * mm
the drm_mm structure to initialize
u64 start
start of the range managed by mm
u64 size
size of the range managed by mm

Description

Note that mm must be cleared to 0 before calling this function.

void drm_mm_takedown(struct drm_mm * mm)

clean up a drm_mm allocator

Parameters

struct drm_mm * mm
drm_mm allocator to clean up

Description

Note that it is a bug to call this function on an allocator which is not clean.

void drm_mm_print(const struct drm_mm * mm, struct drm_printer * p)

print allocator state

Parameters

const struct drm_mm * mm
drm_mm allocator to print
struct drm_printer * p
DRM printer to use

DRM Cache Handling

void drm_clflush_pages(struct page * pages, unsigned long num_pages)

Flush dcache lines of a set of pages.

Parameters

struct page * pages
List of pages to be flushed.
unsigned long num_pages
Number of pages in the array.

Description

Flush every data cache line entry that points to an address belonging to a page in the array.

void drm_clflush_sg(struct sg_table * st)

Flush dcache lines pointing to a scatter-gather list.

Parameters

struct sg_table * st
struct sg_table.

Description

Flush every data cache line entry that points to an address in the sg.

void drm_clflush_virt_range(void * addr, unsigned long length)

Flush dcache lines of a region

Parameters

void * addr
Initial kernel memory address.
unsigned long length
Region size.

Description

Flush every data cache line entry that points to an address in the region requested.

DRM Sync Objects

DRM synchronisation objects (syncobj, see struct drm_syncobj) are persistent objects that contain an optional fence. The fence can be updated with a new fence, or be NULL.

A syncobj can be waited upon; the wait is for the underlying fence.

A syncobj can be exported to a file descriptor and imported back. These file descriptors are opaque and have no other use case except passing the syncobj between processes.

Their primary use-case is to implement Vulkan fences and semaphores.

syncobj have a kref reference count, but also have an optional file. The file is only created once the syncobj is exported. The file takes a reference on the kref.

struct drm_syncobj

sync object.

Definition

struct drm_syncobj {
  struct kref refcount;
  struct dma_fence __rcu *fence;
  struct list_head cb_list;
  spinlock_t lock;
  struct file *file;
};

Members

refcount
Reference count of this object.
fence

NULL or a pointer to the fence bound to this object.

This field should not be used directly. Use drm_syncobj_fence_get() and drm_syncobj_replace_fence() instead.

cb_list
List of callbacks to call when the fence gets replaced.
lock
Protects cb_list and write-locks fence.
file
A file backing for this syncobj.

Description

This structure defines a generic sync object which wraps a dma_fence.

struct drm_syncobj_cb

callback for drm_syncobj_add_callback

Definition

struct drm_syncobj_cb {
  struct list_head node;
  drm_syncobj_func_t func;
};

Members

node
used by drm_syncobj_add_callback to append this struct to drm_syncobj.cb_list
func
drm_syncobj_func_t to call

Description

This struct will be initialized by drm_syncobj_add_callback, additional data can be passed along by embedding drm_syncobj_cb in another struct. The callback will get called the next time drm_syncobj_replace_fence is called.
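
Passing extra data by embedding the callback structure follows the usual container_of pattern. The sketch below is a userspace model under stated assumptions: struct mock_cb, mock_func_t, struct waiter and mock_fire() are hypothetical stand-ins, and the callback signature is simplified compared to the real drm_syncobj_func_t. It shows how the callback recovers its enclosing structure from the embedded member.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mock_cb;
typedef void (*mock_func_t)(struct mock_cb *cb);

/* Hypothetical stand-in for the callback structure. */
struct mock_cb {
	mock_func_t func;
};

/* Driver-private data with the callback embedded in it. */
struct waiter {
	int signalled;
	struct mock_cb cb;
};

/* The callback only receives the embedded member and maps it back. */
static void waiter_func(struct mock_cb *cb)
{
	struct waiter *w = container_of(cb, struct waiter, cb);

	w->signalled++;
}

/* Models the point where a fence replacement invokes the callback. */
static void mock_fire(struct mock_cb *cb)
{
	assert(cb && cb->func);
	cb->func(cb);
}
```

Because the callback structure lives inside struct waiter, no separate allocation or lookup is needed to find the private data.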

void drm_syncobj_get(struct drm_syncobj * obj)

acquire a syncobj reference

Parameters

struct drm_syncobj * obj
sync object

Description

This acquires an additional reference to obj. It is illegal to call this without already holding a reference. No locks required.

void drm_syncobj_put(struct drm_syncobj * obj)

release a reference to a sync object.

Parameters

struct drm_syncobj * obj
sync object.

struct dma_fence * drm_syncobj_fence_get(struct drm_syncobj * syncobj)

get a reference to a fence in a sync object

Parameters

struct drm_syncobj * syncobj
sync object.

Description

This acquires an additional reference to drm_syncobj.fence contained in syncobj, if not NULL. It is illegal to call this without already holding a reference. No locks required.

Return

Either the fence of obj or NULL if there’s none.

struct drm_syncobj * drm_syncobj_find(struct drm_file * file_private, u32 handle)

lookup and reference a sync object.

Parameters

struct drm_file * file_private
drm file private pointer
u32 handle
sync object handle to lookup.

Description

Returns a reference to the syncobj pointed to by handle or NULL. The reference must be released by calling drm_syncobj_put().

void drm_syncobj_add_callback(struct drm_syncobj * syncobj, struct drm_syncobj_cb * cb, drm_syncobj_func_t func)

adds a callback to syncobj::cb_list

Parameters

struct drm_syncobj * syncobj
Sync object to which to add the callback
struct drm_syncobj_cb * cb
Callback to add
drm_syncobj_func_t func
Func to use when initializing the drm_syncobj_cb struct

Description

This adds a callback to be called next time the fence is replaced

void drm_syncobj_remove_callback(struct drm_syncobj * syncobj, struct drm_syncobj_cb * cb)

removes a callback from syncobj::cb_list

Parameters

struct drm_syncobj * syncobj
Sync object from which to remove the callback
struct drm_syncobj_cb * cb
Callback to remove

void drm_syncobj_replace_fence(struct drm_syncobj * syncobj, struct dma_fence * fence)

replace fence in a sync object.

Parameters

struct drm_syncobj * syncobj
Sync object to replace fence in
struct dma_fence * fence
fence to install in sync file.

Description

This replaces the fence on a sync object.

int drm_syncobj_find_fence(struct drm_file * file_private, u32 handle, struct dma_fence ** fence)

lookup and reference the fence in a sync object

Parameters

struct drm_file * file_private
drm file private pointer
u32 handle
sync object handle to lookup.
struct dma_fence ** fence
out parameter for the fence

Description

This is just a convenience function that combines drm_syncobj_find() and drm_syncobj_fence_get().

Returns 0 on success or a negative error value on failure. On success fence contains a reference to the fence, which must be released by calling dma_fence_put().

void drm_syncobj_free(struct kref * kref)

free a sync object.

Parameters

struct kref * kref
kref to free.

Description

Only to be called from kref_put in drm_syncobj_put.

int drm_syncobj_create(struct drm_syncobj ** out_syncobj, uint32_t flags, struct dma_fence * fence)

create a new syncobj

Parameters

struct drm_syncobj ** out_syncobj
returned syncobj
uint32_t flags
DRM_SYNCOBJ_* flags
struct dma_fence * fence
if non-NULL, the syncobj will represent this fence

Description

This is the first function to create a sync object. After creating, drivers probably want to make it available to userspace, either through drm_syncobj_get_handle() or drm_syncobj_get_fd().

Returns 0 on success or a negative error value on failure.

int drm_syncobj_get_handle(struct drm_file * file_private, struct drm_syncobj * syncobj, u32 * handle)

get a handle from a syncobj

Parameters

struct drm_file * file_private
drm file private pointer
struct drm_syncobj * syncobj
Sync object to export
u32 * handle
out parameter with the new handle

Description

Exports a sync object created with drm_syncobj_create() as a handle on file_private to userspace.

Returns 0 on success or a negative error value on failure.

int drm_syncobj_get_fd(struct drm_syncobj * syncobj, int * p_fd)

get a file descriptor from a syncobj

Parameters

struct drm_syncobj * syncobj
Sync object to export
int * p_fd
out parameter with the new file descriptor

Description

Exports a sync object created with drm_syncobj_create() as a file descriptor.

Returns 0 on success or a negative error value on failure.

GPU Scheduler

Overview

The GPU scheduler provides entities which allow userspace to push jobs into software queues which are then scheduled on a hardware run queue. The software queues have a priority among them. The scheduler selects the entities from the run queue using a FIFO. The scheduler provides dependency handling features among jobs. The driver is supposed to provide callback functions for backend operations to the scheduler like submitting a job to hardware run queue, returning the dependencies of a job etc.

The organisation of the scheduler is the following:

  1. Each hw run queue has one scheduler
  2. Each scheduler has multiple run queues with different priorities (e.g., HIGH_HW, HIGH_SW, KERNEL, NORMAL)
  3. Each scheduler run queue has a queue of entities to schedule
  4. Entities themselves maintain a queue of jobs that will be scheduled on the hardware.

The jobs in an entity are always scheduled in the order that they were pushed.
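
The organisation above can be modelled with a small userspace sketch. All names here (struct mock_entity, struct mock_rq, sched_pop() and the two priority levels) are hypothetical. The selection policy is an assumption: run queues are checked from highest to lowest priority, and entities inside one run queue are served in turn. Only the in-order handling of jobs within an entity is taken directly from the text.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_JOBS 8
#define MAX_ENTITIES 4

/* Hypothetical priority levels; lower value is served first. */
enum mock_prio { MOCK_PRIO_HIGH, MOCK_PRIO_NORMAL, MOCK_PRIO_COUNT };

/* An entity owns a FIFO of job ids (assumed non-negative). */
struct mock_entity { int jobs[MAX_JOBS]; size_t head, tail; };

/* A run queue holds entities and remembers where to continue. */
struct mock_rq { struct mock_entity *entities[MAX_ENTITIES]; size_t count, next; };

static int entity_push(struct mock_entity *e, int job_id)
{
	if (e->tail - e->head == MAX_JOBS)
		return -1; /* queue full */
	e->jobs[e->tail++ % MAX_JOBS] = job_id;
	return 0;
}

static void rq_add(struct mock_rq *rq, struct mock_entity *e)
{
	assert(rq->count < MAX_ENTITIES);
	rq->entities[rq->count++] = e;
}

/* Pick the next entity with a queued job, starting after the last pick. */
static struct mock_entity *rq_select(struct mock_rq *rq)
{
	for (size_t k = 0; k < rq->count; k++) {
		size_t idx = (rq->next + k) % rq->count;
		struct mock_entity *e = rq->entities[idx];

		if (e->head != e->tail) {
			rq->next = (idx + 1) % rq->count;
			return e;
		}
	}
	return NULL;
}

/* Returns the next job id to run on the hardware, or -1 if idle. */
static int sched_pop(struct mock_rq rqs[MOCK_PRIO_COUNT])
{
	for (int p = 0; p < MOCK_PRIO_COUNT; p++) {
		struct mock_entity *e = rq_select(&rqs[p]);

		if (e)
			return e->jobs[e->head++ % MAX_JOBS];
	}
	return -1;
}
```

With jobs 1 and 2 queued on one normal-priority entity, job 3 on a second one and job 9 on a high-priority entity, this model emits 9, 1, 3, 2. Jobs 1 and 2 keep their relative order because they belong to the same entity.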

Scheduler Function References

struct drm_sched_entity

A wrapper around a job queue (typically attached to the DRM file_priv).

Definition

struct drm_sched_entity {
  struct list_head                list;
  struct drm_sched_rq             *rq;
  spinlock_t rq_lock;
  struct spsc_queue               job_queue;
  atomic_t fence_seq;
  uint64_t fence_context;
  struct dma_fence                *dependency;
  struct dma_fence_cb             cb;
  atomic_t *guilty;
  struct dma_fence                *last_scheduled;
  struct task_struct              *last_user;
};

Members

list
used to append this struct to the list of entities in the runqueue.
rq
runqueue to which this entity belongs.
rq_lock
lock to modify the runqueue to which this entity belongs.
job_queue
the list of jobs of this entity.
fence_seq
a linearly increasing seqno incremented with each new drm_sched_fence which is part of the entity.
fence_context
a unique context for all the fences which belong to this entity. The drm_sched_fence.scheduled uses the fence_context but drm_sched_fence.finished uses fence_context + 1.
dependency
the dependency fence of the job which is on the top of the job queue.
cb
callback for the dependency fence above.
guilty
points to ctx’s guilty.
last_scheduled
points to the finished fence of the last scheduled job.
last_user
last group leader pushing a job into the entity.

Description

Entities will emit jobs in order to their corresponding hardware ring, and the scheduler will alternate between entities based on scheduling policy.

struct drm_sched_rq

queue of entities to be scheduled.

Definition

struct drm_sched_rq {
  spinlock_t lock;
  struct drm_gpu_scheduler        *sched;
  struct list_head                entities;
  struct drm_sched_entity         *current_entity;
};

Members

lock
to modify the entities list.
sched
the scheduler to which this rq belongs.
entities
list of the entities to be scheduled.
current_entity
the entity which is to be scheduled.

Description

Run queue is a set of entities scheduling command submissions for one specific ring. It implements the scheduling policy that selects the next entity to emit commands from.

struct drm_sched_fence

fences corresponding to the scheduling of a job.

Definition

struct drm_sched_fence {
  struct dma_fence                scheduled;
  struct dma_fence                finished;
  struct dma_fence_cb             cb;
  struct dma_fence                *parent;
  struct drm_gpu_scheduler        *sched;
  spinlock_t lock;
  void *owner;
};

Members

scheduled
this fence is what will be signaled by the scheduler when the job is scheduled.
finished

this fence is what will be signaled by the scheduler when the job is completed.

When setting up an out fence for the job, you should use this, since it’s available immediately upon drm_sched_job_init(), and the fence returned by the driver from run_job() won’t be created until the dependencies have resolved.

cb
the callback for the parent fence below.
parent
the fence returned by drm_sched_backend_ops.run_job when scheduling the job on hardware. We signal the drm_sched_fence.finished fence once parent is signalled.
sched
the scheduler instance to which the job owning this struct belongs.
lock
the lock used by the scheduled and the finished fences.
owner
job owner for debugging

struct drm_sched_job

A job to be run by an entity.

Definition

struct drm_sched_job {
  struct spsc_node                queue_node;
  struct drm_gpu_scheduler        *sched;
  struct drm_sched_fence          *s_fence;
  struct dma_fence_cb             finish_cb;
  struct work_struct              finish_work;
  struct list_head                node;
  struct delayed_work             work_tdr;
  uint64_t id;
  atomic_t karma;
  enum drm_sched_priority         s_priority;
  struct drm_sched_entity  *entity;
};

Members

queue_node
used to append this struct to the queue of jobs in an entity.
sched
the scheduler instance on which this job is scheduled.
s_fence
contains the fences for the scheduling of the job.
finish_cb
the callback for the finished fence.
finish_work
schedules the function drm_sched_job_finish once the job has finished to remove the job from the drm_gpu_scheduler.ring_mirror_list.
node
used to append this struct to the drm_gpu_scheduler.ring_mirror_list.
work_tdr
schedules a delayed call to drm_sched_job_timedout after the timeout interval is over.
id
a unique id assigned to each job scheduled on the scheduler.
karma
incremented on every hang caused by this job. If this exceeds the hang limit of the scheduler then the job is marked guilty and will not be scheduled further.
s_priority
the priority of the job.
entity
the entity to which this job belongs.

Description

A job is created by the driver using drm_sched_job_init(). The driver should call drm_sched_entity_push_job() once it wants the scheduler to schedule the job.
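The karma rule from the member list (a job becomes guilty once its karma exceeds the scheduler's hang limit) can be sketched in userspace as follows. This is a simplified model under stated assumptions: the model_* names are hypothetical, and a plain int replaces the atomic_t counter used by the real structure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler: only the hang limit matters here. */
struct model_sched {
	int hang_limit;
};

/* Hypothetical stand-in for a job: hang counter plus the resulting verdict. */
struct model_hang_job {
	int karma;   /* number of hangs attributed to this job */
	bool guilty; /* set once karma exceeds the hang limit */
};

/* Record one hang; returns true once the job is considered guilty. */
static bool model_record_hang(const struct model_sched *sched,
			      struct model_hang_job *job)
{
	assert(sched && job);
	job->karma++;
	if (job->karma > sched->hang_limit)
		job->guilty = true;
	return job->guilty;
}
```

With a hang limit of 2, for example, the job survives two hangs and is marked guilty on the third, because the comparison is "exceeds" rather than "reaches".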

struct drm_sched_backend_ops

Definition

struct drm_sched_backend_ops {
  struct dma_fence *(*dependency)(struct drm_sched_job *sched_job, struct drm_sched_entity *s_entity);
  struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
  void (*timedout_job)(struct drm_sched_job *sched_job);
  void (*free_job)(struct drm_sched_job *sched_job);
};

Members

dependency
Called when the scheduler is considering scheduling this job next, to get another struct dma_fence for this job to block on. Once it returns NULL, run_job() may be called.
run_job
Called to execute the job once all of the dependencies have been resolved. This may be called multiple times, if timedout_job() has happened and drm_sched_job_recovery() decides to try it again.
timedout_job
Called when a job has taken too long to execute, to trigger GPU recovery.
free_job
Called once the job’s finished fence has been signaled and it’s time to clean it up.

Description

Defines the backend operations called by the scheduler. These functions should be implemented by the driver.
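The contract between dependency() and run_job() described in the member list can be modelled in a few lines of userspace C. This is a hedged sketch rather than the scheduler's implementation: all model_* types are hypothetical, the dependency callback takes only the job (the real one also receives the entity), and polling a signaled flag replaces the fence callbacks the kernel would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence. */
struct model_fence {
	bool signaled;
};

/* Hypothetical stand-in for a job with a small fixed list of dependencies. */
struct model_job {
	struct model_fence *deps[4];
	size_t num_deps;
	size_t next_dep;
	struct model_fence *blocked_on; /* fence the job currently waits for */
	bool ran;
};

/* Reduced model of the backend operations table. */
struct model_ops {
	struct model_fence *(*dependency)(struct model_job *job);
	void (*run_job)(struct model_job *job);
};

/* Driver side: hand out dependencies one at a time; NULL means none are left. */
static struct model_fence *model_dependency(struct model_job *job)
{
	if (job->next_dep < job->num_deps)
		return job->deps[job->next_dep++];
	return NULL;
}

/* Driver side: pretend to push the job to the hardware. */
static void model_run_job(struct model_job *job)
{
	job->ran = true;
}

/*
 * Scheduler side: returns true if run_job() was called, false if the job
 * is still blocked on an unsignaled dependency.
 */
static bool model_try_run(const struct model_ops *ops, struct model_job *job)
{
	struct model_fence *dep;

	if (job->blocked_on && !job->blocked_on->signaled)
		return false;
	job->blocked_on = NULL;

	while ((dep = ops->dependency(job)) != NULL) {
		if (!dep->signaled) {
			job->blocked_on = dep;
			return false;
		}
	}
	ops->run_job(job);
	return true;
}
```

The point of the loop is that run_job() is reached only after dependency() has returned NULL, so a driver may report its dependencies lazily, one fence per call.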

struct drm_gpu_scheduler

Definition

struct drm_gpu_scheduler {
  const struct drm_sched_backend_ops      *ops;
  uint32_t hw_submission_limit;
  long timeout;
  const char                      *name;
  struct drm_sched_rq             sched_rq[DRM_SCHED_PRIORITY_MAX];
  wait_queue_head_t wake_up_worker;
  wait_queue_head_t job_scheduled;
  atomic_t hw_rq_count;
  atomic64_t job_id_count;
  struct task_struct              *thread;
  struct list_head                ring_mirror_list;
  spinlock_t job_list_lock;
  int hang_limit;
};

Members

ops
backend operations provided by the driver.
hw_submission_limit
the max size of the hardware queue.
timeout
the time after which a job is removed from the scheduler.
name
name of the ring for which this scheduler is being used.
sched_rq
array of run queues, one per priority level.
wake_up_worker
the wait queue on which the scheduler sleeps until a job is ready to be scheduled.
job_scheduled
once drm_sched_entity_flush() is called the scheduler waits on this wait queue until all the scheduled jobs are finished.
hw_rq_count
the number of jobs currently in the hardware queue.
job_id_count
used to assign a unique id to each job.
thread
the kthread on which the scheduler runs.
ring_mirror_list
the list of jobs which are currently in the job queue.
job_list_lock
lock to protect the ring_mirror_list.
hang_limit
once the number of hangs caused by a job crosses this limit, the job is marked guilty and is no longer considered for scheduling.

Description

One scheduler is implemented for each hardware ring.
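The interplay of hw_submission_limit and hw_rq_count can be pictured with the following userspace sketch. It assumes that the scheduler hands a job to the hardware only while fewer than hw_submission_limit jobs are in flight; the model_* names are hypothetical, and plain integers replace the atomic counter of the real structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two scheduler fields discussed above. */
struct model_hw_sched {
	uint32_t hw_submission_limit; /* max jobs in flight on the hardware */
	uint32_t hw_rq_count;         /* jobs currently in flight */
};

/* The scheduler may submit only while the hardware queue has room. */
static bool model_sched_ready(const struct model_hw_sched *s)
{
	return s->hw_rq_count < s->hw_submission_limit;
}

/* Account for a job handed to the hardware; refuses when the queue is full. */
static bool model_submit(struct model_hw_sched *s)
{
	if (!model_sched_ready(s))
		return false;
	s->hw_rq_count++;
	return true;
}

/* Account for a job whose hardware fence has signaled. */
static void model_complete(struct model_hw_sched *s)
{
	assert(s->hw_rq_count > 0);
	s->hw_rq_count--;
}
```

In this model a completed job frees one slot, which is the moment a sleeping scheduler thread would be woken to pick the next entity.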

int drm_sched_entity_init(struct drm_sched_entity * entity, struct drm_sched_rq ** rq_list, unsigned int num_rq_list, atomic_t * guilty)

Init a context entity used by the scheduler when submitting to a HW ring.

Parameters

struct drm_sched_entity * entity
scheduler entity to init
struct drm_sched_rq ** rq_list
the list of run queues on which jobs from this entity can be submitted
unsigned int num_rq_list
number of run queues in rq_list
atomic_t * guilty
atomic_t set to 1 when a job on this queue is found to be guilty causing a timeout

Note

the rq_list should have at least one element to schedule the entity.

Returns 0 on success or a negative error code on failure.

long drm_sched_entity_flush(struct drm_sched_entity * entity, long timeout)

Flush a context entity

Parameters

struct drm_sched_entity * entity
scheduler entity
long timeout
time to wait, in jiffies, for the queue to become empty.

Description

Tearing down an entity is split into two functions. This first one does the waiting, removes the entity from the run queue and returns an error when the process was killed.

Returns the remaining time in jiffies left from the input timeout

void drm_sched_entity_fini(struct drm_sched_entity * entity)

Destroy a context entity

Parameters

struct drm_sched_entity * entity
scheduler entity

Description

This should be called after drm_sched_entity_flush(). It goes over the entity and signals all jobs with an error code if the process was killed.

void drm_sched_entity_destroy(struct drm_sched_entity * entity)

Destroy a context entity

Parameters

struct drm_sched_entity * entity
scheduler entity

Description

Calls drm_sched_entity_flush() and drm_sched_entity_fini().

void drm_sched_entity_set_rq(struct drm_sched_entity * entity, struct drm_sched_rq * rq)

Sets the run queue for an entity

Parameters

struct drm_sched_entity * entity
scheduler entity
struct drm_sched_rq * rq
scheduler run queue

Description

Sets the run queue for an entity and removes the entity from the run queue in which it was previously present.

bool drm_sched_dependency_optimized(struct dma_fence * fence, struct drm_sched_entity * entity)

Parameters

struct dma_fence * fence
the dependency fence
struct drm_sched_entity * entity
the entity which depends on the above fence

Description

Returns true if the dependency can be optimized and false otherwise

void drm_sched_entity_push_job(struct drm_sched_job * sched_job, struct drm_sched_entity * entity)

Submit a job to the entity’s job queue

Parameters

struct drm_sched_job * sched_job
job to submit
struct drm_sched_entity * entity
scheduler entity

Note

To guarantee that the order of insertion into the queue matches the job’s fence sequence number, this function should be called together with drm_sched_job_init() under a common lock.

This function returns no value.

void drm_sched_hw_job_reset(struct drm_gpu_scheduler * sched, struct drm_sched_job * bad)

stop the scheduler if it contains the bad job

Parameters

struct drm_gpu_scheduler * sched
scheduler instance
struct drm_sched_job * bad
bad scheduler job

void drm_sched_job_recovery(struct drm_gpu_scheduler * sched)

recover jobs after a reset

Parameters

struct drm_gpu_scheduler * sched
scheduler instance

int drm_sched_job_init(struct drm_sched_job * job, struct drm_sched_entity * entity, void * owner)

init a scheduler job

Parameters

struct drm_sched_job * job
scheduler job to init
struct drm_sched_entity * entity
scheduler entity to use
void * owner
job owner for debugging

Description

Refer to drm_sched_entity_push_job() documentation for locking considerations.

Returns 0 for success, negative error code otherwise.

int drm_sched_init(struct drm_gpu_scheduler * sched, const struct drm_sched_backend_ops * ops, unsigned hw_submission, unsigned hang_limit, long timeout, const char * name)

Init a gpu scheduler instance

Parameters

struct drm_gpu_scheduler * sched
scheduler instance
const struct drm_sched_backend_ops * ops
backend operations for this scheduler
unsigned hw_submission
number of hw submissions that can be in flight
unsigned hang_limit
number of times to allow a job to hang before dropping it
long timeout
timeout value in jiffies for the scheduler
const char * name
name used for debugging

Description

Returns 0 on success, otherwise a negative error code.

void drm_sched_fini(struct drm_gpu_scheduler * sched)

Destroy a gpu scheduler

Parameters

struct drm_gpu_scheduler * sched
scheduler instance

Description

Tears down and cleans up the scheduler.