Improving Linux* Window Systems with DRM Format Modifiers
Linux has too many window systems. Ten years ago, there was only X11*. Now, we have X11, Wayland*, Mir*, Android*, and several other one-off window systems. Supporting all these window systems is difficult because window system issues weave themselves throughout your driver in weird ways. As the number of window systems increases, so does the complexity of the driver. There is also a significant burden on window system developers as they have to support multiple driver stacks.
Window systems and driver interactions have a long history. When OpenGL was first released by SGI*, it was a fork of IRIS* GL, which was SGI's graphics library. IRIS GL contained not only commands for 3D drawing but also window system management, input handling, and everything else you would need to write a GUI. When SGI released it as OpenGL, they stripped out everything except the 3D rendering part, but it remained intrinsically tied to the window system. The mental model was that the OpenGL driver ran on top of the window system as much as it did on hardware. The way to access OpenGL was to first create a window and then ask the window system for an OpenGL context on that window. The roots of the window system penetrated deep into the 3D driver and, in the old days of DRI1 on Linux, even the kernel had to be aware of things such as clipping rectangles.
Having 3D closely tied to the window system is all well and good if you're making a normal 3D application on a platform such as Windows* or MacOS*, where there is only one window system, or even on Linux if you only care about X11. However, in the modern landscape, things have changed. There are now applications, such as video transcoders or scientific applications, that want to use 3D rendering but neither need nor want a window. With compositors, applications no longer render directly to the front buffer, so there is no need for the kernel to be as directly involved in resource sharing. In short, the close ties between the window system and the 3D driver are no longer needed and don't make much sense in the modern world.
There is also a significant scaling problem. There are many more Linux window systems now than just X11: Wayland, Mir, and the Chrome OS* window system, to name a few. With the rise of all these window systems, giving each of them deep ties to the driver doesn't scale. This scaling problem goes both ways. On the one hand, we (as a driver team) don't want to carry window system integration code for several different window systems inside the driver. On the other hand, when a new window system is being created, its implementers don't want to have to modify a bunch of drivers in order to get their window system working. This has caused significant problems for Wayland because NVIDIA* still refuses to add the necessary code to their driver to support it natively.
Some Partial Solutions
There is quite a bit of momentum in various corners of the industry to try to solve this problem and provide better separation between 3D drivers and window systems. One notable attempt was NVIDIA's EGLStreams extensions, which attempted to allow for generic cross-process passing of rendered content. An EGLStream was an abstract pipeline for rendered content where the producer could render and present frames to the stream and then a consumer could texture from the stream for compositing or connect it directly to a video encoder or the scanout hardware. They had the advantage of providing an effectively arbitrary side-band channel through which the driver could pass any information it needed to render and scan out efficiently. The downside was that the interface didn't provide the application with enough control to build a competent composition framework.
The two client operating systems from Google* also make interesting case studies. On Android, Google* provides its own EGL implementation, which is a wrapper around the EGL implementation provided by the hardware vendor. The wrapper EGL implements the usual EGLSurface constructs in terms of a special Android-specific extension and the ANativeWindow struct, which the driver uses to communicate with the window system. Because ANativeWindow is just an interface, the driver is mostly unaware of the details of the window system and can concern itself only with rendering into the buffers provided by the ANativeWindow. Unfortunately, the EGL specification is not really designed to be wrapped in this way, and many complex issues arise when you try to implement certain EGL entrypoints, such as eglSwapBuffers, by calling into another vendor-provided EGL implementation.
Google's Chrome OS takes a different approach: it uses a “surfaceless” EGL context that does not require a window system and then implements its window system on top of EGL. This works because Chrome OS does not have to provide an EGL implementation (it is primarily a browser), so there is no wrapping as with Android. This gives the Chrome OS window system code all the control it wants over buffer passing. However, with the currently available EGL extensions, the driver does not have enough information about how the images will be used to make good choices about image allocation and rendering, and we lose a bit of performance.
Every VR vendor (such as Valve* and Oculus*) has its own VR compositor. While they don't necessarily have full window systems with multiple applications on a desktop, they need some way for the compositor to talk to the VR app running inside it. To solve this, Khronos* developed a set of Vulkan* extensions for sharing images, memory objects, and synchronization primitives between processes. The result is that you can implement an entire VR window system with just those Vulkan extensions and no custom driver code. Unfortunately, these extensions enforce strict driver-version-matching requirements, which makes them unsuitable for more general-purpose window systems, such as X11 and Wayland.
A Balanced Solution: DRM Format Modifiers
The fundamental problem in all of these discussions is information passing. Different parts of the graphics stack, such as 3D, media, camera, and display, may have different restrictions on the kinds of image layouts they can handle. You may also have different components from different vendors, such as a laptop with Intel integrated graphics and an NVIDIA discrete GPU. Even within one component, there may be different restrictions based on exactly how the image will be used. In order to provide good performance and low power consumption, we need to coordinate between the different components to choose the best type of image that is compatible with everyone. Unfortunately, with the current X11 and Wayland interfaces, we have very little information and always have to assume the worst case. Other solutions allow for information passing, but at the cost of either too much encapsulation (in the case of EGLStreams) or significant coupling between the driver and the window system.
As an example of component restrictions, Intel® architecture has three different primary tiling formats: linear, X, and Y. While a linear image has the pixels arranged in the usual way (the byte offset of pixel (X, Y) is Y * pitch + X * bytes-per-pixel), X-tiled and Y-tiled images have the pixels shuffled about in memory to improve cache locality while rendering or texturing. Of these three formats, Y-tiled is the most efficient for rendering, followed by X-tiled, and then linear. Up until the 6th generation of Intel® Core™ processors, the scanout hardware could handle linear and X-tiled images but not Y-tiled. This meant that the best we could do for window system images was X-tiled: we had no way of knowing whether an image would be handed off to the scanout hardware, so we had to assume that it might be. On the 6th generation of Intel® Core™ processors and newer, the scanout hardware is much more capable and can handle Y-tiling as well as CCS image compression, but some components of the graphics stack still assume X-tiling as an artifact of history.
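To make the difference between the layouts concrete, here is a minimal C sketch that computes the byte offset of pixel (x, y) in each one. It assumes the well-known legacy tile geometry (4 KiB tiles; X tiles are 512 bytes wide by 8 rows, Y tiles are 128 bytes wide by 32 rows stored as 16-byte-wide columns) and ignores the address swizzling that some platforms apply. The function names are hypothetical and not taken from any driver.

```c
#include <stdint.h>

/* Linear: rows are stored one after another, `pitch` bytes apart.
 * cpp is the number of bytes per pixel. */
static uint64_t linear_offset(uint32_t x, uint32_t y, uint32_t pitch, uint32_t cpp)
{
    return (uint64_t)y * pitch + (uint64_t)x * cpp;
}

/* X-tiling (simplified, no swizzling): 4 KiB tiles, 512 bytes wide and
 * 8 rows tall, bytes row-major inside each tile. Tiles are laid out
 * row-major across the surface; pitch must be a multiple of 512. */
static uint64_t x_tiled_offset(uint32_t x, uint32_t y, uint32_t pitch, uint32_t cpp)
{
    const uint32_t tile_w = 512, tile_h = 8, tile_size = 4096;
    uint32_t xb = x * cpp;                     /* horizontal position in bytes */
    uint32_t tiles_per_row = pitch / tile_w;
    uint64_t tile = (uint64_t)(y / tile_h) * tiles_per_row + xb / tile_w;
    uint32_t in_tile = (y % tile_h) * tile_w + xb % tile_w;
    return tile * tile_size + in_tile;
}

/* Legacy Y-tiling (simplified, no swizzling): 4 KiB tiles, 128 bytes wide
 * and 32 rows tall, stored as eight 16-byte-wide columns; each column holds
 * all 32 rows before the next column starts. Pitch must be a multiple of 128. */
static uint64_t y_tiled_offset(uint32_t x, uint32_t y, uint32_t pitch, uint32_t cpp)
{
    const uint32_t tile_w = 128, tile_h = 32, tile_size = 4096, col_w = 16;
    uint32_t xb = x * cpp;
    uint32_t tiles_per_row = pitch / tile_w;
    uint64_t tile = (uint64_t)(y / tile_h) * tiles_per_row + xb / tile_w;
    uint32_t xt = xb % tile_w, yt = y % tile_h;
    uint32_t in_tile = (xt / col_w) * (col_w * tile_h) + yt * col_w + xt % col_w;
    return tile * tile_size + in_tile;
}
```

For example, with 4 bytes per pixel, `linear_offset(3, 2, 256, 4)` returns 524, `x_tiled_offset(130, 9, 1024, 4)` returns 12808, and `y_tiled_offset(5, 3, 256, 4)` returns 564. Note that the pixel directly below a given pixel is a full pitch away in a linear image, 512 bytes away inside an X tile, and only 16 bytes away inside a Y tile, which is the kind of locality the tiled layouts are after.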
During the last two years, I've worked with several other engineers in the Linux graphics community to develop a solution to the information passing problem. The ideal solution would balance the need to share vendor-specific information between components with the window system's need for explicit control over buffer passing. The solution we've chosen is DRM format modifiers. In the Linux kernel modesetting (KMS) interface, a buffer is described using a color format, width, and height together with a series of per-plane prime memory handles, strides, and offsets. While the usual RGB images have only a single plane, this also allows YUV images to be described precisely. To this interface, we've added a modifier, a 64-bit integer that describes additional details about the image layout. Some modifiers, such as DRM_FORMAT_MOD_LINEAR, are understood by everyone, while others, such as I915_FORMAT_MOD_Y_TILED, are vendor-specific.
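The modifier encoding itself is simple. The sketch below mirrors the scheme in the kernel's drm_fourcc.h header, where the top 8 bits of the 64-bit value identify the vendor and the remaining 56 bits are vendor-defined. The macro, function, and struct names are local stand-ins so the example compiles without kernel headers, and `fb_desc` is a simplified, hypothetical version of the buffer description KMS accepts (the kernel's own struct drm_mode_fb_cmd2 carries one modifier per plane rather than a single field).

```c
#include <stdint.h>

/* Vendor IDs occupy the top 8 bits of a modifier; the low 56 bits are
 * defined by that vendor. Values mirror drm_fourcc.h. */
#define MOD_VENDOR_NONE  0x00ULL
#define MOD_VENDOR_INTEL 0x01ULL
#define MOD_CODE(vendor, val) (((vendor) << 56) | ((val) & 0x00ffffffffffffffULL))

#define MOD_LINEAR        MOD_CODE(MOD_VENDOR_NONE, 0)   /* DRM_FORMAT_MOD_LINEAR */
#define MOD_INTEL_X_TILED MOD_CODE(MOD_VENDOR_INTEL, 1)  /* I915_FORMAT_MOD_X_TILED */
#define MOD_INTEL_Y_TILED MOD_CODE(MOD_VENDOR_INTEL, 2)  /* I915_FORMAT_MOD_Y_TILED */

/* Which vendor defined this modifier? 0 means "understood by everyone". */
static inline uint8_t mod_vendor(uint64_t modifier)
{
    return (uint8_t)(modifier >> 56);
}

/* Hypothetical buffer description: format, size, and up to four planes,
 * each with a memory handle, a stride, and an offset. */
struct fb_desc {
    uint32_t fourcc;       /* color format, e.g. 'XR24' for XRGB8888 */
    uint32_t width, height;
    uint32_t handles[4];   /* per-plane memory handles */
    uint32_t pitches[4];   /* per-plane strides in bytes */
    uint32_t offsets[4];   /* per-plane offsets in bytes */
    uint64_t modifier;     /* layout details beyond format and stride */
};

/* Describe a single-plane image (the common RGB case). Unused planes
 * are zero-initialized by the designated initializer. */
static struct fb_desc fb_desc_single_plane(uint32_t fourcc, uint32_t w, uint32_t h,
                                           uint32_t handle, uint32_t pitch,
                                           uint64_t modifier)
{
    struct fb_desc d = { .fourcc = fourcc, .width = w, .height = h,
                         .modifier = modifier };
    d.handles[0] = handle;
    d.pitches[0] = pitch;
    d.offsets[0] = 0;
    return d;
}
```

Because the vendor lives in the top byte, any component can tell at a glance whether a modifier is one it could possibly understand, without knowing what the vendor-specific bits mean.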
The information passing problem is then solved by each component providing a list of all the modifiers it supports for a given format and usage. The compositor can then query all of the different components (display, its own 3D driver, the client's 3D driver, and so on) and take the intersection of the lists to get the modifiers supported by everyone. If the set of components changes (such as the client going full-screen and the compositor deciding to scan out from the client image directly), the compositor can redo the query and come up with a new list of modifiers. This allows us to use Y-tiling or CCS when the client image is being composited using the 3D engine but then fall back to X-tiling or linear as soon as we need to use the image with a component that doesn't understand Y-tiling or CCS.
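The negotiation step boils down to a set intersection over 64-bit values. Here is a minimal C sketch; the function name is hypothetical, and it assumes the caller keeps its list ordered from most to least preferred so that the first surviving entry is the best choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Keep only the entries of `mods` that also appear in `other`.
 * Filters in place and preserves the order of `mods`, so a list sorted
 * by preference stays sorted. Returns the new number of entries. */
static size_t intersect_modifiers(uint64_t *mods, size_t count,
                                  const uint64_t *other, size_t other_count)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < other_count; j++) {
            if (mods[i] == other[j]) {
                mods[kept++] = mods[i];   /* kept <= i, so this never clobbers unread entries */
                break;
            }
        }
    }
    return kept;
}
```

A compositor would call this once per component (display plane, its own driver, the client's driver). If a display plane that only supports X-tiled and linear joins the set, a list that started as Y-tiled, X-tiled, linear shrinks to X-tiled, linear; when that plane drops out again, the compositor recomputes from the full lists. A quadratic loop is fine here because modifier lists are short.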
Re-plumbing the Graphics Stack
Implementing this solution requires a lot of coordinated work across all the different components of the graphics stack. Each of the different components needs to know how to speak the new language of modifiers. Dozens of engineers from several different companies have worked together to make the necessary changes to all of the different components. Some of the notable changes are listed below.
- An interface has been added to KMS to query the modifiers supported by a particular plane configuration
- A modifier parameter has been added to the ioctl KMS provides to specify an image
- Various GBM interfaces have been added to support creating images and surfaces with modifiers
- A new Wayland protocol has been developed to pass images from client to compositor explicitly using modifiers
- The server side of the new Wayland protocol has been implemented in Weston and Mutter
- The DRI3 protocol for X11 has been extended to support modifiers
- The Mesa* EGL, GLX, and Vulkan WSI implementations have been updated to use the new X11 and Wayland protocols for modifiers
- A new EGL extension, EGL_EXT_image_dma_buf_import_modifiers, has been drafted
- Two new Vulkan extensions, VK_EXT_external_memory_dma_buf and VK_EXT_image_drm_format_modifier, have been drafted for creating, importing, and exporting images using modifiers
- EGL_EXT_image_dma_buf_import_modifiers was implemented in the Mesa EGL* and OpenGL* drivers for Intel graphics
- VK_EXT_external_memory_dma_buf and VK_EXT_image_drm_format_modifier have been implemented in the Mesa Vulkan* driver for Intel graphics
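As an illustration of the first item, KMS exposes a plane's supported modifiers through its IN_FORMATS property: a blob holding the plane's format list followed by an array of modifier entries, each carrying a 64-bit bitmask of the formats it applies to. The C sketch below assumes the two arrays have already been located inside the blob (via its formats_offset and modifiers_offset fields) and shows how a compositor might collect the modifiers for one format. The struct mirrors the kernel's struct drm_format_modifier; the helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct drm_format_modifier from the kernel's drm_mode.h. */
struct format_modifier {
    uint64_t formats;   /* bit i set => applies to formats[offset + i] */
    uint32_t offset;    /* format index that bit 0 refers to */
    uint32_t pad;
    uint64_t modifier;
};

/* Write into `out` every modifier the plane supports for `fourcc`.
 * `formats` is the plane's format list and `mods` its modifier entries.
 * Returns the number of modifiers written (at most `max_out`). */
static size_t plane_modifiers_for_format(const uint32_t *formats, size_t count_formats,
                                         const struct format_modifier *mods,
                                         size_t count_mods, uint32_t fourcc,
                                         uint64_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t f = 0; f < count_formats; f++) {
        if (formats[f] != fourcc)
            continue;
        for (size_t m = 0; m < count_mods && n < max_out; m++) {
            /* Each entry's bitmask covers a 64-format window starting at `offset`. */
            if (f < mods[m].offset || f >= (size_t)mods[m].offset + 64)
                continue;
            if (mods[m].formats & (1ULL << (f - mods[m].offset)))
                out[n++] = mods[m].modifier;
        }
    }
    return n;
}
```

The result for a given format is exactly the per-component list that feeds the intersection step described earlier.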
There may be some pieces missing from the list above but it should give some sense of the scope of the work being done. At the time of this post, much of the above work has landed in the respective projects or specifications but there are still a few bits that are in-flight.
Window System Integration as a Vulkan Layer (Almost)
Most of the engineering work I have done in this effort has been in the OpenGL and Vulkan drivers. Part of that has been working on the plumbing for DRM format modifiers in Vulkan WSI. The existing Vulkan WSI code wasn't ready for such a change. It was an ad-hoc abstraction created when we pulled the WSI code out of the Intel driver and put it into common code.
When prime support (for displaying on a different GPU) was added to RADV, the interface became a bit of a mess. Before making the problem worse by adding modifier support, we had to clean things up a bit. During the last week of November 2017, Dave Airlie and I started cleaning up the interface. The result is a series of patches that rework window system integration to look more like a Vulkan layer. Instead of using an ad-hoc interface, we tried to reuse as much of the Vulkan API as possible.
Previously, the Vulkan WSI code looked much like all the other window system integration code in Mesa. Image creation used a special WSI-specific function call that would create an image with all the usual assumptions for a window-system image (X-tiled on Intel) and allocate memory at the same time. Prime support was implemented by RADV creating command buffers to do a blit from tiled to linear and storing them in core WSI data structures, then pulling them back out, and executing them in vkQueuePresent. It was quite ugly.
To replace the old abstraction, I added some new chain-in structs that look and feel like a Vulkan extension, even though there is nothing for it in the Vulkan XML and there is no actual spec text. This pseudo-extension adds a chain-in struct for vkCreateImage that lets the WSI code create legacy window system images that have the usual tiling and alignment constraints. It also adds a chain-in for vkAllocateMemory, which lets the WSI code enable GEM implicit synchronization. Between these two, we can now create "legacy" window system images with just vkCreateImage and vkAllocateMemory.
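The chain-in mechanism is the standard Vulkan pNext pattern: the consumer walks a linked list of structs and looks for an sType it recognizes, ignoring the rest. Below is a self-contained C sketch of how a driver might detect such WSI structs. Every type, sType value, and field name here is hypothetical; real code would use VkStructureType and VkBaseInStructure from the Vulkan headers, and Mesa's actual structs may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for VkStructureType. The WSI values are hypothetical. */
typedef enum {
    STYPE_IMAGE_CREATE_INFO = 1,
    STYPE_WSI_IMAGE_CREATE_INFO = 1000001,
    STYPE_WSI_MEMORY_ALLOCATE_INFO = 1000002,
} struct_type;

/* Stand-in for VkBaseInStructure: every chainable struct starts this way. */
struct base_in {
    struct_type sType;
    const struct base_in *pNext;
};

/* Chained into image creation: "give me a legacy window-system image". */
struct wsi_image_create_info {
    struct_type sType;
    const void *pNext;
    bool scanout;          /* apply the usual scanout tiling and alignment */
};

/* Chained into memory allocation: "enable implicit synchronization". */
struct wsi_memory_allocate_info {
    struct_type sType;
    const void *pNext;
    bool implicit_sync;
};

/* Simplified stand-in for VkImageCreateInfo. */
struct image_create_info {
    struct_type sType;
    const void *pNext;
    unsigned width, height;
};

/* Walk a pNext chain and return the first struct with the given sType. */
static const void *find_in_chain(const void *pNext, struct_type type)
{
    for (const struct base_in *s = pNext; s != NULL; s = s->pNext)
        if (s->sType == type)
            return s;
    return NULL;
}

/* Driver side: use legacy scanout constraints only if WSI asked for them. */
static bool wants_legacy_scanout(const struct image_create_info *info)
{
    const struct wsi_image_create_info *wsi =
        find_in_chain(info->pNext, STYPE_WSI_IMAGE_CREATE_INFO);
    return wsi != NULL && wsi->scanout;
}
```

The appeal of this design is that an application-facing vkCreateImage call without the extra struct behaves exactly as before, while the WSI code gets its special behavior through the same entrypoint.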
For prime support where we need to do a copy from tiled to linear as part of vkQueuePresent, we create a plain VkImage (without using the pseudo-extension) for the WSI image and create a VkBuffer for the linear shadow copy. We then create a set of command buffers (one per queue family) that perform a vkCmdCopyImageToBuffer to copy the image data from the WSI image to the linear shadow copy. By doing this, the driver becomes almost completely unaware that the image is being shared with another GPU and it just thinks that it’s rendering to a regular VkImage.
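In the driver, this copy is recorded with vkCmdCopyImageToBuffer and executed on the GPU. Purely to illustrate the data movement, here is a CPU model of a tiled-to-linear copy. It assumes the simplified Intel X-tile geometry (512-byte by 8-row tiles, no swizzling) rather than whatever layout the rendering GPU actually uses, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy an X-tiled image into a linear buffer, row by row.
 * Within one tile, each row is a contiguous 512-byte span, so a surface
 * row can be copied one tile-width span at a time.
 * src_pitch must be a multiple of 512; cpp is bytes per pixel. */
static void copy_x_tiled_to_linear(const uint8_t *src, uint32_t src_pitch,
                                   uint8_t *dst, uint32_t dst_pitch,
                                   uint32_t width, uint32_t height, uint32_t cpp)
{
    const uint32_t tile_w = 512, tile_h = 8, tile_size = 4096;
    uint32_t tiles_per_row = src_pitch / tile_w;
    uint32_t row_bytes = width * cpp;

    for (uint32_t y = 0; y < height; y++) {
        for (uint32_t xb = 0; xb < row_bytes; xb += tile_w) {
            /* Tile holding bytes [xb, xb + 512) of row y, then the row inside it. */
            uint64_t tile = (uint64_t)(y / tile_h) * tiles_per_row + xb / tile_w;
            uint64_t src_off = tile * tile_size + (uint64_t)(y % tile_h) * tile_w;
            uint32_t span = row_bytes - xb < tile_w ? row_bytes - xb : tile_w;
            memcpy(dst + (uint64_t)y * dst_pitch + xb, src + src_off, span);
        }
    }
}
```

The consumer on the other GPU only ever sees the linear shadow copy, which is why the rendering driver can treat the WSI image as an ordinary VkImage.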
With this refactoring done, the WSI code now uses entirely standard Vulkan entrypoints for image creation, memory allocation, and memory import/export. Even though the entrypoints used are standard, the WSI code needs to make a few assumptions about implementation details of the driver. For instance, we assume that you can execute vkCmdCopyImageToBuffer on an image in the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout. Because the pseudo-extension will only ever be implemented by other Mesa drivers, these assumptions are reasonably safe as both the Intel and Radeon* Vulkan drivers (the only two in Mesa) satisfy them. The functions provided to the driver by the WSI code also now look a lot more like the corresponding Vulkan entrypoints and the entrypoints in the drivers are now just thin wrappers around common code. This means the WSI code in Mesa is starting to look very much like a Vulkan layer.
Where is this all headed? Once the VK_EXT_image_drm_format_modifier extension is finalized, we will be able to stop using the pseudo-extension for image creation. Once the plumbing is done for explicit synchronization (that’s a whole other post), we will be able to drop the pseudo-extension for memory allocation. This means that Vulkan WSI for X11 and Wayland can be done entirely through standardized Vulkan extensions. We can then break the Mesa WSI code out into a completely generic Vulkan layer that need not even live in the Mesa codebase and can be used with other non-Mesa drivers. We will finally have complete separation between the driver code-base and the window system integration code.
Why Does All This Matter?
Clearly, significant effort has been put into making DRM format modifiers a reality, but what does it gain us?
First, it enables us to take better advantage of Intel graphics hardware. On the 6th generation of Intel® Core™ processors and later, we will be able to pass CCS compressed surfaces from the client all the way through to scanout. By using CCS compressed surfaces, we can reduce memory bandwidth and get better 3D performance and lower power consumption. On older hardware, we will get Y-tiling for windowed (not full-screen) applications which will also help performance. As new features are added in future Intel graphics hardware, new DRM format modifiers can easily be added to take advantage of those features without having to re-plumb the entire graphics stack again.
Second, it enables innovation in the window system space. It is impossible to predict what the future will hold and what products people will want to build, and custom window systems are becoming more and more common. One of the advantages of open-source drivers is that anyone can modify them as needed for their platform. Projects such as Wayland and Mir would not have been possible without open-source drivers. However, drivers are complicated, and modifying them can be a difficult and tedious task. As demonstrated by our refactoring of Vulkan WSI, it will soon be possible to write an entire window system without touching driver internals while still taking advantage of all the power and performance features available on modern hardware.