Linux* Consumption of x86 Page Table Bits
Page table entries (PTEs) for x86 processors running in 64-bit mode are 64 bits in size. Some of those bits are consumed by hardware features and others are available to software. Since this space is shared by hardware and software, there are often questions about how the space is consumed and what space is truly required.
Generally, software engineers gravitate toward the simplest software solutions, which tend to consume more than the bare minimum of truly required bits. Carving space out of existing data structures also has the benefit of having no immediately apparent overhead.
However, hardware engineers simultaneously have their eyes on the same bits in order to efficiently implement new hardware features. This leads to inherent conflicts over the use of this limited resource.
This article describes the bits which are currently ignored by hardware and consumed by the Linux kernel.
Linux defines four bit locations that are available for software use within the kernel:
#define _PAGE_BIT_SOFTW1  9   /* available for programmer */
#define _PAGE_BIT_SOFTW2  10  /* " */
#define _PAGE_BIT_SOFTW3  11  /* " */
#define _PAGE_BIT_SOFTW4  58  /* available for programmer */
Only three of these (SOFTW1, SOFTW3, and SOFTW4) are used, and they serve four logical use cases:
#define _PAGE_BIT_SPECIAL     _PAGE_BIT_SOFTW1
#define _PAGE_BIT_CPA_TEST    _PAGE_BIT_SOFTW1
/* Note _PAGE_BIT_SOFTW2 is unused */
#define _PAGE_BIT_SOFT_DIRTY  _PAGE_BIT_SOFTW3
#define _PAGE_BIT_DEVMAP      _PAGE_BIT_SOFTW4
_PAGE_BIT_SPECIAL means that core memory management does not own the PTE and that special handling is required. It is most frequently seen on page table entries that device drivers create to map device memory.
get_user_pages() is a performance-critical software mechanism for virtual-to-physical translation, generally used to obtain a reference to the physical memory behind a user address. Its primary use is to ensure that physical memory remains allocated during DMA operations.
_PAGE_BIT_SPECIAL allows get_user_pages() to have a lockless, very high-performance fast path: by looking at the PTE alone, the fast path can tell that it has reached memory that core memory management does not own and must fall back to the slower path. This fast path is critical for initiating fast, low-latency I/O, and large databases love this functionality. It would be possible to keep this information in a parallel data structure instead of this bit, but keeping it in the page tables, next to the hardware's virtual-to-physical translation structures, provides a single, canonical source for it. That keeps the design simple and fast, because it requires zero additional memory (and therefore adds no cache footprint).
_PAGE_BIT_CPA_TEST is used for debugging only and is an alias for the same physical bit as _PAGE_BIT_SPECIAL (_PAGE_BIT_SOFTW1).
_PAGE_BIT_SOFT_DIRTY is a software-based "sticky" dirty bit. It closely mirrors the hardware dirty bit (_PAGE_BIT_DIRTY) but is not consumed inside the kernel: it is only read or cleared in response to explicit application requests.
Some applications want to monitor writes to memory and track them independently of the kernel's management of the hardware Dirty bit. For instance, the kernel clears the hardware Dirty bit before starting I/O, but an application might still want to know that the page was written, even though the kernel now considers it clean.
Applications use this to tell when data needs to be backed up, or to “live migrate” application contents.
_PAGE_BIT_DEVMAP is used primarily for memory that has core memory management structures available but is not managed by core memory management. The main use case is DAX persistent memory, but it can also be used for other "device memory". It lets such memory keep using the get_user_pages() fast path while still being treated as a special case.
In summary, the Linux kernel consumes only three software bits. It might get away with consuming only two, but that would take some work and could cause performance regressions for persistent memory and get_user_pages(). Other OSes use far more of this space, so Linux developers have little motivation to do that work.
Linux is likely to slowly consume more and more software space over time, but primarily out of convenience and simplicity rather than as an absolute requirement.
Hardware and software engineers continue to find novel uses for these resources to implement new features and increase performance. By understanding how Linux uses page table space, both hardware and software engineers can make the best use of this resource going forward.