Linux* Consumption of x86 Page Table Bits

BY Dave Hansen ON Jan 31, 2020

Overview

Page table entries (PTEs) for x86 processors running in 64-bit mode are 64 bits in size. Some of those bits are consumed by hardware features and others are available to software. Since this space is shared by hardware and software, there are often questions about how the space is consumed and what space is truly required.

Generally, software engineers gravitate toward the simplest software solutions, which tend to consume more than the bare minimum of truly required bits. Carving space out of existing data structures also has the benefit of having no immediately apparent overhead.

However, hardware engineers simultaneously have their eyes on the same bits in order to efficiently implement new hardware features. This leads to inherent conflicts over the use of this limited resource.

This article describes the bits which are currently ignored by hardware and consumed by the Linux kernel.

The Bits

Linux defines four bit locations that are available for software use within the kernel:


    #define _PAGE_BIT_SOFTW1        9       /* available for programmer */
    #define _PAGE_BIT_SOFTW2        10      /* " */
    #define _PAGE_BIT_SOFTW3        11      /* " */
    #define _PAGE_BIT_SOFTW4        58      /* available for programmer */

Only three of them (SOFTW1, SOFTW3, and SOFTW4) are used, and they are shared across four logical use cases:


    #define _PAGE_BIT_SPECIAL       _PAGE_BIT_SOFTW1
    #define _PAGE_BIT_CPA_TEST      _PAGE_BIT_SOFTW1
    /* Note _PAGE_BIT_SOFTW2 is unused */
    #define _PAGE_BIT_SOFT_DIRTY    _PAGE_BIT_SOFTW3
    #define _PAGE_BIT_DEVMAP        _PAGE_BIT_SOFTW4
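
As a rough illustration of how these bit positions become masks, the user-space sketch below copies the bit numbers from the definitions above and adds a few helpers. The `EX_` and `ex_` names are hypothetical and exist only for this example; they are not kernel identifiers. The sketch also counts the distinct bits behind the four use cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the article; the EX_ names are hypothetical. */
#define EX_BIT_SOFTW1 9
#define EX_BIT_SOFTW2 10 /* defined but unused */
#define EX_BIT_SOFTW3 11
#define EX_BIT_SOFTW4 58

#define EX_BIT_SPECIAL    EX_BIT_SOFTW1
#define EX_BIT_CPA_TEST   EX_BIT_SOFTW1 /* same bit as EX_BIT_SPECIAL */
#define EX_BIT_SOFT_DIRTY EX_BIT_SOFTW3
#define EX_BIT_DEVMAP     EX_BIT_SOFTW4

/* Turn a bit position into a 64-bit mask. */
#define EX_MASK(bit) (UINT64_C(1) << (bit))

/* Union of every software bit that has a use case assigned to it. */
#define EX_SW_USED_MASK                                      \
    (EX_MASK(EX_BIT_SPECIAL) | EX_MASK(EX_BIT_CPA_TEST) |    \
     EX_MASK(EX_BIT_SOFT_DIRTY) | EX_MASK(EX_BIT_DEVMAP))

/* Return true if the given bit is set in a raw 64-bit PTE value. */
static inline bool ex_pte_test(uint64_t pte, unsigned int bit)
{
    return (pte & EX_MASK(bit)) != 0;
}

/* Return a copy of the PTE value with the given bit set. */
static inline uint64_t ex_pte_set(uint64_t pte, unsigned int bit)
{
    return pte | EX_MASK(bit);
}

/* Return a copy of the PTE value with the given bit cleared. */
static inline uint64_t ex_pte_clear(uint64_t pte, unsigned int bit)
{
    return pte & ~EX_MASK(bit);
}

/* Count the set bits in a mask, one bit at a time. */
static inline unsigned int ex_count_bits(uint64_t mask)
{
    unsigned int n = 0;

    for (; mask != 0; mask >>= 1)
        n += (unsigned int)(mask & 1);
    return n;
}
```

Because the special and CPA-test use cases alias the same bit, `ex_count_bits(EX_SW_USED_MASK)` yields three distinct bits for four use cases.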

_PAGE_BIT_SPECIAL

This bit means that core memory management does not own the PTE and implies special handling is required. It is most frequently seen on page table entries created by device drivers to map device memory.

get_user_pages() is a software mechanism for high-performance virtual-to-physical translation, generally used to obtain a reference to physical memory. The primary use case is to ensure that physical memory remains allocated during DMA operations.

This bit enables a lockless, very high-performance fast path in get_user_pages(). It’s critical for initiating fast, low-latency I/O. Large databases love this functionality. While it would be possible to replace this bit with a parallel data structure, keeping it in the page tables, next to the hardware virtual-to-physical translation structures, provides a single, canonical source for the information. It also keeps the fast path simple and fast, because no additional memory (and no associated cache footprint) is required.
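
The idea can be sketched in user-space C as follows. This is a simplified model under stated assumptions, not the kernel's implementation: `ex_fast_collect()` and the `EX_` constants are hypothetical names, the page table is modeled as a flat array of raw 64-bit entries, and the reference counting and re-validation a real fast path needs are left out. The sketch reads each entry once without taking a lock and stops at the first entry it cannot handle, leaving the rest to a slower, locked path.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_PTE_PRESENT (UINT64_C(1) << 0) /* hardware Present bit */
#define EX_PTE_SPECIAL (UINT64_C(1) << 9) /* _PAGE_BIT_SPECIAL (SOFTW1) */

/* Assumption for this sketch: the physical address sits in bits 12..51. */
#define EX_PTE_PFN_MASK UINT64_C(0x000ffffffffff000)
#define EX_PAGE_SHIFT   12

/*
 * Collect page frame numbers from `count` entries without locking.
 * Returns how many entries were handled; a result smaller than `count`
 * means the caller has to fall back to a slow path for the remainder.
 */
static inline size_t ex_fast_collect(const volatile uint64_t *ptes,
                                     size_t count, uint64_t *pfns)
{
    size_t done = 0;

    while (done < count) {
        uint64_t pte = ptes[done]; /* single read, no lock held */

        if (!(pte & EX_PTE_PRESENT))
            break; /* nothing mapped here */
        if (pte & EX_PTE_SPECIAL)
            break; /* not owned by core memory management */

        pfns[done] = (pte & EX_PTE_PFN_MASK) >> EX_PAGE_SHIFT;
        done++;
    }
    return done;
}
```

The decision to bail out comes entirely from the entry that was just read, which is why no second data structure (and no extra cache line) has to be consulted.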

_PAGE_BIT_CPA_TEST

This bit is used for debugging only and is an alias for the same physical bit as _PAGE_BIT_SPECIAL.

_PAGE_BIT_SOFT_DIRTY

This is a software-based "sticky" dirty bit. It closely mirrors the hardware dirty bit (_PAGE_BIT_DIRTY) but is not consumed inside the kernel. It is only consumed or cleared in response to explicit application requests.

Some applications prefer to monitor writes to memory and track them independently of kernel management of the hardware-Dirty bit. For instance, the kernel clears hardware-Dirty before starting I/O, but an application might still want to know the page was written, even if the kernel now considers the page clean.

Applications use this to tell when data needs to be backed up, or to “live migrate” application contents.
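
For readers who want to try this from user space, here is a minimal sketch. It relies on the interface described in the kernel's soft-dirty documentation rather than on anything stated in this article: writing "4" to /proc/PID/clear_refs clears the soft-dirty bits of a task, and bit 55 of each 64-bit /proc/PID/pagemap entry reports whether the page is soft-dirty. The `ex_` function names are hypothetical, and the kernel must be built with soft-dirty support (CONFIG_MEM_SOFT_DIRTY) for the results to be meaningful.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit 55 of a pagemap entry reports the soft-dirty state of the page. */
#define EX_PAGEMAP_SOFT_DIRTY_BIT 55

/* Extract the soft-dirty flag (0 or 1) from a raw pagemap entry. */
static inline int ex_pagemap_soft_dirty(uint64_t entry)
{
    return (int)((entry >> EX_PAGEMAP_SOFT_DIRTY_BIT) & 1);
}

/* Clear the soft-dirty bits of the calling process. Returns 0 on success. */
static inline int ex_clear_soft_dirty(void)
{
    int fd = open("/proc/self/clear_refs", O_WRONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = write(fd, "4", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}

/*
 * Report whether the page containing `addr` is soft-dirty.
 * Returns 1 or 0, or -1 if the pagemap entry could not be read.
 */
static inline int ex_page_soft_dirty(const void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t entry;
    off_t offset;
    ssize_t n;
    int fd;

    if (page_size <= 0)
        return -1;

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size) *
             (off_t)sizeof(entry);

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    n = read(fd, &entry, sizeof(entry));
    close(fd);

    return n == (ssize_t)sizeof(entry) ? ex_pagemap_soft_dirty(entry) : -1;
}
```

A checkpointing tool might call `ex_clear_soft_dirty()` after taking a snapshot, let the application run, and then use `ex_page_soft_dirty()` on each page to decide which pages need to be copied again.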

_PAGE_BIT_DEVMAP

This bit is used primarily for memory that has core memory management structures available but is not itself managed by core memory management. The main use case is DAX persistent memory, but it can also be used for other "device memory". It lets such memory stay on the get_user_pages() fast path while still being treated as a special case.
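
One way to picture the distinction between the special and devmap bits is a small classifier over raw entry values. This is a hedged sketch, not kernel code: the `ex_` and `EX_` names are hypothetical, and checking the devmap bit before the special bit is a choice made for this example so that an entry carrying both bits is still classified as device memory.

```c
#include <stdint.h>

#define EX_PTE_PRESENT (UINT64_C(1) << 0)  /* hardware Present bit */
#define EX_PTE_SPECIAL (UINT64_C(1) << 9)  /* _PAGE_BIT_SPECIAL (SOFTW1) */
#define EX_PTE_DEVMAP  (UINT64_C(1) << 58) /* _PAGE_BIT_DEVMAP (SOFTW4) */

enum ex_pte_kind {
    EX_KIND_NOT_PRESENT, /* nothing mapped */
    EX_KIND_NORMAL,      /* ordinary memory, fast path applies */
    EX_KIND_DEVMAP,      /* device memory, fast path with extra care */
    EX_KIND_SPECIAL      /* not owned by core mm, fast path must bail */
};

/* Classify a raw PTE value by its software bits. */
static inline enum ex_pte_kind ex_classify(uint64_t pte)
{
    if (!(pte & EX_PTE_PRESENT))
        return EX_KIND_NOT_PRESENT;
    if (pte & EX_PTE_DEVMAP) /* checked first: sketch-level choice */
        return EX_KIND_DEVMAP;
    if (pte & EX_PTE_SPECIAL)
        return EX_KIND_SPECIAL;
    return EX_KIND_NORMAL;
}
```

In this model only `EX_KIND_SPECIAL` forces a fallback; `EX_KIND_DEVMAP` entries can still be handled quickly, at the cost of one additional branch.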

Space Consumption

The Linux kernel only consumes 3 bits. The kernel might get away with only consuming 2, but it would be a bit of work and potentially cause performance regressions for persistent memory and get_user_pages(). Other OSes use far more space, so Linux developers have little motivation for this work.

Linux is likely to slowly consume more and more software space over time, but primarily out of convenience and simplicity rather than as an absolute requirement.

Conclusion

Hardware and software engineers continue to find novel uses for these bits to implement new features and increase performance. By understanding how Linux uses page table space, both hardware and software engineers can make the best use of this resource going forward.