Running Average Power Limit – RAPL
by Srinivas Pandruvada
The thermal design power (TDP) represents the maximum amount of power the cooling system in a computer is required to dissipate. For example, for a processor with a TDP of 35W, Intel guarantees the OEM that if it implements a chassis and cooling system capable of dissipating that much heat, the chip will operate as intended. This is the power budget under which the system needs to operate. But this is not the same as the maximum power the processor can consume. The processor can consume more than the TDP for a short period of time without it being “thermally significant”. Because heat takes some time to propagate, a short burst does not necessarily violate the TDP.
Let’s analyze how a TDP can influence performance.
With a single-core CPU, an application demanding performance will get full performance, as long as the CPU stays under the TDP.
If one more core is added, a multi-threaded workload can demand full performance on both cores. If the CPU runs both cores at maximum frequency, it will consume more power than the TDP allows. To avoid this, the CPU uses different maximum frequencies, depending on the number of active cores. So, of the two cores, if one is more active than the other, it can use the remaining thermal budget and run at a higher frequency. This is achieved by introducing turbo mode.
Intel® Turbo Boost Technology allows processor cores to run faster than their base frequency, if the operating conditions permit. Firmware in the Power Control Unit (PCU) makes this decision based on:
- Number of cores active
- Estimated current consumption
- Estimated power consumption
- Processor temperature
Once the GPU is added to the package, its power must also be considered in making this turbo boost decision.
The PCU uses internal models and counters to estimate power consumption. Before the Sandy Bridge microarchitecture, turbo decisions were driven by models, which by nature tend to be conservative, to avoid issues in some isolated cases. Sandy Bridge added onboard power meter capability, which can be used to make better decisions than those based purely on models. In addition, it exported these power meters, and the power limits used in its calculations, through a set of MSRs (Model Specific Registers) and PCIe config space. This interface is called the RAPL interface.
RAPL Use Cases
RAPL Power meter capability
RAPL provides a set of counters that report energy and power consumption. RAPL is not an analog power meter, but rather uses a software power model. This model estimates energy usage by using hardware performance counters and I/O models. In our measurements, these estimates match actual power measurements.
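Because the counters report accumulated energy rather than instantaneous power, a monitoring tool typically samples a counter twice and divides the difference by the elapsed time. The following is a minimal sketch of that arithmetic, not a definitive implementation. The function name and its parameters are hypothetical, the readings are assumed to be in microjoules, and the counter is assumed to wrap around at most once between the two samples.

```python
def average_power_watts(energy_start_uj, energy_end_uj, elapsed_s, max_energy_range_uj):
    """Average power in watts between two energy-counter samples.

    All energy values are in microjoules. max_energy_range_uj is the value
    at which the counter is assumed to wrap back to zero.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta_uj = energy_end_uj - energy_start_uj
    if delta_uj < 0:
        # The counter wrapped once between the samples (assumption: only once).
        delta_uj += max_energy_range_uj
    # Convert microjoules to joules, then joules per second to watts.
    return delta_uj / 1e6 / elapsed_s
```

For example, a counter that advances by 2,000,000 microjoules over two seconds corresponds to an average of 1 W. Sampling often enough that the counter cannot wrap twice keeps the single-wrap assumption valid.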
RAPL Power Limiting
RAPL provides a way to set power limits on processor packages and DRAM. This will allow a monitoring and control program to dynamically limit max average power, to match its expected power and cooling budget. In addition, power limits in a rack enable power budgeting across the rack distribution. By dynamically monitoring the feedback of power consumption, power limits can be reassigned based on use and workloads. Because multiple bursts of heavy workloads will eventually cause the ambient temperature to rise, reducing the rate of heat transfer, one uniform power limit can’t be enforced. RAPL provides a way to set short term and longer term averaging windows for power limits. These window sizes and power limits can be adjusted dynamically.
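To illustrate why two averaging windows are useful, the sketch below checks a sampled power trace against a limit averaged over a given window. It is a simplified model under stated assumptions: it uses a plain sliding mean (the hardware's actual averaging method is not described here), and the function name, the trace, and all wattage values other than the 35W TDP example are hypothetical.

```python
from collections import deque


def exceeds_average_limit(samples_w, sample_period_s, limit_w, window_s):
    """Return True if any full-window running average exceeds limit_w.

    samples_w holds power samples in watts taken every sample_period_s seconds.
    """
    n = max(1, round(window_s / sample_period_s))  # samples per window
    window = deque(maxlen=n)
    for power_w in samples_w:
        window.append(power_w)
        # Only judge complete windows, so a burst at the start is not over-weighted.
        if len(window) == n and sum(window) / n > limit_w:
            return True
    return False


# Hypothetical trace, one sample per second: light load, a 2 s burst above a
# 35 W budget, then light load again.
trace_w = [20.0] * 8 + [45.0] * 2 + [20.0] * 8
```

With this trace, `exceeds_average_limit(trace_w, 1.0, 35.0, 8.0)` is false, because the 8-second average peaks at 26.25 W, while `exceeds_average_limit(trace_w, 1.0, 40.0, 1.0)` is true, because the burst exceeds a tighter 1-second limit. A long window tolerates short bursts, and a short window bounds how high those bursts can go.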
RAPL performance feedback
RAPL counters capture the total time that the RAPL mechanism forced the P-state to be below the OS-requested P-state. A further counter tracks the total time that the RAPL power limit constrained the operating frequency of the processor. These counters can be used to fine tune or rebalance loads on nodes, if there are performance degradations caused by running below OS-requested P-states.
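One way to use such a counter is to turn two samples into the fraction of wall-clock time spent throttled. The sketch below assumes, as a labeled assumption rather than a statement of the register definition, that the counter is a 32-bit accumulator counting in units of a known time unit. The function and parameter names are hypothetical.

```python
def throttled_fraction(count_start, count_end, time_unit_s, elapsed_s, counter_bits=32):
    """Fraction of elapsed_s during which the power limit throttled the domain.

    count_start and count_end are raw accumulated-throttle-time readings;
    time_unit_s is the duration in seconds represented by one count.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # Modular subtraction handles a single wraparound of the counter.
    delta = (count_end - count_start) % (1 << counter_bits)
    return delta * time_unit_s / elapsed_s
```

A result near zero suggests the power limit is not costing performance on that node, while a large fraction is a hint that load could be rebalanced or the limit reassigned.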
In RAPL, platforms are divided into domains for fine grained reports and control. A RAPL domain is a physically meaningful domain for power management. The specific RAPL domains available in a platform vary across product segments.
Each RAPL domain supports:
- ENERGY_STATUS for power monitoring
- POWER_LIMIT and TIME_WINDOW for controlling power
- PERF_STATUS for monitoring the performance impact of the power limit
- RAPL_INFO for information on measurement units and the minimum and maximum power supported by the domain
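The raw values in these registers are scaled by the measurement units, so a reader must decode the units first. The sketch below shows one way to do that, assuming the field layout documented for the MSR_RAPL_POWER_UNIT register in the Intel Software Developer's Manual as I understand it (power units in bits 3:0, energy units in bits 12:8, time units in bits 19:16, each an exponent N meaning 1/2^N). The function name and the sample value are illustrative, and the layout should be verified against the manual for the target processor.

```python
def decode_rapl_units(raw):
    """Decode a raw units-register value into (watts, joules, seconds) per count.

    Assumed layout: bits 3:0 power exponent, bits 12:8 energy exponent,
    bits 19:16 time exponent; each unit is 1 / 2**exponent.
    """
    power_unit_w = 1.0 / (1 << (raw & 0xF))
    energy_unit_j = 1.0 / (1 << ((raw >> 8) & 0x1F))
    time_unit_s = 1.0 / (1 << ((raw >> 16) & 0xF))
    return power_unit_w, energy_unit_j, time_unit_s


# Illustrative raw value: exponents 3 (power), 14 (energy), 10 (time).
power_unit_w, energy_unit_j, time_unit_s = decode_rapl_units(0xA0E03)
```

With that sample value the units come out as 0.125 W, 1/16384 J (about 61 microjoules), and 1/1024 s. A raw ENERGY_STATUS reading multiplied by the energy unit then gives joules.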
The RAPL interface is used in multiple products at Intel's Open Source Technology Center (OTC). The following list summarizes some of the products and their usages:
The TurboStat utility, which is part of the Linux kernel tools, can now display wattage information. This wattage information is read using RAPL MSRs.
PowerTOP provides estimated power consumption for various components. These estimates are based on real power measurements, taken using a power meter, and on a power model that takes into account estimated activity on each component. RAPL can supply measured power numbers for CPUs, GPUs, and DRAM in place of these estimates. Recent changes in PowerTOP show the actual CPU, GPU, and DRAM power consumption.
The RAPL driver is implemented as a power cap driver. This is available in the latest Linux releases from kernel.org.
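On Linux systems where this driver is loaded, the power cap framework exposes each RAPL domain as a directory under /sys/class/powercap, with files such as name and energy_uj. The sketch below lists the zones it can read. It is a minimal example under assumptions: the function name is hypothetical, the root path is a parameter so the code can be pointed at a different directory, and zones whose files are missing or unreadable (reading energy_uj may require root privileges on some kernels) are silently skipped.

```python
from pathlib import Path


def read_rapl_zones(root="/sys/class/powercap"):
    """Map each readable intel-rapl zone directory to (domain name, energy in uJ)."""
    zones = {}
    for zone in sorted(Path(root).glob("intel-rapl:*")):
        name_file = zone / "name"
        energy_file = zone / "energy_uj"
        if not (name_file.is_file() and energy_file.is_file()):
            continue
        try:
            name = name_file.read_text().strip()
            energy_uj = int(energy_file.read_text())
        except (OSError, ValueError):
            # Unreadable or malformed entry: skip rather than fail the whole scan.
            continue
        zones[zone.name] = (name, energy_uj)
    return zones
```

Calling `read_rapl_zones()` twice, a known interval apart, gives the two energy samples needed to compute average power per domain.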
Linux Thermal Daemon
RAPL power limits are very effective in reducing package temperature. The Linux thermal daemon uses the RAPL interface, through the RAPL driver, to control platform thermals.
- Intel® 64 and IA-32 Architectures Software Developer’s Manual
- RAPL for the Romley Platform, Whitepaper
- Power-Management Architecture of the Intel Microarchitecture Code-Name Sandy Bridge (IEEE Micro, March/April 2012)
- Measuring Processor Power TDP vs. ACP, Whitepaper Intel® Xeon® Processor