The demand for high-performance desktop and laptop systems has resulted in higher power dissipation, a problem compounded by increased demand for small form factor systems. It is increasingly difficult to manage performance and thermal issues in isolation. Thermal issues must be handled proactively, without significantly impacting performance.
The solution described in this document is a thermal daemon, which proactively controls system temperature using P-states, T-states, and the Intel power clamp driver.
Problem Statement / Introduction
As processors execute at higher clock speeds, dynamic power consumption increases. The heat generated must be dissipated efficiently to maintain system reliability. Data sheets normally specify a TDP (thermal design power), which is the maximum amount of power the cooling system is required to dissipate. Common heat-dissipation techniques include heat sinks, fans, and other cooling devices. However, as system form factors shrink, it is no longer efficient to rely solely on hardware/BIOS/OSPM to cool the system. This problem will become more pressing as fanless systems become more popular.
Experiments on small form factor devices showed that systems reach near-maximum temperature under relatively light load. Depending on the efficiency of the cooling method used by the BIOS, this can also cause performance issues.
After a detailed analysis of the problem, it was observed that an increase in CPU temperature causes the BIOS to begin thermal throttling, which impacts performance.
The following graph shows how the BIOS acts to limit temperature once the maximum temperature is reached.
Current thermal control methods
Current systems use two approaches:
- Reactive approach
  - The BIOS starts taking action once the system reaches a certain temperature
  - ACPI thermal configuration drives cooling device activation
- Proactive approach
  - ACPI thermal configuration data is used to proactively activate the configured cooling devices
The BIOS can control the system fan, increasing or decreasing fan speed based on temperature. It can also perform thermal throttling, either by adjusting the duty cycle of the processor clock or by reducing the operating frequency and voltage. These controls can greatly impact performance.
ACPI Active and Passive Control
ACPI defines configuration data and interfaces that allow OSPM to implement thermal control. The Linux kernel ACPI thermal module currently implements these controls, so when the configuration data is valid, this can be a very efficient method of thermal control. However, many systems were observed to lack this configuration data or to provide invalid data, which prevents the kernel module from taking timely action.
The main objectives when evaluating possible solutions are:
- Prevent the BIOS from thermal throttling as much as possible by using proactive controls
- Don't rely on the validity of the ACPI configuration data. If there is good, valid data, the existing Linux kernel module will take necessary actions anyway.
This solution defines a Linux user-mode daemon, the “thermal daemon”, which provides:
- A short time to market
- Proactively controlled temperature
- Use of existing kernel infrastructure
- A well-defined architecture that can be easily enhanced
Thermal Daemon Overview
The primary functions of this daemon are to:
- Use existing interfaces to read temperature sensors
- Calculate set points dynamically
- Alternatively, use preconfigured or stored set points
- Once a set point is reached, apply the best available cooling method
- Allow the user to set preferences through a D-Bus interface
Inputs to the system
- Configuration data
The daemon is designed not to rely on any configuration data. However, to tune a particular system, an ACPI-style configuration can be specified in an XML file, and the system is matched by its DMI UUID. Refer to thermal-conf.xml for the DTD and an example.
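To illustrate DMI-based matching, the sketch below reads the platform UUID and selects the matching platform section from an XML document. It is a minimal sketch, not the daemon's parser: the element names (`Platform`, `UUID`, `Name`) are hypothetical placeholders, and the authoritative schema is the DTD in thermal-conf.xml. On Linux the UUID is exposed at /sys/class/dmi/id/product_uuid, which is normally readable only by root.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Hypothetical configuration layout used only for this sketch; the real
# element names are defined by the DTD shipped in thermal-conf.xml.
SAMPLE_CONFIG = """
<ThermalConfiguration>
  <Platform>
    <Name>example-laptop</Name>
    <UUID>0A1B2C3D-0000-0000-0000-000000000001</UUID>
  </Platform>
</ThermalConfiguration>
"""


def read_dmi_uuid(path: str = "/sys/class/dmi/id/product_uuid") -> str:
    """Return the DMI product UUID, normalized to lower case (usually root-only)."""
    return Path(path).read_text().strip().lower()


def find_platform(xml_text: str, uuid: str) -> Optional[ET.Element]:
    """Return the <Platform> element whose <UUID> matches this system, or None."""
    wanted = uuid.strip().lower()
    root = ET.fromstring(xml_text)
    for platform in root.iter("Platform"):
        if (platform.findtext("UUID") or "").strip().lower() == wanted:
            return platform
    return None


if __name__ == "__main__":
    try:
        uuid = read_dmi_uuid()
    except OSError:
        uuid = ""  # unreadable or absent: behave as an untuned system
    match = find_platform(SAMPLE_CONFIG, uuid)
    if match is None:
        print("no tuned configuration; using dynamic set points")
    else:
        print("using configuration for", match.findtext("Name"))
```

Falling back to "no match" keeps the default behavior described above: without a tuned entry, the daemon does not depend on configuration data.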
- DTS sensors
DTS refers to the processor's digital thermal sensors. Their output can be read from the /sys/devices/platform/coretemp.x interface.
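A minimal sketch of reading the DTS values follows. It assumes the standard hwmon attribute names that coretemp exposes (`tempN_input` in millidegrees Celsius, with an optional `tempN_label`) and checks both the older layout directly under the platform device and the newer `hwmon/hwmonN/` subdirectory. The default path `coretemp.0` is just one instance of `coretemp.x`.

```python
import glob
import os
from typing import Dict


def _read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def read_coretemp(base: str = "/sys/devices/platform/coretemp.0") -> Dict[str, float]:
    """Return {sensor label: degrees Celsius} for every tempN_input of one coretemp device."""
    # Older kernels expose the attributes directly under the platform device;
    # newer ones place them under hwmon/hwmonN/.
    patterns = [os.path.join(base, "temp*_input"),
                os.path.join(base, "hwmon", "hwmon*", "temp*_input")]
    readings: Dict[str, float] = {}
    for pattern in patterns:
        for input_path in sorted(glob.glob(pattern)):
            stem = input_path[: -len("_input")]            # e.g. .../temp1
            label_path = stem + "_label"                    # e.g. "Package id 0"
            if os.path.exists(label_path):
                label = _read(label_path)
            else:
                label = os.path.basename(stem)              # fall back to "tempN"
            readings[label] = int(_read(input_path)) / 1000.0  # millidegrees -> degrees C
    return readings


if __name__ == "__main__":
    print(read_coretemp())
```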
- Zone sensors
Zone sensors can be read from the /sys/class/thermal/ interface. These sensors are defined by ACPI-style configuration data. By default, this daemon doesn’t use them unless they are specified in the XML configuration.
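The sketch below reads each `thermal_zoneN` directory's `type` and `temp` attributes (millidegrees Celsius) and then applies the default policy described above, ignoring zones the XML configuration does not name. The `configured_types` set stands in for whatever the parsed configuration provides; that parameter is an assumption of this sketch.

```python
import glob
import os
from typing import Dict, Set


def read_thermal_zones(root: str = "/sys/class/thermal") -> Dict[str, float]:
    """Return {zone type: degrees Celsius} for each thermal_zoneN under root."""
    zones: Dict[str, float] = {}
    for zone_dir in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone_dir, "type")) as f:
                zone_type = f.read().strip()
            with open(os.path.join(zone_dir, "temp")) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # some zones fail to report a temperature; skip them
        # Keep keys unique when two zones share a type (e.g. two "acpitz" zones).
        key = zone_type if zone_type not in zones else f"{zone_type}:{os.path.basename(zone_dir)}"
        zones[key] = millideg / 1000.0
    return zones


def configured_zone_temps(zones: Dict[str, float], configured_types: Set[str]) -> Dict[str, float]:
    """Default policy: ignore zone sensors unless the XML configuration names them."""
    return {t: c for t, c in zones.items() if t in configured_types}


if __name__ == "__main__":
    all_zones = read_thermal_zones()
    print(configured_zone_temps(all_zones, set()))  # empty unless configured
```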
Outputs from the system
- P-states
P-states refer to performance states. This daemon uses either the Intel P-state driver, if available, or MSRs (model-specific registers) defined in the Intel Software Architecture document to control P-states. The advantage of using MSRs is finer control of turbo states. If the system can’t be cooled by downgrading the turbo state, turbo is disengaged and non-turbo frequencies are used to control temperature.
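The escalation described above (trim turbo first, then disengage turbo, then walk down non-turbo frequencies) can be sketched against the Intel P-state driver's sysfs attributes `max_perf_pct` and `no_turbo`. This is an illustration under stated assumptions, not the daemon's MSR-based code path: `non_turbo_pct` (the maximum non-turbo frequency as a percentage of maximum turbo), the 5% step, and the 50% floor are hypothetical parameters.

```python
import os
from typing import Optional, Tuple

PSTATE_DIR = "/sys/devices/system/cpu/intel_pstate"

Action = Tuple[str, int]  # (sysfs attribute name, value to write)


def next_pstate_action(max_perf_pct: int, no_turbo: int, non_turbo_pct: int,
                       step_pct: int = 5, floor_pct: int = 50) -> Optional[Action]:
    """Pick the next, more aggressive intel_pstate setting, or None if exhausted.

    non_turbo_pct: max non-turbo frequency as a percentage of max turbo
    (assumed known for the platform). floor_pct: hypothetical lower bound.
    """
    if not no_turbo:
        if max_perf_pct - step_pct > non_turbo_pct:
            return ("max_perf_pct", max_perf_pct - step_pct)  # trim turbo sub-states
        return ("no_turbo", 1)                                 # disengage turbo entirely
    target = min(max_perf_pct, non_turbo_pct) - step_pct       # walk the non-turbo range
    return ("max_perf_pct", target) if target >= floor_pct else None


def apply_action(action: Action, directory: str = PSTATE_DIR) -> None:
    """Write one intel_pstate attribute (requires root on a real system)."""
    attribute, value = action
    with open(os.path.join(directory, attribute), "w") as f:
        f.write(f"{value}\n")
```

Returning `None` when the floor is reached signals the caller to move on to the next cooling method rather than keep lowering performance.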
- RAPL Cooling device
If a RAPL (running average power limit) cooling device driver is present, the daemon uses it to set power limits that reduce temperature.
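As a hedged illustration of setting a RAPL limit, the sketch below lowers the package power limit through the Linux powercap interface, where limits are expressed in microwatts. It assumes the package domain is `intel-rapl:0` and that `constraint_0` is its long-term constraint, which is typical but worth checking via `constraint_0_name`. The 10% step and the 5 W floor are hypothetical values.

```python
import os

RAPL_PACKAGE = "/sys/class/powercap/intel-rapl:0"  # package-0 RAPL domain (assumed)


def reduce_package_power_limit(domain: str = RAPL_PACKAGE, step_pct: int = 10,
                               floor_uw: int = 5_000_000) -> int:
    """Lower the package power limit by step_pct percent; return the new limit in microwatts."""
    # constraint_0 is typically the "long_term" window; verify with constraint_0_name.
    path = os.path.join(domain, "constraint_0_power_limit_uw")
    with open(path) as f:
        current_uw = int(f.read())
    # Integer arithmetic avoids float rounding; never go below the hypothetical floor.
    new_uw = max(floor_uw, current_uw - current_uw * step_pct // 100)
    with open(path, "w") as f:
        f.write(f"{new_uw}\n")
    return new_uw
```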
- Power Clamp driver
When P-states are not enough to control the temperature, the daemon uses the Intel power clamp driver, which cools the system by injecting idle time.
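The power clamp driver registers as a thermal cooling device of type `intel_powerclamp`, whose `cur_state` is the requested idle-injection percentage, bounded by `max_state`. The sketch below locates that device and raises its state one step. The step size is a hypothetical choice. Treating an out-of-range `cur_state` as "not clamping" is a defensive assumption, because some driver versions report an invalid value while idle injection is inactive.

```python
import glob
import os
from typing import Optional


def find_cooling_device(type_name: str, root: str = "/sys/class/thermal") -> Optional[str]:
    """Return the cooling_deviceN directory whose 'type' matches, or None."""
    for dev in sorted(glob.glob(os.path.join(root, "cooling_device*"))):
        try:
            with open(os.path.join(dev, "type")) as f:
                if f.read().strip() == type_name:
                    return dev
        except OSError:
            continue
    return None


def _read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read())


def increase_idle_injection(dev: str, step: int = 5) -> int:
    """Raise cur_state (percent idle injected) by step, capped at max_state."""
    max_state = _read_int(os.path.join(dev, "max_state"))
    cur = _read_int(os.path.join(dev, "cur_state"))
    if cur > max_state:
        cur = 0  # out-of-range value: assume clamping is currently inactive
    new = min(max_state, cur + step)
    with open(os.path.join(dev, "cur_state"), "w") as f:
        f.write(f"{new}\n")
    return new


if __name__ == "__main__":
    dev = find_cooling_device("intel_powerclamp")
    print(dev or "intel_powerclamp not loaded")
```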
- T-states
T-states modify the processor clock duty cycle. This daemon uses MSR-based T-state control only once the system can’t be cooled by any other method.
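For reference, T-state (on-demand clock modulation) control goes through the `IA32_CLOCK_MODULATION` MSR (0x19A): bit 4 enables modulation and bits 3:1 select the duty cycle in 12.5% steps. The sketch below encodes that field and performs a read-modify-write through the Linux `msr` driver. It is a minimal illustration, not the daemon's code. It ignores the optional extended-duty-cycle bit 0 and requires root with the `msr` module loaded.

```python
import os
import struct

IA32_CLOCK_MODULATION = 0x19A  # MSR address from the Intel SDM
_ENABLE = 1 << 4               # bit 4: on-demand clock modulation enable
_FIELD_MASK = 0x1F             # bits 4:0 (enable, duty-cycle code, extended bit)


def clock_modulation_bits(duty_eighths: int) -> int:
    """Encode a duty cycle of duty_eighths/8 (1..7); 8 or more disables modulation."""
    if duty_eighths >= 8:
        return 0                              # modulation off: full-speed clock
    if duty_eighths < 1:
        raise ValueError("duty-cycle code 0 is reserved")
    return _ENABLE | (duty_eighths << 1)      # bits 3:1 hold the duty-cycle code


def set_clock_modulation(cpu: int, duty_eighths: int) -> None:
    """Read-modify-write IA32_CLOCK_MODULATION on one CPU via /dev/cpu/N/msr."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDWR)
    try:
        (old,) = struct.unpack("<Q", os.pread(fd, 8, IA32_CLOCK_MODULATION))
        new = (old & ~_FIELD_MASK) | clock_modulation_bits(duty_eighths)  # keep other bits
        os.pwrite(fd, struct.pack("<Q", new), IA32_CLOCK_MODULATION)
    finally:
        os.close(fd)
```

For example, `clock_modulation_bits(4)` requests a 50% duty cycle and yields 0x18. Because clock modulation costs the most performance, it sits at the end of the escalation.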
- Cooling devices
If the XML configuration defines ACPI-style cooling device paths, the daemon activates those cooling devices once the system reaches the configured set point.
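A sketch of activating configured cooling devices at the set point follows. Each device directory is assumed to expose the standard `cur_state`/`max_state` attributes. The one-step-per-update policy and the 2 °C hysteresis band, which prevents rapid toggling around the set point, are hypothetical choices for illustration.

```python
import os
from typing import Iterable


def next_state(temp_c: float, set_point_c: float, current: int, max_state: int,
               hysteresis_c: float = 2.0) -> int:
    """Step a cooling device up at the set point, and down once safely below it."""
    if temp_c >= set_point_c:
        return min(current + 1, max_state)
    if temp_c <= set_point_c - hysteresis_c:
        return max(current - 1, 0)
    return current  # inside the hysteresis band: hold the current state


def update_devices(temp_c: float, set_point_c: float, device_dirs: Iterable[str]) -> None:
    """Apply next_state to each configured sysfs cooling-device directory."""
    for dev in device_dirs:
        with open(os.path.join(dev, "cur_state")) as f:
            current = int(f.read())
        with open(os.path.join(dev, "max_state")) as f:
            max_state = int(f.read())
        new = next_state(temp_c, set_point_c, current, max_state)
        if new != current:
            with open(os.path.join(dev, "cur_state"), "w") as f:
                f.write(f"{new}\n")
```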
Outputs control order
When temperature control is required, the daemon activates cooling devices in this order:
- RAPL cooling device driver
- Intel P-state driver
- Turbo sub-states
- Traverse half of the P-states
- Intel power clamp driver
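The ordering above can be modeled as an escalation ladder: above the set point, the next stage in order is engaged one level further; below it, levels are released in reverse order so the most intrusive method is removed first. The stage names mirror the list, but the per-stage level counts and the one-step-per-sample policy are hypothetical, and the real daemon's bookkeeping may differ.

```python
from typing import Dict, List, Tuple

# Stage order follows the list above; the level counts are hypothetical.
DEFAULT_STAGES: List[Tuple[str, int]] = [
    ("rapl", 4),
    ("intel_pstate", 4),
    ("turbo_substates", 3),
    ("half_pstates", 4),
    ("powerclamp", 5),
]


class CoolingLadder:
    """Escalate through stages in order; relax them in reverse order."""

    def __init__(self, stages: List[Tuple[str, int]] = DEFAULT_STAGES) -> None:
        self.stages = stages
        self.levels = [0] * len(stages)

    def escalate(self) -> bool:
        for i, (_, max_level) in enumerate(self.stages):
            if self.levels[i] < max_level:
                self.levels[i] += 1
                return True
        return False  # every stage is already at its maximum

    def relax(self) -> bool:
        for i in reversed(range(len(self.stages))):
            if self.levels[i] > 0:
                self.levels[i] -= 1
                return True
        return False  # nothing left to release

    def step(self, temp_c: float, set_point_c: float) -> None:
        if temp_c > set_point_c:
            self.escalate()
        elif temp_c < set_point_c:
            self.relax()

    def active(self) -> Dict[str, int]:
        return {name: lvl for (name, _), lvl in zip(self.stages, self.levels) if lvl}
```

For example, with stages `[("a", 1), ("b", 2)]`, three calls to `escalate()` give levels `[1, 2]`, and the first `relax()` then backs off stage `b` before touching stage `a`.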
Dynamic set point adjustment
A set point is the temperature at which a cooling action is activated. This daemon uses the maximum temperature, which can be read from /sys/devices/platform/coretemp.x, as a reference. The first time the system reaches this temperature, the daemon calculates a set point at which cooling starts on subsequent occasions. This calculation is based on Newton's law of cooling:
- The rate of heat transfer decreases over time
- The longer the system spends in the high-temperature zone, the more correction is required
- The set point accounts for the time and slope relationship between temperature and required cooling
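Newton's law of cooling states that dT/dt = -k (T - T_ambient), so T(t) = T_ambient + (T_0 - T_ambient) e^(-kt). Because the cooling rate shrinks as the temperature approaches ambient, starting cooling earlier is cheaper than recovering from a long excursion. The sketch below pairs that formula with a hypothetical set-point correction that grows with the time spent above the reference and with the heating slope. The gains, the 50 °C lower bound, and the linear form of the correction are illustrative assumptions, not the daemon's actual tuning.

```python
import math


def newton_cooling(t_s: float, t0_c: float, ambient_c: float, k_per_s: float) -> float:
    """Temperature after t_s seconds of cooling from t0_c (Newton's law of cooling)."""
    return ambient_c + (t0_c - ambient_c) * math.exp(-k_per_s * t_s)


def adjust_set_point(set_point_c: float, max_temp_c: float, seconds_above: float,
                     slope_c_per_s: float, time_gain: float = 0.05,
                     slope_gain: float = 2.0, min_set_point_c: float = 50.0) -> float:
    """Hypothetical rule: lower the set point more when the system stayed hot
    longer or heated faster, so cooling starts earlier next time."""
    if seconds_above <= 0:
        return set_point_c  # never exceeded the reference: keep the set point
    correction = time_gain * seconds_above + slope_gain * max(slope_c_per_s, 0.0)
    return max(min_set_point_c, min(set_point_c, max_temp_c) - correction)
```

For example, starting from a set point equal to a 100 °C maximum, 20 s above it while heating at 0.5 °C/s lowers the next set point to 98 °C under these assumed gains.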
Software Architecture (class diagram)
The graph below shows how cooling methods are activated to keep the temperature below the maximum. On this platform, reducing P-states at high temperature is enough to lower the temperature, with no need for T-state control, which would have impacted performance.