
pm-graph

The pm-graph project provides sleepgraph and bootgraph tools for system developers to visualize the activity in suspend/resume and boot, allowing them to identify inefficiencies and bottlenecks. Using the sleepgraph and bootgraph tools is an excellent way to save power in Linux* platforms, whether in mobile devices using Intel® technology or large-scale server farms. Optimizing the performance of suspend/resume has become extremely important because the more time spent entering and exiting low power modes, the less the system can be in use.

Using v2.3 of AnalyzeSuspend

BY Todd Brandt ON Jan 17, 2014

1) Overview

The AnalyzeSuspend tool is designed to assist kernel and OS developers in optimizing their Linux stack's suspend/resume time. Using a kernel image built with a few extra options enabled and a small patch to enable ftrace, the tool will execute a suspend, and will capture dmesg and ftrace data until resume is complete. This data is transformed into a set of timelines and a callgraph to give a quick and detailed view of which devices and kernel processes are taking the most time in suspend/resume.

The amount of data in the timeline has grown so large that, in a static view, only the largest entries are legible. Version 2.3 makes the timeline zoomable. On first load the timeline shows up fully expanded; three buttons zoom in and out, and a scrollbar slides back and forth:

  • zoom-in: decrease the time window and increase the timescale precision (down to individual milliseconds)
  • zoom-out: increase the time window and decrease the timescale precision (up to several seconds)
  • zoom-1:1: zoom back to the default 100% view
  • slider: when in higher zoom levels the slider lets you move back and forth on the timeline

The output of the tool is a single, monolithic HTML file which includes HTML, CSS, and JavaScript. The timing and ftrace data is written into the file as a set of HTML constants and JavaScript variables, and is manipulated by the browser UI via JavaScript. The JavaScript source requires no libraries or dependencies, and is written to be portable across both Firefox and Chrome. You may run into some sluggishness when you view the output of an ftrace-enabled test. This is because the ftrace data needed to make callgraphs for dozens of devices can take up a lot of space, typically around 30MB.

The timeline itself is also interactive. If you hover the pointer over a device it will highlight, and after half a second it will display the full name of the device and the total time spent in its PM callback. You can also click a device to get a device detail view, which adds another section underneath the timeline. This view gives the full title of the device, its total time spent, and a hierarchy of all the device's parents, siblings, and children. If you have ftrace data enabled, clicking the device will also filter the callgraph data to show just the device you clicked, its siblings, and its children. This can be very helpful if you're optimizing a subsystem and need to understand the interaction between its devices.

In 2.3+, the tool also includes the firmware suspend/resume time in the timeline (if your kernel has ACPI enabled and lets you view the FPDT). This will make the data more closely match what you're actually experiencing while suspending the system.

2) Downloading the tool

The latest version of the tool's python script can be downloaded directly from the master branch on github:

https://github.com/01org/pm-graph/blob/master/analyze_suspend.py

Or you can download a release tarball which includes a README and some helpful config patches.

3) Simple Commands

3.1) Test whether the tool can execute (-status)

The AnalyzeSuspend tool has several basic requirements that must be met before it can do its job. When testing on Linux you need root permission and the sysfs filesystem mounted. If you're using ftrace, it must be configured properly and available in sysfs. If you're using rtcwake, it must be installed. On Linux these requirements apply to the host system; on Android they apply to the remote target system. That's a lot to verify, and the tool can do it for you. The -status command goes through all the command line arguments and tests whether you can run that particular command on the current platform, which helps when diagnosing errors. For example, here is the output of a command that the system supports:

%> sudo ./analyze_suspend.py -status -rtcwake -f -m disk
Checking this system (skynet)...
    have root access: YES
    is sysfs mounted: YES
    is "disk" a valid power mode: YES
    is ftrace supported: YES
    is rtcwake supported: YES

This is the output of a command with a missing requirement:

%> sudo ./analyze_suspend.py -status -rtcwake -f -m freeze
Checking this system (skynet)...
    have root access: YES
    is sysfs mounted: YES
    is "freeze" a valid power mode: NO
    is ftrace supported: YES
    is rtcwake supported: YES
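
The checks listed in the output above can be approximated in a few lines of Python. This is only a sketch of what such a check might look like, not the tool's actual code: the function name check_status, the sysfs_root parameter, and the choice of /sys/kernel/debug/tracing as the ftrace location are assumptions made for illustration.

```python
import os
import shutil


def check_status(mode, sysfs_root="/sys"):
    """Return (question, passed) pairs similar to the -status output.

    Hypothetical helper: the paths probed here are assumptions about a
    typical Linux host, not taken from analyze_suspend.py itself.
    """
    state_file = os.path.join(sysfs_root, "power", "state")
    sysfs_ok = os.path.isfile(state_file)

    # /sys/power/state lists the supported modes separated by spaces.
    modes = []
    if sysfs_ok:
        with open(state_file) as f:
            modes = f.read().split()

    # Assumed ftrace location (debugfs mounted under /sys/kernel/debug).
    tracing_dir = os.path.join(sysfs_root, "kernel", "debug", "tracing")

    return [
        ("have root access", os.geteuid() == 0),
        ("is sysfs mounted", sysfs_ok),
        ('is "%s" a valid power mode' % mode, mode in modes),
        ("is ftrace supported", os.path.isdir(tracing_dir)),
        ("is rtcwake supported", shutil.which("rtcwake") is not None),
    ]


if __name__ == "__main__":
    for question, passed in check_status("mem"):
        print("    %s: %s" % (question, "YES" if passed else "NO"))
```

Passing a different sysfs_root makes it possible to try the function against a fake directory tree without touching the real system.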

3.2) View supported low-power modes (-modes)

The tool was designed with S3 suspend/resume as its primary target, but it can run any power mode that the system allows you to enter via /sys/power/state. Nearly all systems support S3: "mem", and also hibernate: "disk". But there are also modes like "standby" and "freeze". Running this command lets you know what's available.

%> analyze_suspend.py -modes
['standby', 'mem', 'freeze', 'disk']

NOTE: version 2.3 fully supports graphical output for all four of these modes, but standby and freeze don't stop the system clock from running when active. So the timeline will include the delay from rtcwake or keypress wait in these modes.

3.3) View Firmware Performance Data Table (-fpdt)

The ACPI subsystem in the kernel provides access to several tables populated with firmware data. One in particular, the Firmware Performance Data Table (FPDT), is now used by the tool. This table provides nanosecond-level timing data on the most recent suspend and resume initiated on the system. The tool provides a new argument, -fpdt, which reads out the current table contents. This can be used to test whether the feature is supported on your platform.

%> sudo ./analyze_suspend.py -fpdt

Firmware Performance Data Table (FPDT)
                  Signature : FPDT
               Table Length : 68
                   Revision : 1
                   Checksum : 0x84
                     OEM ID : INTEL
               OEM Table ID : TIANO   
               OEM Revision : 1
                 Creator ID : MSFT
           Creator Revision : 0x1000013

Firmware Basic Boot Performance Record (FBPT)
                  Reset END : 8836123 ns
  OS Loader LoadImage Start : 0 ns
 OS Loader StartImage Start : 10796773297 ns
     ExitBootServices Entry : 0 ns
      ExitBootServices Exit : 0 ns

S3 Performance Table Record (S3PT)
    Basic S3 Resume Performance Record
               Resume Count : 0
                 FullResume : 0 ns
              AverageResume : 0 ns
    Basic S3 Suspend Performance Record
               SuspendStart : 0 ns
                 SuspendEnd : 0 ns
                SuspendTime : 0 ns

The tool also makes use of this data during a test run and adds it into the timeline. If the FPDT data is available and valid, a new section named BIOS appears in the timeline, along with two entries: firmware-suspend and firmware-resume. If the FPDT data isn't there, the timeline just displays the kernel data. NOTE: this does not yet function on Android.
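
The first block of the -fpdt output (Signature through Creator Revision) corresponds to the standard 36-byte header that every ACPI table starts with. As a rough illustration, the sketch below decodes just that header with Python's struct module. The function name parse_acpi_header is hypothetical, the path /sys/firmware/acpi/tables/FPDT is where Linux normally exposes the table (reading it usually requires root), and the performance records that follow the header are not decoded here.

```python
import struct

# Standard ACPI table header, little-endian, 36 bytes:
# signature, length, revision, checksum, OEM ID, OEM table ID,
# OEM revision, creator ID, creator revision.
ACPI_HEADER = struct.Struct("<4sIBB6s8sI4sI")

FPDT_PATH = "/sys/firmware/acpi/tables/FPDT"


def parse_acpi_header(raw):
    """Decode the generic ACPI header found at the start of a raw table."""
    (signature, length, revision, checksum, oem_id, oem_table_id,
     oem_revision, creator_id, creator_revision) = ACPI_HEADER.unpack_from(raw)

    def text(field):
        # Fixed-width ASCII fields are padded with spaces or NULs.
        return field.decode("ascii", "replace").rstrip("\x00 ")

    return {
        "Signature": text(signature),
        "Table Length": length,
        "Revision": revision,
        "Checksum": checksum,
        "OEM ID": text(oem_id),
        "OEM Table ID": text(oem_table_id),
        "OEM Revision": oem_revision,
        "Creator ID": text(creator_id),
        "Creator Revision": creator_revision,
    }


if __name__ == "__main__":
    try:
        with open(FPDT_PATH, "rb") as f:
            header = parse_acpi_header(f.read())
    except (OSError, struct.error) as err:
        print("FPDT not readable on this system: %s" % err)
    else:
        for name, value in header.items():
            print("%28s : %s" % (name, value))
```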

3.4) Recreate the html output from a previous run's log data (-dmesg || -ftrace)

The primary input to the tool is the dmesg output recorded during the suspend/resume. So long as it covers the whole of suspend/resume, any dmesg log can be input to the tool to create an HTML timeline. dmesg logs generated by this tool have a couple of additional lines at the top which record basic information about what, where, and when the test was run. If the input log is missing this stamp, the HTML output will not include this information and will simply be named "output.html", but the timeline itself should be viewable. Logs generated by this tool, however, can be used to re-create exact copies of the HTML outputs. You just need to run the tool with the paths of the dmesg and (optionally) ftrace logs. NOTE: the ftrace input must have been generated by this tool, since the callgraph format is very specific.

%> analyze_suspend.py -dmesg skynet_mem_dmesg.txt -ftrace skynet_mem_ftrace.txt
PROCESSING DATA <-- Will generate skynet_mem.html
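
To give a feel for what the tool extracts from a dmesg log, here is a small sketch that totals the per-device callback times. It assumes the initcall_debug lines look like "call <device>+ returned <rc> after <n> usecs" with a leading kernel timestamp, which is how kernels of this era print them; the function name slowest_callbacks and the SAMPLE log are made up for illustration, and the real tool does considerably more (phases, timeline layout, parent/child links).

```python
import re

# Assumed shape of an initcall_debug return line in dmesg.
CALL_RETURN = re.compile(
    r"\[\s*(?P<ktime>\d+\.\d+)\]\s+"
    r"call (?P<dev>\S+)\+ returned (?P<rc>-?\d+) after (?P<usecs>\d+) usecs"
)


def slowest_callbacks(dmesg_text, top=3):
    """Sum callback time per device and return the slowest ones first."""
    totals = {}
    for line in dmesg_text.splitlines():
        match = CALL_RETURN.search(line)
        if match:
            dev = match.group("dev")
            totals[dev] = totals.get(dev, 0) + int(match.group("usecs"))
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Invented log excerpt, only here to exercise the parser.
SAMPLE = """\
[  100.000001] calling  0000:00:1f.2+ @ 1234, parent: pci0000:00
[  100.050001] call 0000:00:1f.2+ returned 0 after 48828 usecs
[  100.050100] calling  usb1+ @ 1234, parent: 0000:00:14.0
[  100.052100] call usb1+ returned 0 after 1953 usecs
[  101.000001] call usb1+ returned 0 after 500 usecs
"""

if __name__ == "__main__":
    for dev, usecs in slowest_callbacks(SAMPLE):
        print("%-16s %8.3f ms" % (dev, usecs / 1000.0))
```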

4) Linux Testing

4.1) Kernel Configuration

The following kernel build options are required. They're needed because the tool uses the debug output from the PM subsystem to capture device timing.

  • CONFIG_PM_DEBUG=y
  • CONFIG_PM_SLEEP_DEBUG=y

Optional: needed for ftrace to function.

  • CONFIG_FTRACE=y
  • CONFIG_FUNCTION_TRACER=y
  • CONFIG_FUNCTION_GRAPH_TRACER=y
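
Before rebuilding, it can be worth checking whether the running kernel already has these options. The sketch below scans a kernel config file for them. It assumes your distribution installs the config as /boot/config-<kernel release>, which is common but not universal; the helper names enabled_options and missing_options are hypothetical.

```python
import os

REQUIRED = ("CONFIG_PM_DEBUG", "CONFIG_PM_SLEEP_DEBUG")
OPTIONAL_FTRACE = (
    "CONFIG_FTRACE",
    "CONFIG_FUNCTION_TRACER",
    "CONFIG_FUNCTION_GRAPH_TRACER",
)


def enabled_options(config_text):
    """Return the set of options set to 'y' in a kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set".
        if line.endswith("=y") and not line.startswith("#"):
            enabled.add(line[:-2])
    return enabled


def missing_options(config_text, wanted):
    """Return the wanted options that are not built in, in order."""
    have = enabled_options(config_text)
    return [opt for opt in wanted if opt not in have]


if __name__ == "__main__":
    # Assumed config location; adjust for your distribution.
    path = "/boot/config-" + os.uname().release
    if os.path.isfile(path):
        with open(path) as f:
            text = f.read()
        print("missing required:", missing_options(text, REQUIRED))
        print("missing for ftrace:", missing_options(text, OPTIONAL_FTRACE))
    else:
        print("no kernel config found at", path)
```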

4.2) Kernel Parameters

The tool relies on two kernel parameters: initcall_debug and log_buf_len. initcall_debug is required so that the tool can tell when devices are being initialized on resume. It's also a good idea to increase the size of the kernel log buffer so that it doesn't overflow. You can add these parameters at boot time in the grub menu, by editing the /boot/grub/grub.cfg file, or by adding them to /etc/default/grub. The last method is the best because it will automatically edit the grub.cfg file for newly installed kernels.

/etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="... initcall_debug log_buf_len=16M ..."
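
After rebooting, you can confirm that the parameters actually reached the kernel by looking at /proc/cmdline. A minimal sketch, with the function name check_cmdline chosen for illustration:

```python
import os


def check_cmdline(cmdline):
    """Report whether initcall_debug is set and what log_buf_len is."""
    params = cmdline.split()
    log_buf_len = None
    for param in params:
        if param.startswith("log_buf_len="):
            log_buf_len = param.split("=", 1)[1]
    return {
        "initcall_debug": "initcall_debug" in params,
        "log_buf_len": log_buf_len,
    }


if __name__ == "__main__":
    if os.path.isfile("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print(check_cmdline(f.read()))
```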

4.3) Usage

  1. First, configure a kernel using the instructions from the previous sections. Then build, install, and boot with it.
  2. Open up a terminal window and execute the script: %> sudo ./analyze_suspend.py [-f] [-rtcwake]
  3. Wait for the system to suspend. The script's timer stops counting at system suspend, so you can wait as long as you need.
  4. Press a key to resume. (if you didn't use -rtcwake)

When the system comes back, you'll see the script finishing up and creating the output files in a new subdirectory (suspend-mmddyy-HHMMSS):

  • HTML output:                    <hostname>_<mode>.html
  • raw dmesg output:           <hostname>_<mode>_dmesg.txt
  • raw ftrace output:             <hostname>_<mode>_ftrace.txt (-f)

The html file contains the device timeline, and optionally the device callgraphs (assuming ftrace was used with the -f option). View the html output file in Firefox or Chrome.
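
The naming scheme above is easy to reproduce if you want to locate or post-process results from your own scripts. The following sketch builds the expected paths from a hostname, a mode, and a start time; output_paths is a hypothetical helper, not a function exported by the tool.

```python
from datetime import datetime
import os


def output_paths(hostname, mode, when, use_ftrace=False):
    """Return the result file paths for a test run started at `when`."""
    # Subdirectory follows the suspend-mmddyy-HHMMSS pattern.
    testdir = "suspend-" + when.strftime("%m%d%y-%H%M%S")
    prefix = os.path.join(testdir, "%s_%s" % (hostname, mode))
    paths = {
        "html": prefix + ".html",
        "dmesg": prefix + "_dmesg.txt",
    }
    if use_ftrace:
        # Only written when the test was run with -f.
        paths["ftrace"] = prefix + "_ftrace.txt"
    return paths


if __name__ == "__main__":
    for kind, path in output_paths("skynet", "mem", datetime.now(), True).items():
        print("%-7s %s" % (kind, path))
```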

5) Android Testing

5.1) Kernel Configuration

The following kernel build options are required. They're needed because the tool uses the debug output from the PM subsystem to capture device timing.

  • CONFIG_PM_DEBUG=y
  • CONFIG_PM_SLEEP_DEBUG=y

5.2) Kernel Build and Parameters

The tool can be run on any kernel version 3.0 or newer, but requires some extra kernel parameters to be passed in order to gather the data it needs. The initcall_debug parameter is required for device suspend/resume callback tracing, and it's also a good idea to increase the size of the log buffer.

  • initcall_debug
  • log_buf_len=16M

In your android source folder, edit the BoardConfig.mk file for the target build and add the new parameters:

 nano device/<vendor>/<platform>/BoardConfig.mk
 BOARD_KERNEL_CMDLINE += initcall_debug log_buf_len=16M

Then build Android and boot/install with live.img:

 source build/envsetup.sh
 lunch <platform>-<build>
 make -j8 allimages
 dd if=out/target/product/<platform>/live.img of=/dev/sdc

5.3) Usage

Once you've booted the android build with the updated kernel parameters, you need to set up adb to attach to it with root access.

%> adb kill-server
%> adb connect <ip address>
%> adb root

When testing an android device, the tool runs on the host system and uses adb to interact with the device remotely. This is because android doesn't have python and other analyzesuspend dependencies by default. The tool takes in the absolute path of the adb binary (with the -adb option) and runs commands using adb shell. In theory once you've connected with adb, the connection should remain online for as long as you use the tool. In practice, adb can have a bit of trouble maintaining an active connection, especially on pre-release systems with experimental software. The tool recognizes this and includes a series of tests to ensure connectivity on each and every test run. If the checks fail prior to the test, the tool will fail gracefully and tell you why. You can also run these checks on their own with the -status argument like this (this is what you see if everything's fine):

%> analyze_suspend.py -adb /usr/bin/adb -status
Checking the android system ...
    is android device connected: YES
    have root access: YES
    is sysfs mounted: YES
    is "mem" a valid power mode: YES
    can I unlock the screen: YES

If something's amiss, you'll get some helpful instructions:

%> analyze_suspend.py -adb /usr/bin/adb -status
Checking the android system ...
    is android device connected: NO
    Please connect the device before using this tool

Another way to test that analyze_suspend.py is working is to request a list of the supported suspend modes. With the -modes argument, the tool prints out the device's supported suspend modes. Normally these are freeze, standby, and mem. S3 suspend is "mem", so that's the tool's default mode.

%> analyze_suspend.py -adb /usr/bin/adb -modes
 ['freeze', 'standby', 'mem']

Another quirk in android testing is that the device needs to be fully woken up from inactivity timeout in order for the tool to issue a suspend. Version 2.3+ gives the tool the ability to wake up the target device for you prior to a test run. If the screen is currently off, the tool will tell you it's initiating a wakeup, and then will wait for 3 seconds so you can see the screen come back on (NOTE: initiating a suspend while the device is currently auto-suspended does nothing). In 2.2, if the test were to be run with the screen off, you'd just see it issue the command and go straight through with the error "no timeline data".

Now you're ready to run the tool. Execute a test run with this command:

%> analyze_suspend.py -adb /usr/bin/adb (-m mem)

The tool will first run the checks to be sure everything's online, then wake the display if it's currently off, and then suspend the device and wait for you to wake it back up with a keypress:

%> ./analyze_suspend.py -adb /usr/bin/adb
Checking the android system ...
    is android device connected: YES
    have root access: YES
    is sysfs mounted: YES
    is "mem" a valid power mode: YES
    can I unlock the screen: YES
Waking the device up for the test...
<---- 3 second delay ---->
SUSPEND START (press a key on the device to resume)
<---- delay until a key is pressed ---->
RESUME COMPLETE
CAPTURING DMESG
PROCESSING DATA

Internally, the tool is running these commands on the target system:

  1. IS DISPLAY ON: 'dumpsys power | grep mScreenOn'
  2. IF OFF, WAKE IT UP: 'input keyevent 26'
  3. CLEAR DMESG LOG: 'dmesg -c > /dev/null'
  4. SUSPEND START: 'echo mem > /sys/power/state'
  5. POLL TIL RESUME COMPLETE: run 'pwd' over and over til it succeeds
  6. CAPTURING DMESG: 'dmesg' piped to local file

Once the tool has finished, you should see two files in a new subdirectory of your current path. Each test run creates a new timestamped subdirectory containing the kernel log and the HTML output:

%> ls -1 ./suspend-121213-192429
android_mem_dmesg.txt
android_mem.html

The tool doesn't currently support ftrace, rtcwake, or fpdt on the android target. It can only run a suspend mode, collect the dmesg, and create an html timeline of the kernel devices.