pm-graph

The pm-graph project provides the sleepgraph and bootgraph tools, which let system developers visualize activity during suspend/resume and boot so they can identify inefficiencies and bottlenecks. Using sleepgraph and bootgraph is an excellent way to save power on Linux* platforms, whether in mobile devices using Intel® technology or in large-scale server farms. Optimizing suspend/resume performance has become extremely important because the more time a system spends entering and exiting low-power modes, the less time it can be in use.

USB resume with parallel enumeration of separate hosts

BY Todd Brandt ON May 14, 2014

In the current code (drivers/usb/core/hub.c), there is a single, global instance of the usb_address0 mutex, which is used for all devices on any host. This is broader than necessary: the mutex is only needed to prevent address0 collisions between devices on the *same* host (USB 2.0 spec, sec 4.6.1). The superfluous coverage can add delay to system resume on systems with multiple hosts (up to several seconds, depending on which devices are attached).

For instance, if two USB devices are attached to two different hosts, there's a chance that they will be initialized at the same time. If so, one of the devices can be delayed unnecessarily while it waits for the device on the other host to release the mutex. In the following use case, I've demonstrated this effect with a USB WLAN and a KVM switch. The test system has three USB host controllers exposing four root hubs. Bus 1 and Bus 2 are each USB 2.0 (with EHCI), and Bus 3 and Bus 4 are USB 3.0 (a real and a virtual xHCI root hub at the same PCI address). This is the USB topology:

Bus 1 (USB 2.0) at PCI address 0000:00:1a.0
    usb1 (EHCI) --> 1-1 (Integrated Rate Matching Hub)
        1-1.2: USB TV Tuner

Bus 2 (USB 2.0) at PCI address 0000:00:1d.0
    usb2 (EHCI) --> 2-1 (Integrated Rate Matching Hub)
        2-1.7: USB Wireless LAN

Bus 3/4 (USB 3.0) at PCI address 0000:00:14.0
    usb3 (xHCI)
        3-1: USB 2.0 Genius HD Webcam
        3-2: USB 2.0 1TB Drive - Western Digital Blackbook
        3-3: USB 2.0 KVM switch hub (compound device)
            3-3.1: Wireless keyboard/mouse receiver
            3-3.3: KVM switcher
    usb4 (xHCI)
        4-4: USB 3.0 4TB Drive - Seagate
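
The device names in this listing follow the Linux sysfs convention: root hubs are named usbN, and every other device name starts with its bus number followed by a dash and the port path. As a minimal sketch in C, the bus number can be pulled out of such a name as shown here. The helper usb_bus_of is a hypothetical name used for illustration only; it is not part of the kernel or of any tool mentioned in this post.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the bus number encoded in a sysfs-style
 * USB device name ("usb3", "3-3", "2-1.7"), or -1 if the name does not
 * match either form.
 */
int usb_bus_of(const char *name)
{
    int bus;
    char sep;

    /* Root hubs are named "usb<bus>". */
    if (sscanf(name, "usb%d", &bus) == 1)
        return bus;

    /* Other devices are named "<bus>-<port>[.<port>...]". */
    if (sscanf(name, "%d%c", &bus, &sep) == 2 && sep == '-')
        return bus;

    return -1;
}
```

Note that a bus number identifies a root hub, not a host controller: on this test system, buses 3 and 4 both belong to the xHCI controller at 0000:00:14.0.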

I used a kernel patch which adds trace events to the mutex_lock and mutex_unlock calls for usb_address0, and I've graphed the calls in the analyzesuspend output. The red, green, and blue lines in the device blocks represent attempting a lock, lock success, and unlock respectively.

  • red line: mutex_lock attempt for usb_address0_mutex
  • green line: mutex_lock success for usb_address0_mutex
  • blue line: mutex_unlock for usb_address0_mutex

So basically, if a device tries to lock the mutex and succeeds immediately, you just see a green line. But if the mutex is already held and the attempt has to wait, the attempt shows up in red, and you can see graphically how long the device thread was waiting for the mutex.
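
The same reading of the graph can be written down as arithmetic on the three event timestamps. This is only a sketch in C: the struct and function names are hypothetical, the timestamps are assumed to be in microseconds, and the threshold for what counts as "immediately" is a free parameter rather than anything defined by the trace events themselves.

```c
#include <stdbool.h>

/* Timestamps (microseconds) of the three traced events for one device. */
struct address0_trace {
    long attempt_us;   /* red line: mutex_lock was called */
    long acquired_us;  /* green line: mutex_lock returned */
    long released_us;  /* blue line: mutex_unlock was called */
};

/* Time the device thread spent blocked, i.e. the width of the red block. */
long wait_us(const struct address0_trace *t)
{
    return t->acquired_us - t->attempt_us;
}

/* Time the device held the mutex, from the green line to the blue line. */
long hold_us(const struct address0_trace *t)
{
    return t->released_us - t->acquired_us;
}

/* True if the wait exceeded the caller's threshold for "immediate". */
bool was_contended(const struct address0_trace *t, long threshold_us)
{
    return wait_us(t) > threshold_us;
}
```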

In this scenario, the two devices that conflict are on separate root hubs. The USB WLAN (2-1.7) on Bus 2 takes 5 seconds to resume. While this is happening, the KVM switch (3-3) tries to resume but cannot get the lock. You can see a huge red block where it's waiting for the mutex, and then it finally acquires it right as 2-1.7 releases it (the blue line).

 

The next graph is from a test where the kernel was patched to have multiple usb_address0 mutexes: one per host controller instance.

In this graph you can see that the Generic USB Hub (the hub portion of the KVM switch) resumes earlier, and the total resume time is reduced by a second.
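
The difference between the two graphs can be approximated with a small timing model. The following is a minimal sketch in C, not the kernel patch itself. It assumes lock requests are served in the order they arrive. The struct, the function name total_wait_ms, and the MAX_HOSTS limit are all hypothetical, and any durations fed into it are illustrative inputs rather than measurements.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOSTS 8  /* arbitrary limit for this sketch */

/* One request for the address0 lock. */
struct lock_request {
    int host;         /* index of the host controller the device is on */
    long request_ms;  /* when the device thread calls mutex_lock */
    long hold_ms;     /* how long it keeps the mutex once acquired */
};

/*
 * Return the total time (ms) that all requests spend waiting.
 * reqs must be sorted by request_ms. With per_host == 0 every request
 * shares one global lock; otherwise each host has its own lock.
 */
long total_wait_ms(const struct lock_request *reqs, size_t n, int per_host)
{
    long free_at[MAX_HOSTS] = {0};  /* time at which each lock is free */
    long waited = 0;

    for (size_t i = 0; i < n; i++) {
        int lock = per_host ? reqs[i].host : 0;
        assert(lock >= 0 && lock < MAX_HOSTS);

        /* The lock is taken at the request time or when it frees up. */
        long acquire = reqs[i].request_ms > free_at[lock]
                     ? reqs[i].request_ms : free_at[lock];

        waited += acquire - reqs[i].request_ms;
        free_at[lock] = acquire + reqs[i].hold_ms;
    }
    return waited;
}
```

For example, suppose a device on host 1 takes the lock at 0 ms and holds it for 5000 ms, and a device on host 2 requests it at 100 ms. With the single global lock, the model gives a wait of 4900 ms; with one lock per host, the wait is 0 ms. Two devices on the same host still serialize in both cases, which is the collision the mutex exists to prevent.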