
NumaTOP

NumaTOP is an observation tool for runtime memory locality characterization and analysis of processes and threads running on a NUMA system. It helps the user characterize the NUMA behavior of processes and threads and identify where NUMA-related performance bottlenecks reside. The tool uses Intel performance counter sampling technologies and associates the performance data with Linux system runtime information to provide real-time analysis in production systems.

Most modern multiprocessor systems use a Non-Uniform Memory Access (NUMA) design. In a NUMA system, memory and processors are organized so that some parts of memory are closer to a given processor while other parts are farther away. A processor can access memory that is closer to it much faster than memory that is farther from it. Hence, the access latency between a processor and different portions of memory in a NUMA machine may differ significantly.
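To get a feel for how far apart the nodes are on a particular Linux machine, one can read the node distance table that the kernel exposes under /sys/devices/system/node/nodeN/distance. The sketch below is not part of NumaTOP, and its helper names are our own. It assumes the standard sysfs layout, in which each file holds space-separated relative distances with 10 conventionally meaning "local". These values are firmware-reported relative costs (from the ACPI SLIT), not measured latencies.

```python
import glob
import os


def parse_distance_row(text: str) -> list[int]:
    """Parse one sysfs distance file, e.g. "10 21" -> [10, 21]."""
    return [int(tok) for tok in text.split()]


def read_numa_distances(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each node ID to its row of distances; empty if no node entries exist."""
    table: dict[int, list[int]] = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "distance")):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. "node0"
        with open(path) as f:
            table[int(node_dir[len("node"):])] = parse_distance_row(f.read())
    return table


def relative_costs(row: list[int]) -> list[float]:
    """Scale a distance row so the local (smallest) entry becomes 1.0."""
    if not row:
        return []
    local = min(row)
    return [d / local for d in row]


if __name__ == "__main__":
    table = read_numa_distances()
    if not table:
        print("No NUMA node information found under sysfs")
    for node, row in sorted(table.items()):
        # On a two-node machine whose node0 file reads "10 21",
        # this prints "node0: [1.0, 2.1]".
        print(f"node{node}: {relative_costs(row)}")
```

A remote entry of 2.1 in this output would mean the firmware rates remote access as roughly twice as costly as local access; actual latencies should be measured, which is the kind of information NumaTOP gathers from performance counters.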

The tool can be used to:

  • Characterize the locality of all running processes and threads to identify those with the poorest locality in the system.
  • Identify the “hot” memory areas, report the average memory access latency, and report the node on which the accessed memory is allocated.
    Note: A “hot” memory area is one that the process or thread(s) access most frequently. NumaTOP has a metric called “ACCESS%” that gives the percentage of memory accesses attributable to each memory area.
  • Provide the call-chain(s) in the process/thread code that access a given hot memory area.
  • Provide the call-chain(s) when the process/thread generates certain counter events. The call-chain(s) help locate the source code that generates the events.
  • Provide per-node statistics for memory and CPU utilization.
    Note: A node is a region of memory in which every byte is the same distance from each CPU.
  • Show, in a user-friendly interface, the list of processes/threads sorted by a chosen metric (by default, CPU utilization), with the process using the most CPU at the top and the one using the least at the bottom. Users can also use hotkeys to re-sort the output by any of these metrics: Remote Memory Accesses (RMA), Local Memory Accesses (LMA), RMA/LMA ratio, Cycles Per Instruction (CPI), and CPU utilization.
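The sorting metrics and the ACCESS% figure described above can be illustrated with a small sketch. This is not NumaTOP's code: the ProcStats record, its field names, and the metric keys are hypothetical, and we assume ACCESS% is simply each memory area's share of the sampled accesses; NumaTOP's exact accounting may differ. CPI is computed as cycles divided by retired instructions, and the RMA/LMA ratio as remote accesses divided by local accesses.

```python
from dataclasses import dataclass


@dataclass
class ProcStats:
    """Hypothetical per-process sample counts (field names are illustrative)."""
    pid: int
    name: str
    rma: int           # sampled Remote Memory Accesses
    lma: int           # sampled Local Memory Accesses
    cycles: int        # CPU cycles consumed
    instructions: int  # instructions retired
    cpu_pct: float     # CPU utilization, percent

    @property
    def rma_lma_ratio(self) -> float:
        # A higher ratio means a larger share of accesses went to remote nodes.
        if self.lma == 0:
            return float("inf") if self.rma else 0.0
        return self.rma / self.lma

    @property
    def cpi(self) -> float:
        # Cycles Per Instruction; 0.0 if no instructions were counted.
        return self.cycles / self.instructions if self.instructions else 0.0


# Hypothetical metric names mapped to sort keys (not NumaTOP's actual hotkeys).
SORT_KEYS = {
    "rma": lambda p: p.rma,
    "lma": lambda p: p.lma,
    "rma/lma": lambda p: p.rma_lma_ratio,
    "cpi": lambda p: p.cpi,
    "cpu": lambda p: p.cpu_pct,
}


def sort_procs(procs: list[ProcStats], metric: str = "cpu") -> list[ProcStats]:
    """Order processes from highest to lowest value of the chosen metric."""
    return sorted(procs, key=SORT_KEYS[metric], reverse=True)


def access_percent(samples_by_area: dict[str, int]) -> dict[str, float]:
    """Share of sampled accesses (in percent) attributed to each memory area."""
    total = sum(samples_by_area.values())
    if total == 0:
        return {area: 0.0 for area in samples_by_area}
    return {area: 100.0 * n / total for area, n in samples_by_area.items()}


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    procs = [
        ProcStats(101, "db", rma=900, lma=300, cycles=4000, instructions=2000, cpu_pct=35.0),
        ProcStats(102, "web", rma=100, lma=900, cycles=3000, instructions=3000, cpu_pct=60.0),
    ]
    print([p.name for p in sort_procs(procs)])             # ['web', 'db']
    print([p.name for p in sort_procs(procs, "rma/lma")])  # ['db', 'web']
    print(access_percent({"heap": 75, "stack": 25}))       # {'heap': 75.0, 'stack': 25.0}
```

Sorting in descending order puts the process with the highest value of the chosen metric at the top, which matches the default CPU-utilization view; switching the key is all a re-sort needs.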

NumaTOP is an interactive, text-based (terminal) tool that periodically tracks and analyzes the NUMA activity of processes and threads and displays useful metrics. Users can scroll with the Up and Down arrow keys to navigate in the current window, and can use several hotkeys, shown at the bottom of the window, to switch between windows or to change the running state of the tool. For example, the 'R' hotkey refreshes the data in the current window.

Here's a screenshot of the home window:

[Screenshot: NumaTOP home window]

Project: