PC Sampling

Low Overhead and Context Sensitive Profiling of GPU-Accelerated Applications

As we near the end of Moore's law scaling, the next-generation computing platforms are increasingly exploring heterogeneous processors for acceleration. Graphics Processing Units (GPUs) are the most widely used accelerators. Meanwhile, applications …

An Automated Tool for Analysis and Tuning of GPU-accelerated Code in HPC Applications

The US Department of Energy's fastest supercomputers and forthcoming exascale systems employ Graphics Processing Units (GPUs) to increase the computational performance of compute nodes. However, the complexity of GPU architectures makes tailoring …

Analyzing GPU-accelerated Applications Using HPCToolkit

Tutorial: Using HPCToolkit to Measure and Analyze the Performance of GPU-accelerated Applications, Mar-Apr 2021

GPA: A GPU Performance Advisor Based on Instruction Sampling

Presented our CGO'21 work.

GPA: A GPU Performance Advisor Based on Instruction Sampling

Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained tuning advice at the kernel level, if any. In this paper, we describe GPA, …


GPA is a performance advisor for NVIDIA GPUs that suggests potential code optimization opportunities at a hierarchy of levels, including individual lines, loops, and functions. GPA uses data flow analysis to approximately attribute measured instruction stalls to their root causes and uses information about a program's structure and the GPU to match inefficiency patterns with suggestions for optimization. GPA estimates each optimization's speedup based on a PC sampling-based performance model.
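
To make the idea of a PC sampling-based speedup estimate concrete, here is a minimal sketch in Python. It is not GPA's actual model: the sample data, the stall-reason names, and the functions `stalls_by_line` and `estimated_speedup` are all hypothetical, and the estimate assumes that execution time is proportional to the number of samples and that an optimization removes every stall sample of one kind on one line.

```python
from collections import Counter

# Hypothetical PC samples: (source line, stall reason).
# "none" means the sampled warp was issuing an instruction, not stalled.
samples = [
    (10, "none"),
    (10, "memory_dependency"),
    (10, "memory_dependency"),
    (12, "none"),
    (12, "execution_dependency"),
    (14, "memory_dependency"),
    (14, "none"),
    (14, "none"),
]


def stalls_by_line(samples):
    """Count stall samples per (line, reason), ignoring issue samples."""
    counts = Counter()
    for line, reason in samples:
        if reason != "none":
            counts[(line, reason)] += 1
    return counts


def estimated_speedup(samples, line, reason):
    """Estimate the speedup from removing one kind of stall on one line.

    Assumption: run time is proportional to the total sample count, so
    removing `removed` samples gives total / (total - removed).
    """
    total = len(samples)
    removed = stalls_by_line(samples)[(line, reason)]
    if removed == total:
        return float("inf")
    return total / (total - removed)


if __name__ == "__main__":
    # Rank candidate optimizations by their estimated speedup.
    for (line, reason), count in stalls_by_line(samples).most_common():
        speedup = estimated_speedup(samples, line, reason)
        print(f"line {line}: {reason} x{count}, est. speedup {speedup:.2f}")
```

A real advisor would also need the root-cause attribution step described above, since the instruction that stalls is often not the instruction that caused the stall; this sketch skips that step and charges each stall to the line where it was sampled.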

Tools for Top-down Performance Analysis of GPU-Accelerated Applications

Presented our ICS'20 work.