Analyzing GPU-accelerated Applications Using HPCToolkit

Presented at the "Using HPCToolkit to Measure and Analyze the Performance of GPU-accelerated Applications" tutorial, Mar-Apr 2021.

GPA: A GPU Performance Advisor Based on Instruction Sampling

Presented our CGO'21 work.


GPA is a performance advisor for NVIDIA GPUs that suggests potential code optimization opportunities at a hierarchy of levels, including individual lines, loops, and functions. GPA uses data flow analysis to approximately attribute measured instruction stalls to their root causes and uses information about a program's structure and the GPU to match inefficiency patterns with suggestions for optimization. GPA estimates each optimization's speedup based on a PC sampling-based performance model.

GVProf: A Value Profiler for GPU-Based Clusters

Presented our SC'20 work.

Tools for Top-down Performance Analysis of GPU-Accelerated Applications

Presented our ICS'20 work.

This paper describes extensions to Rice University's HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's …


We implemented GVProf, the first value profiler that locates value redundancy problems in applications running on GPU-based clusters. Our experiments show that GVProf incurs acceptable overhead and scales to large executions, and that it provides useful insights to guide performance optimization. Under the guidance of GVProf, we optimized several HPC and machine learning workloads, obtaining speedups of up to 1.93x.

A Tool for Top-down Performance Analysis of GPU-accelerated Applications

Presented a poster and a short talk about HPCToolkit's GPU support at PPoPP'20.

Optimizing GPU-accelerated Applications with HPCToolkit

Presented the prototype of HPCToolkit's GPU support at PETASCALE'19.


Our tool provides a profile view and a trace view for GPU-accelerated applications. The profile view identifies where GPU APIs are invoked in their CPU calling contexts, approximates the calling context of GPU execution, and analyzes the instruction mix of GPU kernels. The trace view records CPU and GPU activities for a large number of processes and threads with minimal overhead.