Profiler

Low Overhead and Context Sensitive Profiling of GPU-Accelerated Applications

As we near the end of Moore's law scaling, the next-generation computing platforms are increasingly exploring heterogeneous processors for acceleration. Graphics Processing Units (GPUs) are the most widely used accelerators. Meanwhile, applications …

Practical Performance Optimization for Deep Learning Applications

Presented the Triton programming language and a deep learning profiler.

ValueExpert: Exploring Value Patterns in GPU-accelerated Applications

Presented a talk on our value profiling tool at ASPLOS'22.

ValueExpert: exploring value patterns in GPU-accelerated applications

General-purpose GPUs have become common in modern computing systems to accelerate applications in many domains, including machine learning, high-performance computing, and autonomous driving. However, inefficiencies abound in GPU-accelerated …

Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs

Graphics Processing Units (GPUs) have become a key technology for accelerating node performance in supercomputers, including the US Department of Energy’s forthcoming exascale systems. Since the execution model for GPUs differs from that for …

Measurement and analysis of GPU-accelerated applications with HPCToolkit

To address the challenge of performance analysis on the US DOE’s forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help …

GPA: A GPU Performance Advisor Based on Instruction Sampling

Presented our CGO'21 work.

GVProf: A Value Profiler for GPU-Based Clusters

GPGPUs are widely used in high-performance computing systems to accelerate scientific and machine learning workloads. Developing efficient GPU kernels is critically important to obtain bare-metal performance on GPU-based clusters. In this paper, we …

Tools for Top-down Performance Analysis of GPU-Accelerated Applications

This paper describes extensions to Rice University's HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's …