Optimizing GPU-accelerated Applications with HPCToolkit

Presented our GPU performance tool.

A Tool for Performance Analysis of GPU-accelerated Applications

Presented the prototype of our GPU performance tool.

A Performance Analysis Framework for Exploiting GPU Microarchitectural Capability

Presented our ICS'17 work.

Deep Learning on Modern Architectures

Discussed how state-of-the-art deep learning libraries optimize computations by utilizing architectural features.

A Performance Analysis Framework for Exploiting GPU Microarchitectural Capability

Discussed how understanding the GPU microarchitecture enables bare-metal performance tuning.