Optimizing GPU-accelerated Applications with HPCToolkit
Presented our GPU performance tool.
A Tool for Performance Analysis of GPU-accelerated Applications
Presented the prototype of our GPU performance tool.
A Performance Analysis Framework for Exploiting GPU Microarchitectural Capability
Presented our ICS'17 work.
Deep Learning on Modern Architectures
Discussed how state-of-the-art deep learning libraries optimize computations by utilizing architectural features.
A performance analysis framework for exploiting GPU microarchitectural capability
Understanding the GPU microarchitecture to achieve bare-metal performance tuning