Keren Zhou

Assistant Professor

George Mason University

Biography

I am an Assistant Professor in the Department of Computer Science at George Mason University and a part-time Member of Technical Staff at OpenAI. Before joining GMU, I was a full-time Member of Technical Staff at OpenAI. I obtained my Ph.D. degree from Rice University, advised by Professor John Mellor-Crummey. Previously, I studied at the Institute of Computing Technology, Chinese Academy of Sciences, in Professor Guangming Tan’s PAA group. Prior to that, I was an undergraduate student at Yunnan University, advised by Professor Wei Zhou.

Interests

High Performance Computing
Parallel Algorithms
Program Analysis
Machine Learning

Education

PhD in Computer Science, 2022
Rice University
MS in Computer Science, 2017
Institute of Computing Technology, Chinese Academy of Sciences
BE in Network Engineering, 2014
School of Software, Yunnan University

Recent News

Mar 2025 [Travel] Attended Linux Foundation Member Summit'25
Mar 2025 [Travel] Attended PPoPP/CGO/HPCA'25
Nov 2024 [Education] Served as a mentor for the Student Lightning Talks hosted by the Virginia WHPC organization
Oct 2024 [Education] Attended PACT'24
Jul 2024 [K-12][Education] Participated in GMU’s VALHEN STEM Academy event
May 2024 [K-12][Education] Participated in GMU’s Exploring Pathways, Proficiencies, and Interdisciplinary Careers in STEM (EPPIC-STEM) event
Apr 2024 [Education] Attended ASPLOS'24

Experience

Member of Technical Staff

November 2023 – Present Fairfax

Triton Compiler

Assistant Professor

George Mason University

August 2023 – Present Fairfax

Computer Architecture/Compiler/Machine Learning Systems

Member of Technical Staff

June 2022 – August 2023 San Francisco

Performance Optimization of Deep Learning Workloads

Software Engineering Intern

May 2021 – August 2021 Remote

Performance Profiling for Deep Learning Frameworks

Software Engineering Intern

May 2020 – August 2020 Remote

Performance Regression Analysis of Feedback-Directed Optimization (FDO) Based Programs

Research Intern

June 2018 – August 2018 Menlo Park

Neural Network Optimization on Mobiles

Research Intern

April 2017 – July 2017 Beijing

Neural Network Quantization

Software Engineering Intern

October 2014 – February 2015 Beijing

Hadoop Workflow Optimization

Projects

GPA is a performance advisor for NVIDIA GPUs that suggests potential code optimization opportunities at a hierarchy of levels, including individual lines, loops, and functions. GPA uses data flow analysis to approximately attribute measured instruction stalls to their root causes and uses information about a program’s structure and the GPU to match inefficiency patterns with suggestions for optimization. GPA estimates each optimization’s speedup based on a PC sampling-based performance model.

Our tool provides a profile view and a trace view for GPU-accelerated applications. The profile view identifies where GPU APIs are invoked in CPU calling context, approximates calling context for GPU execution, and analyzes instruction mix for GPU kernels. The tool traces CPU and GPU activities for a large number of processes and threads with minimal overhead.

HPCToolkit

Triton is a language and compiler for writing highly efficient custom deep learning primitives. The aim of Triton is to provide an open-source environment for expressing tensor math workloads that offers high flexibility, developer productivity, and end-to-end performance.

Triton

We implemented GVProf, the first value profiler that locates value redundancy problems in applications running on GPU-based clusters. Our experiments show that GVProf incurs acceptable overhead and scales to large executions. GVProf provides useful insights to guide performance optimization. Under the guidance of GVProf, we optimized several HPC and machine learning workloads, obtaining speedups up to 1.93x.

Featured Publications

Triton-Viz: Visualizing GPU Programming in AI Courses

GPU programming is a critical component in AI system courses, which is notoriously difficult to learn and teach, given its unique …

Tejas Ramesh, Alexander Rush, Xu Liu, Binqian Yin, Keren Zhou, Shuyin Jiao

An Automated Tool for Analysis and Tuning of GPU-Accelerated Code in HPC Applications

The US Department of Energy’s fastest supercomputers and forthcoming exascale systems employ Graphics Processing Units (GPUs) to …

Keren Zhou, Xiaozhu Meng, Ryuichi Sai, Dejan Grubisic, John Mellor-Crummey

ValueExpert: Exploring Value Patterns in GPU-Accelerated Applications

General-purpose GPUs have become common in modern computing systems to accelerate applications in many domains, including machine …

Keren Zhou, Yueming Hao, John Mellor-Crummey, Xiaozhu Meng, Xu Liu

Measurement and Analysis of GPU-accelerated Applications with HPCToolkit

To address the challenge of performance analysis on the US DOE’s forthcoming exascale supercomputers, Rice University has been …

Keren Zhou, Laksono Adhianto, Jonathon Anderson, Aaron Cherian, Dejan Grubisic, Mark Krentel, Yumeng Liu, Xiaozhu Meng, John Mellor-Crummey

Tools for Top-down Performance Analysis of GPU-Accelerated Applications

This paper describes extensions to Rice University’s HPCToolkit performance tools to support measurement and analysis of …

Keren Zhou, Mark W. Krentel, John Mellor-Crummey

Recent Publications

Triton-Viz: Visualizing GPU Programming in AI Courses
Tejas Ramesh, Alexander Rush, Xu Liu, Binqian Yin, Keren Zhou, Shuyin Jiao
2025 Proceedings of the 56th ACM Technical Symposium on Computer Science Education (SIGCSE).

SS1: Accelerating Inference with Fast and Expressive Sketch Structured Transform
Aditya Desai, Kimia Saedi, Apoorv Walia, Jihyeong Lee, Keren Zhou, Anshumali Shrivastava
2024 The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS).

Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor
Zhen Xie, Murali Emani, Xiaodong Yu, Dingwen Tao, Xin He, Pengfei Su, Keren Zhou, Venkatram Vishwanath
2024 USENIX Annual Technical Conference (USENIX ATC).

FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks
Keren Zhou, Karthik Ganapathi Subramanian, Po-Hsun Lin, Matthias Fey, Binqian Yin, Jiajia Li
2024 Proceedings of the 38th ACM International Conference on Supercomputing (ICS).

PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation
Jason Ansel, Edward Yang, Horace He, Natalia Gimelshein, Animesh Jain, Michael Voznesensky, Bin Bao, Peter Bell, David Berard, Evgeni Burovski, Geeta Chauhan, Anjali Chourdia, Will Constable, Alban Desmaison, Zachary DeVito, Elias Ellison, Will Feng, Jiong Gong, Michael Gschwind, Brian Hirsh, Sherlock Huang, Kshiteej Kalambarkar, Laurent Kirsch, Michael Lazos, Mario Lezcano, Yanbo Liang, Jason Liang, Yinghai Lu, C. K. Luk, Bert Maher, Yunjie Pan, Christian Puhrsch, Matthias Reso, Mark Saroufim, Marcos Yukio Siraichi, Helen Suk, Shunting Zhang, Michael Suo, Phil Tillet, Xu Zhao, Eikan Wang, Keren Zhou, Richard Zou, Xiaodong Wang, Ajit Mathews, William Wen, Gregory Chanan, Peng Wu, Soumith Chintala
2024 Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).

Recent & Upcoming Talks

Proton: Adaptive and Lightweight Profiling for Deep Learning Workloads

A gentle introduction to the Proton profiler and its updates.

Mar 1, 2025, Las Vegas, NV

Keren Zhou, Corbin Robeck, Yuanwei Fang

The Proton Dialect: An MLIR Dialect For AI Compiler GPU Kernel Profiling

Presented our work on multi-level instrumentation-based profiling for Triton kernels.

Mar 1, 2025, Las Vegas, NV

Keren Zhou, Corbin Robeck, Yuanwei Fang

Profiling and Debugging GPU-accelerated AI Applications

Presented our research on debugging and profiling of GPU-accelerated AI applications.

Oct 24, 2024, Virtual

Proton: Introduction and Development

Presented the ongoing work on Proton.

Oct 21, 2024, Virtual

Yuanwei Fang, Corbin Robeck, Keren Zhou

Dev Tools: Proton/Interpreter

Presented the Proton and Interpreter tools in the Triton project.

Sep 17, 2024, Virtual

Students

PhD Students
- Tejas Ramesh, GMU, 2023-
- Bowen Cui, GMU, 2024-
- Jihyeong Lee, GMU, 2024-
- Hao Wu, GMU, 2024-
- Junyu Yin, GMU, 2024-
- Atul Khatri, GMU, 2023-2024
  - co-advised with Prof. Songqing Chen
- Mao Lin, UC Merced, 2022-2023
  - co-advised with Prof. Pengfei Su
Master's Students
- Mandar Chaudhari, GMU, 2023-2024
- Karthik Ganapathi Subramanian, NC State, 2023-2024
  - co-advised with Prof. Jiajia Li