June 2018 – August 2018

Research Intern

Facebook Inc

Neural Network Optimization on Mobile Devices

April 2017 – July 2017

Research Intern

Nvidia Inc

Neural Network Quantization

October 2013 – February 2014

SDE Intern

Baidu Inc

Hadoop Workflow Optimization



Extending HPCToolkit to provide a complete profile view for GPU-accelerated applications.

A fast, memory-efficient, lightweight implementation of the gSpan algorithm for data mining. With multi-threading on a single machine, gBolt is up to 100x faster than the original implementation. It also reduces memory usage by more than 200-fold, so it runs efficiently on personal computers.

Recent Publications

(2019). A tool for performance analysis of GPU-accelerated applications. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO’19).

(2018). Quadboost: A Scalable Concurrent Quadtree. IEEE Transactions on Parallel and Distributed Systems (TPDS’18).

(2015). BF-MapReduce: A Bloom Filter Based Efficient Lightweight Search. 2015 IEEE Conference on Collaboration and Internet Computing (CIC’15).

Recent & Upcoming Talks

Presented our GPU performance tool.

Presented the prototype of our GPU performance tool.

Presented our ICS’17 work.

Discussed how state-of-the-art deep learning libraries optimize computations by utilizing architectural features.

Introduced various convolution methods and analyzed their complexity, memory consumption, and data access patterns.