June 2018 – August 2018

Research Intern

Facebook Inc

Neural Network Optimization on Mobiles

April 2017 – July 2017

Research Intern

Nvidia Inc

Neural Network Quantization

October 2013 – February 2014

SDE Intern

Baidu Inc

Hadoop Workflow Optimization



Our tool provides a profile view and a trace view for GPU-accelerated applications. The profile view identifies where GPU APIs are invoked in their CPU calling contexts, approximates the calling context for GPU execution, and analyzes the instruction mix of GPU kernels. The tool traces CPU and GPU activities for a large number of processes and threads with minimal overhead.

A fast, memory-efficient, and lightweight implementation of the gSpan algorithm in data mining. With multi-threading on a single machine, gBolt is up to 100x faster than the original implementation. gBolt also reduces memory usage by more than 200-fold, so it runs efficiently on personal computers.

Recent Publications

(2019). A tool for performance analysis of GPU-accelerated applications. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO’19).

(2018). Quadboost: A Scalable Concurrent Quadtree. IEEE Transactions on Parallel and Distributed Systems (TPDS’18).

(2015). BF-MapReduce: A Bloom Filter Based Efficient Lightweight Search. 2015 IEEE Conference on Collaboration and Internet Computing (CIC’15).

Recent & Upcoming Talks

Presented our GPU performance tool.

Presented the prototype of our GPU performance tool.

Presented our ICS’17 work.

Discussed how state-of-the-art deep learning libraries optimize computations by utilizing architectural features.

Introduced several convolution methods and analyzed their complexity, memory consumption, and data access patterns.