Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling

Abstract

Mercury uses remote memory scheduling across multiple GPUs to accelerate large language model (LLM) operators, improving utilization and reducing communication overhead.

Publication
Proceedings of the 31st ACM Symposium on Operating Systems Principles (SOSP)