Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling

Yue Guan, Xinwei Qiang, Zaifeng Pan, Daniels Johnson, Yuanwei Fang, Keren Zhou, Yuke Wang, Wanlu Li, Yufei Ding, Adnan Aziz

October 2025

Systems

Abstract

Mercury coordinates remote memory scheduling across multiple GPUs to accelerate large language model operators with improved utilization and reduced communication overhead.

Type

Conference paper

Publication

Proceedings of the 31st ACM Symposium on Operating Systems Principles (SOSP)

Distributed Systems, GPU, Large Language Models