Archives | Xiaocan (Bruce) Li's Blog

2025 (1)

February (1)

A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation

February 5, 2025 · 8 min · 1585 words · Xiaocan (Bruce) Li