Welcome to my blog

I earned my Ph.D. at the University of Toronto. This blog is where I share my thoughts on AI, research, and life. Visit my academic homepage for publications and projects.

A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation

A simple yet principled approach to eliminating computational overhead in decoupled PPO while maintaining training stability and performance.

February 5, 2025 · 8 min · 1575 words · Xiaocan (Bruce) Li