Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes Paper • 2603.25562 • Published 26 days ago • 13
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 7 days ago • 83
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing Paper • 2604.02288 • Published 19 days ago • 32