liyaxuan

lllyx

AI & ML interests

None yet

Recent Activity

updated a model 4 days ago

lllyx/Qwen3-1.7B-SFT

updated a collection 4 days ago

Rethinking OPD

upvoted a paper 4 days ago

MAIC-UI: Making Interactive Courseware with Generative UI

Organizations

None yet

updated a model 4 days ago

lllyx/Qwen3-1.7B-SFT

Text Generation • 2B • Updated 4 days ago • 614 • 1

updated a collection 4 days ago

Rethinking OPD

Collection

This collection includes the models used in the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recip • 3 items • Updated 4 days ago

upvoted a paper 4 days ago

MAIC-UI: Making Interactive Courseware with Generative UI

Paper • 2604.25806 • Published 10 days ago • 8

updated a model 4 days ago

lllyx/Qwen3-4B-Base-GRPO

Text Generation • 4B • Updated 4 days ago • 92 • 1

published a model 4 days ago

lllyx/Qwen3-4B-Base-GRPO

Text Generation • 4B • Updated 4 days ago • 92 • 1

upvoted a paper 4 days ago

Co-Evolving Policy Distillation

Paper • 2604.27083 • Published 9 days ago • 61

upvoted a paper 13 days ago

Near-Future Policy Optimization

Paper • 2604.20733 • Published 16 days ago • 74

updated a collection 20 days ago

Rethinking OPD

Collection

This collection includes the models used in the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recip • 3 items • Updated 4 days ago

authored 2 papers 22 days ago

DeepPrune: Parallel Scaling without Inter-trace Redundancy

Paper • 2510.08483 • Published Oct 9, 2025 • 24

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published 24 days ago • 90

upvoted a paper 23 days ago

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published 24 days ago • 90

upvoted a paper 27 days ago

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Paper • 2601.06021 • Published Jan 9 • 48

upvoted a paper about 1 month ago

Self-Distilled RLVR

Paper • 2604.03128 • Published Apr 3 • 169

published a model about 2 months ago

lllyx/Qwen3-1.7B-SFT

Text Generation • 2B • Updated 4 days ago • 614 • 1

upvoted a paper 3 months ago

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Paper • 2602.12125 • Published Feb 12 • 64

liked a model 3 months ago

openbmb/MiniCPM-SALA

Text Generation • 9B • Updated about 11 hours ago • 10.2k • 675

upvoted a collection 3 months ago

UltraData

Collection

Ultra Scale, Ultra Quality, Ultra Coverage • 10 items • Updated 20 days ago • 81

liked a model 3 months ago

openbmb/MiniCPM-o-4_5

Any-to-Any • 9B • Updated Mar 7 • 103k • 1.36k
