RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards Paper • 2605.10899 • Published 5 days ago • 72
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 16 days ago • 213
MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models Paper • 2603.28590 • Published Mar 30 • 22
DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents Paper • 2602.07035 • Published Feb 3 • 30
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning Paper • 2510.06217 • Published Oct 7, 2025 • 67