MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory Paper • 2605.15128 • Published 3 days ago • 52
Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models Paper • 2603.01571 • Published Mar 2 • 33
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published Mar 2 • 63