PointArena: Probing Multimodal Grounding Through Language-Guided Pointing Paper • 2505.09990 • Published May 15, 2025
GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation Paper • 2505.13441 • Published May 19, 2025
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published Aug 11, 2025
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models Paper • 2406.18915 • Published Jun 27, 2024
VLS: Steering Pretrained Robot Policies via Vision-Language Models Paper • 2602.03973 • Published Feb 3
Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning Paper • 2602.07845 • Published Feb 8
MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation Paper • 2602.11337 • Published Feb 11
TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics Paper • 2602.19313 • Published Feb 22
FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models Paper • 2510.01642 • Published Oct 2, 2025
MolmoAct2: Action Reasoning Models for Real-world Deployment Paper • 2605.02881 • Published 11 days ago
RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains Paper • 2604.05226 • Published Apr 6
Selective Visual Representations Improve Convergence and Generalization for Embodied AI Paper • 2311.04193 • Published Nov 7, 2023
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics Paper • 2406.10721 • Published Jun 15, 2024
SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models Paper • 2412.07755 • Published Dec 10, 2024