Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System Paper • 2602.18640 • Published Feb 20 • 9
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published Jan 29 • 155
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published Feb 15 • 53
Quantifying the Gap between Understanding and Generation within Unified Multimodal Models Paper • 2602.02140 • Published Feb 2 • 12
SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild? Paper • 2602.03916 • Published Feb 3 • 11
Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives Paper • 2601.20833 • Published Jan 28 • 183
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models Paper • 2512.19526 • Published Dec 22, 2025 • 12
Running 3.82k The Ultra-Scale Playbook 🌌 3.82k The ultimate guide to training LLM on large GPU Clusters
Cosmos-Tokenizer Collection A suite of image and video tokenizers • 12 items • Updated 12 days ago • 44
VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads Paper • 2407.18245 • Published Jul 25, 2024 • 12
dima806/facial_emotions_image_detection Image Classification • 85.8M • Updated Oct 19, 2024 • 65.4k • • 122