Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 514
MobileCLIP2 Collection MobileCLIP2: Mobile-friendly image-text models with SOTA zero-shot capabilities trained on DFNDR-2B • 30 items • Updated 19 days ago • 61
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 221
InternVL3.5 Collection This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 45 items • Updated Mar 2 • 109
view article Article Vision Language Model Alignment in TRL ⚡️ +3 sergiopaniego, merve, qgallouedec, kashif, ariG23498 • Aug 7, 2025 • 111
view article Article Welcome GPT OSS, the new open-source model family from OpenAI! +10 reach-vb, pcuenq, lewtun, clem, Rocketknight1, clefourrier, celinah, Wauplin, marcsun13, pagezyhf, ahadnagy, joaogante • Aug 5, 2025 • 513
view article Article seemore: Implement a Vision Language Model from Scratch AviSoori1x • Jun 23, 2024 • 109
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published Jan 22, 2025 • 130
view article Article 🤔👀🎬🖥️📖 Kimi-VL-A3B-Thinking-2506: A Quick Navigation moonshotai • Jun 21, 2025 • 77
NuExtract-2.0 Collection Models specialized in extracting structured information (JSON) from text, PDFs, scans, spreadsheets, etc. • 9 items • Updated Mar 2 • 29
view article Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 773
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published Jul 2, 2025 • 36
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 211
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22, 2024 • 29
view article Article Gemma 3n fully available in the open-source ecosystem! +6 ariG23498, pcuenq, sergiopaniego, reach-vb, FL33TW00D-HF, Xenova, Steveeeeeeen, kashif • Jun 26, 2025 • 121