Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context Paper • 2605.13831 • Published 4 days ago • 81
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16, 2025 • 69
Learning GUI Grounding with Spatial Reasoning from Visual Feedback Paper • 2509.21552 • Published Sep 25, 2025 • 11
Sample-efficient Integration of New Modalities into Large Language Models Paper • 2509.04606 • Published Sep 4, 2025 • 8