view article Article SmolVLM2: Bringing Video Understanding to Every Device +5 orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova • Feb 20, 2025 • 338
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published Jan 16, 2025 • 35
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity Paper • 2412.09856 • Published Dec 13, 2024 • 11