VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? Paper • 2505.23359 • Published May 29, 2025 • 38
Teaching LMMs for Image Quality Scoring and Interpreting Paper • 2503.09197 • Published Mar 12, 2025 • 1
Generative Frame Sampler for Long Video Understanding Paper • 2503.09146 • Published Mar 12, 2025 • 1
Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare Paper • 2405.19298 • Published May 29, 2024
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 21
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 111
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22, 2024 • 21
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels Paper • 2312.17090 • Published Dec 28, 2023 • 4
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 28
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision Paper • 2309.14181 • Published Sep 25, 2023 • 2
FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling Paper • 2207.02595 • Published Jul 6, 2022