WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation Paper • 2503.07265 • Published Mar 10, 2025 • 4
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3, 2025 • 58
Look-Back: Implicit Visual Re-focusing in MLLM Reasoning Paper • 2507.03019 • Published Jul 2, 2025 • 1
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning Paper • 2510.11026 • Published Oct 13, 2025 • 18
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models Paper • 2510.12784 • Published Oct 14, 2025 • 20
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback Paper • 2510.16888 • Published Oct 19, 2025 • 22
OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination Paper • 2509.00723 • Published Aug 31, 2025 • 1
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward Paper • 2511.20561 • Published Nov 25, 2025 • 33
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents Paper • 2604.23781 • Published 18 days ago • 33
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 2 days ago • 141
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling Paper • 2604.28185 • Published 14 days ago • 89
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published Apr 6 • 235