- General Agentic Memory Via Deep Research
  Paper • 2511.18423 • Published • 170
- Diffusion Language Models are Super Data Learners
  Paper • 2511.03276 • Published • 132
- SAM 3: Segment Anything with Concepts
  Paper • 2511.16719 • Published • 135
- Back to Basics: Let Denoising Generative Models Denoise
  Paper • 2511.13720 • Published • 70

Collections
Collections including paper arxiv:2512.13687
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
  Paper • 2511.18538 • Published • 304
- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 550
- Paper2Video: Automatic Video Generation from Scientific Papers
  Paper • 2510.05096 • Published • 120
- TradingGPT: Multi-Agent System with Layered Memory and Distinct Characters for Enhanced Financial Trading Performance
  Paper • 2309.03736 • Published

- MiniMaxAI/VTP-Small-f16d64
  Image Feature Extraction • 0.2B • Updated • 308 • 14
- MiniMaxAI/VTP-Base-f16d64
  Image Feature Extraction • Updated • 132 • 20
- MiniMaxAI/VTP-Large-f16d64
  Image Feature Extraction • 0.7B • Updated • 217 • 15
- Towards Scalable Pre-training of Visual Tokenizers for Generation
  Paper • 2512.13687 • Published • 106

- Test-Time Scaling with Reflective Generative Model
  Paper • 2507.01951 • Published • 108
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  Paper • 2502.05171 • Published • 154
- Autoregressive Diffusion Models
  Paper • 2110.02037 • Published
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 9

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 196 • 99
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 39
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- Towards Scalable Pre-training of Visual Tokenizers for Generation
  Paper • 2512.13687 • Published • 106
- MMGR: Multi-Modal Generative Reasoning
  Paper • 2512.14691 • Published • 121
- Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
  Paper • 2512.23447 • Published • 99
- LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
  Paper • 2512.23576 • Published • 66

- Continuous Autoregressive Language Models
  Paper • 2510.27688 • Published • 74
- Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
  Paper • 2505.13181 • Published • 9
- Long-Context Autoregressive Video Modeling with Next-Frame Prediction
  Paper • 2503.19325 • Published • 73
- Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
  Paper • 2503.16430 • Published • 34

- yandex/stable-diffusion-3.5-medium-alchemist
  Text-to-Image • Updated • 17 • 7
- Ovis-U1 Technical Report
  Paper • 2506.23044 • Published • 61
- FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
  Paper • 2507.01953 • Published • 18
- LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
  Paper • 2507.01945 • Published • 76

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 29
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 15
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 33