deepseek-ai/DeepSeek-V4-Flash • Text Generation • 158B • Updated 4 days ago • 1.07M • 1.02k
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head • Paper • 2601.07832 • Published Jan 12 • 52