TRL v1.0: Post-Training Library Built to Move with the Field • 29 days ago
KV Caching Explained: Optimizing Transformer Inference Efficiency • Jan 30, 2025
SmolVLM Grows Smaller – Introducing the 256M & 500M Models! • Jan 23, 2025