📚 LLM pretraining datasets • Collection • A collection of datasets for LLM pretraining • 9 items • Updated May 5, 2025 • 18
The Smol Training Playbook 📚 • The secrets to building world-class LLMs • Running on CPU Upgrade • Featured • 3.14k
Unlocking On-Policy Distillation for Any Model Family 📝 • Visualize on-policy distillation for any model family • Running • 98
[lecture artifacts] aligning open language models • Collection • Artifacts referenced in the talk timeline! Slides: https://docs.google.com/presentation/d/1quMyI4BAx4rvcDfk8jjv063bmHg4RxZd9mhQloXpMn0/edit?usp=sharin • 63 items • Updated Apr 17, 2024 • 58
Jupyter Agent 2 🏃 • Generate Jupyter notebooks from natural language tasks • Running • Agents • Featured • 253
laion/CLIP-ViT-B-32-laion2B-s34B-b79K • Zero-Shot Image Classification • 0.2B • Updated Jan 22, 2025 • 3.08M • 139