Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers Article • ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 188
From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels Article • drbh, danieldk • Aug 18, 2025 • 98
Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training Article • smohammadi, siro1, winglian, marcsun13, djsaunde • Aug 8, 2025 • 98
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1, 2025 • 36
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL Paper • 2503.23157 • Published Mar 29, 2025 • 10
Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Article • ariG23498, merve, pcuenq, reach-vb • Mar 12, 2025 • 496
Faster Assisted Generation with Dynamic Speculation Article • jmamou, orenpereg, joaogante, lewtun, danielkorat, Nadav-Timor, moshew • Oct 8, 2024 • 51
CHESS: Contextual Harnessing for Efficient SQL Synthesis Paper • 2405.16755 • Published May 27, 2024 • 2
Fine-tuning LLMs to 1.58bit: extreme quantization made easy Article • medmekk, marcsun13, lvwerra, pcuenq, osanseviero, thomwolf • Sep 18, 2024 • 280
Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 Article • RQlee, ArthurZ, achikundu, lwtr, rganti, mayank-mishra • Aug 21, 2024 • 41
Training and Finetuning Embedding Models with Sentence Transformers Article • tomaarsen • May 28, 2024 • 274
StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation Article • yuxiang630, cassanof, ganler, YifengDing, StringChaos, harmdevries, lvwerra, arjunguha, lingming • Apr 29, 2024 • 79
Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B Article • asoria, tdoehmen, senwu, lorr, vpm238 • Apr 4, 2024 • 29