Hugging Face Smol Models Research

AI & ML interests

Exploring smol models (for text, vision and video) and high quality web and synthetic datasets

Recent Activity

cmpatino updated a Space 6 days ago

HuggingFaceTB/trl-distillation-trainer

cmpatino published a dataset 8 days ago

HuggingFaceTB/CoT_Reasoning_Bushcraft_Survival

cmpatino updated a dataset 12 days ago

HuggingFaceTB/CoT_Reasoning_Bushcraft_Survival

Papers

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

qgallouedec

posted an update 4 days ago

TRL v1.2 introduces the SSDTrainer 🚀

Simple Self-Distillation (SSD) from Apple's paper "Embarrassingly Simple Self-Distillation Improves Code Generation" is now available as an experimental trainer in TRL.

The recipe is as minimal as the name suggests: sample completions from the model itself at a training-time temperature, then fine-tune on those raw, unverified samples with plain cross-entropy. No reward model. No verifier. No teacher model. No reinforcement learning. Just prompts and the model.

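from trl.experimental.ssd import SSDConfig, SSDTrainer

# `dataset` is not defined in the original post; it is assumed here to be a
# prompt dataset loaded beforehand.
trainer = SSDTrainer(
    model="Qwen/Qwen3-4B-Instruct",
    # Sampling settings used when the model generates its own completions.
    args=SSDConfig(temperature=0.6, top_k=20, top_p=0.95),
    train_dataset=dataset,
)
trainer.train()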
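To make the three sampling settings above concrete, here is a minimal, library-free sketch of how temperature, top-k, and top-p are conventionally applied to a next-token distribution. It is not TRL's implementation: the function names `filter_distribution` and `sample_token` are hypothetical, and the order of operations (temperature, then top-k, then top-p) is an assumption based on common practice.

```python
import math
import random


def filter_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p.

    Illustrative only; the ordering of the filters is an assumption.
    """
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Unnormalised softmax weights, shifted by the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    # Top-k: keep only the k highest-weight tokens.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    ranked = ranked[:top_k]
    top_mass = sum(weights[i] for i in ranked)
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += weights[i] / top_mass
        if cumulative >= top_p:
            break
    # Renormalise over the surviving tokens.
    kept_mass = sum(weights[i] for i in kept)
    return {i: weights[i] / kept_mass for i in kept}


def sample_token(logits, rng, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_distribution(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up values for illustration
    print(filter_distribution(toy_logits, top_k=2))
    print(sample_token(toy_logits, random.Random(0), top_k=2))
```

In the recipe described in the post, completions sampled this way would then be used directly as fine-tuning targets with a plain cross-entropy loss.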

v1.2 also ships expanded tool-calling support (LLaMA 3.1 / 3.2, DeepSeek-V3), another round of KTO ↔ DPO alignment getting us closer to promoting KTO to stable, a big GRPO simplification for overlong tool results, deprecation of use_transformers_paged, and key fixes for VLM response parsing.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.2.0

cmpatino

updated a Space 6 days ago

Distilling 100B+ Models 40x Faster with TRL

📝

TRL distillation for 100B+ teachers, 40x faster

cmpatino

published a dataset 8 days ago

HuggingFaceTB/CoT_Reasoning_Bushcraft_Survival

Viewer • Updated 12 days ago • 3k • 81 • 2

cmpatino

updated a dataset 12 days ago

HuggingFaceTB/CoT_Reasoning_Bushcraft_Survival

Viewer • Updated 12 days ago • 3k • 81 • 2

cmpatino

updated a model 18 days ago

HuggingFaceTB/SmolLM3-3B-GSM8K-SFT

Text Generation • 3B • Updated 18 days ago • 403 • 1

cmpatino

published a model 18 days ago

HuggingFaceTB/SmolLM3-3B-GSM8K-SFT

Text Generation • 3B • Updated 18 days ago • 403 • 1

qgallouedec

posted an update 20 days ago

TRL v1.0 is out!

Hugging Face's TRL library is downloaded 3 million times a month. Over 130k models trained with it are public on the Hub, and major projects like @unsloth and @axolotl-ai-co build directly on top of it. v1.0 is the moment we acknowledged that responsibility explicitly, with a real stability contract.

The field hasn't settled. Building stable software in a domain that keeps invalidating its own assumptions is the actual problem we're solving. The answer is a design that can absorb the next shift without breaking what people rely on.

What's in v1.0: deep Hugging Face integration and a low infrastructure burden.

What's next: asynchronous GRPO, better scaling support, and making training legible enough that agents can inspect and steer it.

pip install --upgrade trl

HuggingFaceTB/qwen3-1.7b-gsm8k-sft

Text Generation • 2B • Updated 27 days ago • 2.04k • 3

cmpatino

published a model 27 days ago

HuggingFaceTB/qwen3-1.7b-gsm8k-sft

Text Generation • 2B • Updated 27 days ago • 2.04k • 3

clefourrier

authored a paper about 1 month ago

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published Mar 12 • 65

qgallouedec

posted an update 2 months ago

@CohereLabs just released 🌿 Tiny Aya: a fully open-source 3B parameter model that speaks 70+ languages 🌍! But there’s a catch:

Tiny Aya is just a language model. It doesn’t support tool calling, the key capability that turns frontier models into powerful *agents*.
So the real question is:

How hard is it to turn Tiny Aya into an agent?

Turns out… it’s simple, thanks to Hugging Face TRL.
We’re sharing a hands-on example showing how to train Tiny Aya into a tool-calling agent with TRL, unlocking what could become the first *massively multilingual open agent*.

Small model. Global reach. Agent capabilities.

👉 https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb

lewtun

submitted 2 papers to Daily Papers 2 months ago

Single-minus gluon tree amplitudes are nonzero

Paper • 2602.12176 • Published Feb 12 • 8

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Paper • 2602.03773 • Published Feb 3 • 13

cfahlgren1

submitted a paper to Daily Papers 3 months ago

How AI Impacts Skill Formation

Paper • 2601.20245 • Published Jan 28 • 10

pcuenq

posted an update 4 months ago

👉 What happened in AI in 2025? 👈

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1 — Learning to Reason
DeepSeek not only releases a top-notch reasoning model, but also shows how to train such models and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2 — Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3 — "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models get gold in math olympiads and hard benchmarks. OpenAI releases strong open-source models and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4 — Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯

Credits
🙏 NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫡 @reach-vb for the original idea, design and recipe

🙌 @ariG23498 and yours truly for compiling and verifying the 2025 edition

🥳 Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! 🥂

craffel

authored a paper 4 months ago

TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior

Paper • 2512.20757 • Published Dec 23, 2025 • 18

qgallouedec

submitted a paper to Daily Papers 4 months ago

INTELLECT-3: Technical Report

Paper • 2512.16144 • Published Dec 18, 2025 • 20

alozowski

authored a paper 4 months ago

YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2, 2025 • 23

cgeorgiaw

posted an update 5 months ago

🚀🚀🚀Huge biotech data drop today🚀🚀🚀

The largest drug-target dataset ever created was just released on Hugging Face—and it's still growing...

EvE Bio is updating the dataset every 8 weeks. A drug development dream.

Read the blog: https://huggingface.co/blog/hugging-science/eve-bio-mapping-the-pharmone-drug-interaction
Play with the data: eve-bio/drug-target-activity

Team members 35