Hiring 💼

Quentin Gallouédec PRO

qgallouedec

huggingface

AI & ML interests

None yet

Recent Activity

updated a Space 6 minutes ago

qgallouedec/huggingface-static-660dce

published a Space 6 minutes ago

qgallouedec/huggingface-static-660dce

updated a bucket 6 minutes ago

qgallouedec/huggingface-static-660dce-bucket

Organizations

Buckets 27

qgallouedec/chunked-nll-benchmark-2-bucket

qgallouedec/huggingface-static-660dce-bucket

qgallouedec/huggingface-static-02b073-bucket

qgallouedec/huggingface-static-8d4898-bucket

qgallouedec/huggingface-static-6a0931-bucket

qgallouedec/huggingface-static-a171f8-bucket

Posts 3

Post

1674

TRL v1.2 introduces the SSDTrainer 🚀

Simple Self-Distillation (SSD) from Apple's paper "Embarrassingly Simple Self-Distillation Improves Code Generation" is now available as an experimental trainer in TRL.

The recipe is as minimal as the name suggests: sample completions from the model itself at a training-time temperature, then fine-tune on those raw, unverified samples with plain cross-entropy. No reward model. No verifier. No teacher model. No reinforcement learning. Just prompts and the model.

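from trl.experimental.ssd import SSDConfig, SSDTrainer

# `dataset` is assumed to be a dataset of prompts that was loaded beforehand.
trainer = SSDTrainer(
    model="Qwen/Qwen3-4B-Instruct",
    # Sampling settings used when the model generates its own training completions.
    args=SSDConfig(temperature=0.6, top_k=20, top_p=0.95),
    train_dataset=dataset,
)
trainer.train()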
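
To make the recipe concrete, here is a minimal, library-free sketch of a single self-distillation step on a toy next-token distribution. It is not TRL's implementation: the function names (sampling_distribution, ssd_step), the 5-token vocabulary, the learning rate, and the way top-k and top-p are combined are all assumptions chosen for illustration. It only shows the two ingredients named above: sample from the model itself with temperature and truncation, then take a plain cross-entropy gradient step on that raw sample.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sampling_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Temperature-scale, then truncate with top-k followed by top-p (assumed order)."""
    probs = softmax([x / temperature for x in logits])
    # Rank token ids from most to least likely and keep at most top_k of them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


def cross_entropy(logits, target):
    """Plain negative log-likelihood of `target` under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def ssd_step(logits, rng, lr=0.5, **sampling_kwargs):
    """One toy self-distillation update: sample from the model, then fit the sample."""
    q = sampling_distribution(logits, **sampling_kwargs)
    # 1) Sample a "completion" (here a single token) from the model itself.
    token = rng.choices(range(len(logits)), weights=q, k=1)[0]
    # 2) Gradient of cross-entropy w.r.t. the logits is softmax(logits) - one_hot(token).
    probs = softmax(logits)
    new_logits = [
        l - lr * (p - (1.0 if i == token else 0.0))
        for i, (l, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, token


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical 5-token vocabulary
new_logits, token = ssd_step(logits, rng, temperature=0.6, top_k=3, top_p=0.95)
print(token, cross_entropy(logits, token), cross_entropy(new_logits, token))
```

In this toy setting the sampled token always comes from the truncated set, and the update lowers the cross-entropy on that token. No reward, verifier, or teacher appears anywhere in the loop.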

v1.2 also ships expanded tool-calling support (LLaMA 3.1 / 3.2, DeepSeek-V3), another round of KTO ↔ DPO alignment that brings KTO closer to promotion to stable, a major GRPO simplification for overlong tool results, the deprecation of use_transformers_paged, and key fixes for VLM response parsing.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.2.0

Articles 14

Article

49

TRL v1.0: Post-Training Library Built to Move with the Field

Papers 4

arxiv:2402.09844

arxiv:2402.03046

arxiv:2208.14928

arxiv:2106.13687

Spaces 53

Huggingface Static 660dce

Huggingface Static 02b073

Huggingface Static 8d4898

Huggingface Static 6a0931

Huggingface Static A171f8

Huggingface Static 85dbfe

Models 789

qgallouedec/Qwen3-0.6B-SFT-20251113165959

Text Generation • 0.6B • Updated 12 days ago • 272

qgallouedec/tiny-aya-global-SFT

qgallouedec/tiny-aya-global-tool-calling-SFT

qgallouedec/my-other-awesome-model

Text Generation • 0.5B • Updated Feb 14 • 6

qgallouedec/my-awesome-model

Text Generation • 0.5B • Updated Feb 14 • 5

qgallouedec/trainer_output

Text Generation • 0.5B • Updated Feb 14 • 6

qgallouedec/test_push_output_4

Text Classification • 87.5k • Updated Feb 14 • 6

qgallouedec/qwen2-0.5b-deepmath-grpo

qgallouedec/my-finetuned-model

0.8B • Updated Jan 2 • 1

qgallouedec/Qwen3-0.6B-SFT-20251113163732

Updated Nov 13, 2025

Datasets 85

qgallouedec/test-grpo-vlm-log-completions

Viewer • Updated Mar 20 • 435 • 247

qgallouedec/llama_star_formatted

Viewer • Updated Feb 21 • 7.21k • 9

qgallouedec/deepmath-completions-logs2

Viewer • Updated Jan 22 • 48 • 50

qgallouedec/deepmath-completions-logs

Viewer • Updated Jan 13 • 232 • 75 • 1

qgallouedec/Dolci-Think-DPO-7B

Viewer • Updated Nov 28, 2025 • 150k • 12

qgallouedec/biogrid_qa

Viewer • Updated Nov 18, 2025 • 59.4k • 229

qgallouedec/human_gene_interaction_qa_v2

Viewer • Updated Nov 18, 2025 • 79.2k • 13

qgallouedec/human_gene_interaction_qa

Viewer • Updated Nov 17, 2025 • 1.84M • 13

qgallouedec/biogrid

Viewer • Updated Nov 17, 2025 • 2.82M • 1.07k

qgallouedec/trl-metrics

Viewer • Updated Oct 7, 2025 • 148k • 42 • 1
