Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning
Abstract
The Darwin Family framework enables training-free evolutionary merging of large language models through gradient-free weight-space recombination, achieving superior reasoning performance without additional training.
We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints. Darwin introduces three key ideas: (i) a 14-dimensional adaptive merge genome enabling fine-grained component- and block-level recombination; (ii) MRI-Trust Fusion, which adaptively balances diagnostic layer-importance signals with evolutionary search through a learnable trust parameter; and (iii) an Architecture Mapper that enables cross-architecture breeding between heterogeneous model families. Empirically, the flagship Darwin-27B-Opus achieves 86.9% on GPQA Diamond, ranking #6 among 1,252 evaluated models, and outperforming its fully trained foundation model without any gradient-based training. Across scales from 4B to 35B parameters, Darwin models consistently improve over their parents, support recursive multi-generation evolution, and enable a training-free evolutionary merge that combines Transformer- and Mamba-based components. Together, the Darwin Family demonstrates that diagnostic-guided evolutionary merging is a practical and reproducible alternative to costly post-training pipelines for reasoning-centric language models.
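To make the trust-weighting idea concrete, here is a minimal sketch of how a diagnostic signal and an evolved genome could be blended during a merge. It is not the paper's implementation. The abstract does not say what the 14 genome dimensions are or how the trust parameter is learned, so the sketch assumes one merge ratio per layer plus a single trust gene, plain linear interpolation between two parents, and a simple (1+1) evolution strategy. All function names (`fuse_ratio`, `merge_layer`, `merge_models`, `evolve`) and the toy fitness function are hypothetical.

```python
import random


def fuse_ratio(mri_ratio, genome_ratio, trust):
    """Blend a diagnostic ratio with an evolved ratio (assumed form).

    trust = 1 follows the diagnostic signal only; trust = 0 follows the genome only.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    return trust * mri_ratio + (1.0 - trust) * genome_ratio


def merge_layer(weights_a, weights_b, ratio):
    """Linear interpolation: ratio = 0 keeps parent A, ratio = 1 keeps parent B."""
    return [(1.0 - ratio) * a + ratio * b for a, b in zip(weights_a, weights_b)]


def merge_models(parent_a, parent_b, mri_ratios, genome_ratios, trust):
    """Merge two models given as lists of layers (each layer a list of floats)."""
    return [
        merge_layer(layer_a, layer_b, fuse_ratio(m, g, trust))
        for layer_a, layer_b, m, g in zip(parent_a, parent_b, mri_ratios, genome_ratios)
    ]


def evolve(parent_a, parent_b, mri_ratios, fitness, generations=50, sigma=0.1, seed=0):
    """(1+1) evolution strategy; the genome is per-layer ratios plus a trust gene."""
    rng = random.Random(seed)
    n = len(parent_a)
    best = [0.5] * (n + 1)  # last gene is the trust parameter
    best_score = fitness(merge_models(parent_a, parent_b, mri_ratios, best[:n], best[n]))
    for _ in range(generations):
        # Gaussian mutation, clipped so every gene stays in [0, 1].
        child = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in best]
        score = fitness(merge_models(parent_a, parent_b, mri_ratios, child[:n], child[n]))
        if score >= best_score:  # gradient-free: only fitness evaluations are used
            best, best_score = child, score
    return best, best_score


if __name__ == "__main__":
    # Toy stand-ins for checkpoints and for a benchmark score (both hypothetical).
    parent_a = [[0.0, 0.0], [0.0, 0.0]]
    parent_b = [[1.0, 1.0], [1.0, 1.0]]
    mri_ratios = [0.9, 0.1]
    target = [[0.9, 0.9], [0.1, 0.1]]

    def toy_fitness(model):
        return -sum((w - t) ** 2 for lm, lt in zip(model, target) for w, t in zip(lm, lt))

    genome, score = evolve(parent_a, parent_b, mri_ratios, toy_fitness)
    print("genome:", genome, "score:", score)
```

In this toy setup the fitness function rewards merges close to the diagnostic ratios, so the search is free to raise the trust gene; in a real setting the fitness would be a benchmark evaluation of the merged checkpoint, and no gradients would be computed at any point.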
Community
FINAL Bench introduces a new evaluation paradigm for LLMs:
functional metacognitive reasoning — not just "can the model solve it,"
but "does the model know when, why, and how it solves it."
- 100 tasks across 15 domains, built on the TICOS framework (Task / Introspection / Calibration / Output / Self-correction)
- Already #5 globally on HF Datasets popularity
- Officially endorsed by the HF Evaluation Team (Nathan Habib)
We believe metacognition is the missing axis in current LLM benchmarks.
Feedback welcome.
Darwin Family — Architecture Overview
Flagship update: Darwin-36B-Opus achieves 88.4% on GPQA Diamond,
matching Qwen3.5-397B-A17B with ~10× fewer params, training-free.
Get this paper in your agent:
hf papers read 2605.14386
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash