arxiv:2605.14386

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

Published on May 14

Submitted by seawolf on May 15

FINAL_Bench


Authors:

Abstract

AI-generated summary: The Darwin Family framework enables training-free evolutionary merging of large language models through gradient-free weight-space recombination, achieving superior reasoning performance without additional training.

We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints. Darwin introduces three key ideas: (i) a 14-dimensional adaptive merge genome enabling fine-grained component- and block-level recombination; (ii) MRI-Trust Fusion, which adaptively balances diagnostic layer-importance signals with evolutionary search through a learnable trust parameter; and (iii) an Architecture Mapper that enables cross-architecture breeding between heterogeneous model families. Empirically, the flagship Darwin-27B-Opus achieves 86.9% on GPQA Diamond, ranking #6 among 1,252 evaluated models, and outperforming its fully trained foundation model without any gradient-based training. Across scales from 4B to 35B parameters, Darwin models consistently improve over their parents, support recursive multi-generation evolution, and enable a training-free evolutionary merge that combines Transformer- and Mamba-based components. Together, the Darwin Family demonstrates that diagnostic-guided evolutionary merging is a practical and reproducible alternative to costly post-training pipelines for reasoning-centric language models.
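The abstract names the ingredients (a merge genome, a trust parameter that balances diagnostic layer-importance signals against evolutionary search, and gradient-free recombination) but gives no formulas. The following is a minimal sketch of how such a loop could look, not the authors' method. Everything in it is an assumption for illustration: the convex blend in `blend_ratio`, per-layer linear interpolation in `merge_layers`, a plain (1+λ) mutation search in `evolve`, a genome of one ratio per layer plus one trust value (not the paper's 14-dimensional genome), and the toy parents, diagnostic ratios, and fitness function.

```python
import random


def blend_ratio(genome_ratio, diagnostic_ratio, trust):
    """Convex blend (assumed form): trust=1 follows the diagnostic signal,
    trust=0 follows the ratio proposed by the evolutionary search."""
    return trust * diagnostic_ratio + (1.0 - trust) * genome_ratio


def merge_layers(parent_a, parent_b, ratios):
    """Per-layer linear interpolation: ratio 0 keeps parent A, ratio 1 keeps parent B."""
    return [
        [(1.0 - r) * a + r * b for a, b in zip(layer_a, layer_b)]
        for layer_a, layer_b, r in zip(parent_a, parent_b, ratios)
    ]


def express(genome, parent_a, parent_b, diagnostic_ratios):
    """Turn a genome (per-layer ratios followed by one trust value) into merged weights."""
    *layer_genes, trust = genome
    ratios = [blend_ratio(g, d, trust) for g, d in zip(layer_genes, diagnostic_ratios)]
    return merge_layers(parent_a, parent_b, ratios)


def evolve(parent_a, parent_b, diagnostic_ratios, fitness,
           generations=30, offspring=8, sigma=0.1, seed=0):
    """Gradient-free (1+lambda) search: mutate the current best genome,
    keep the best child only if it beats the parent."""
    rng = random.Random(seed)
    best = [rng.random() for _ in range(len(parent_a) + 1)]
    best_fit = fitness(express(best, parent_a, parent_b, diagnostic_ratios))
    for _ in range(generations):
        children = [
            [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in best]
            for _ in range(offspring)
        ]
        scored = [
            (fitness(express(c, parent_a, parent_b, diagnostic_ratios)), c)
            for c in children
        ]
        top_fit, top = max(scored, key=lambda pair: pair[0])
        if top_fit > best_fit:
            best, best_fit = top, top_fit
    return best, best_fit


# Toy stand-ins (hypothetical): two "layers" of two weights each.
PARENT_A = [[0.0, 0.0], [1.0, 1.0]]
PARENT_B = [[1.0, 1.0], [0.0, 0.0]]
TARGET = [[0.25, 0.25], [0.5, 0.5]]   # pretend "ideal" merged weights
DIAGNOSTIC = [0.3, 0.6]               # pretend layer-importance-derived ratios


def toy_fitness(merged):
    """Negative squared distance to TARGET; a real setup would score a benchmark."""
    return -sum((m - t) ** 2
                for layer_m, layer_t in zip(merged, TARGET)
                for m, t in zip(layer_m, layer_t))


genome, score = evolve(PARENT_A, PARENT_B, DIAGNOSTIC, toy_fitness)
print("genome (layer ratios..., trust):", genome)
print("fitness:", score)
```

In this sketch the trust value is just one more gene, so the search itself decides how far to lean on the diagnostic ratios. That is one possible reading of "learnable trust parameter", chosen here for simplicity.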


Community

seawolf2357

Paper submitter 3 days ago

FINAL Bench introduces a new evaluation paradigm for LLMs:
functional metacognitive reasoning — not just "can the model solve it,"
but "does the model know when, why, and how it solves it."

100 tasks across 15 domains, built on the TICOS framework
(Task / Introspection / Calibration / Output / Self-correction)
Already #5 globally on HF Datasets popularity
Officially endorsed by the HF Evaluation Team (Nathan Habib)

We believe metacognition is the missing axis in current LLM benchmarks.
Feedback welcome.

SeaWolf-AI

3 days ago

Darwin Family — Architecture Overview

Flagship update: Darwin-36B-Opus achieves 88.4% on GPQA Diamond,
matching Qwen3.5-397B-A17B with ~10× fewer params, training-free.

Model: https://huggingface.co/FINAL-Bench/Darwin-36B-Opus
Paper: https://arxiv.org/abs/2605.14386



Get this paper in your agent:

hf papers read 2605.14386

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 18


Datasets citing this paper 1

Spaces citing this paper 14

Collections including this paper 7