🤖 LLM Pipeline

An end-to-end automated pipeline for discovering, merging, evaluating, and fine-tuning open-source LLMs — with full MLOps integration.

Python 3.10+ · License: Apache 2.0 · HF Hub · W&B


πŸ—ΊοΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Phase 1 β€” Discovery    β”‚  Scan HF Hub β†’ filter β†’ rank  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Phase 2 β€” Merging      β”‚  SLERP Β· TIES Β· DARE Β· TA     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Phase 3 β€” Evaluation   β”‚  ROUGE Β· BERTScore Β· Judge    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Phase 4 β€” Fine-Tuning  β”‚  LoRA/QLoRA Β· Synthetic Data  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Phase 5 β€” MLOps        β”‚  vLLM Β· W&B Β· MLflow Β· HF Hub β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
            ↑__________________________|
                 Iterative improvement loop

✨ Features

Phase 1 — Model Discovery

  • Automated HF Hub crawler with category-based keyword search
  • Quality filtering: downloads, likes, parameter count, model card completeness
  • Optional lightweight perplexity probe for fast quality estimation
  • Composite scoring and ranked shortlist output
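The card doesn't spell out the composite scoring formula, so here is a minimal sketch of how such a ranking could work. The weights, the log scaling, and the optional perplexity term are all illustrative assumptions, not the pipeline's actual logic:

```python
import math

def composite_score(downloads, likes, card_completeness, perplexity=None):
    """Hypothetical ranking score -- the weights are illustrative, not the repo's."""
    score = (0.4 * math.log10(downloads + 1)   # popularity, log-scaled
             + 0.2 * math.log10(likes + 1)
             + 0.2 * card_completeness)        # fraction of filled card sections, 0..1
    if perplexity is not None:                 # optional probe: lower is better
        score += 0.2 / (1.0 + perplexity)
    return score

shortlist = sorted(
    {"model_a": composite_score(1_200_000, 900, 0.9),
     "model_b": composite_score(50_000, 120, 0.5)}.items(),
    key=lambda kv: kv[1], reverse=True)        # ranked shortlist, best first
```

Log-scaling download counts keeps a single viral model from drowning out every other signal.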

Phase 2 — Model Composition

  • Union merging (capability aggregation):
    • SLERP — spherical linear interpolation
    • TIES — trim, elect sign, merge
    • DARE-TIES — dropout + TIES
    • Task Arithmetic — delta-weight addition
  • Intersection merging (conservative):
    • Breadcrumbs — consensus-only parameter updates
  • DOM-tree-style architecture introspection (layers, attention heads, MLP blocks)
  • mergekit integration + pure-PyTorch fallback
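For intuition, SLERP on two flattened weight vectors can be sketched in pure Python (a stand-in for the pure-PyTorch fallback; real merges operate tensor by tensor over full checkpoints):

```python
import math

def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    dot = sum(x * y for x, y in zip(a, b)) / max(na * nb, eps)
    dot = max(-1.0, min(1.0, dot))             # clamp for acos numeric safety
    theta = math.acos(dot)
    if theta < eps:                            # nearly parallel: plain lerp
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

merged = slerp([1.0, 0.0], [0.0, 1.0], 0.5)    # midpoint on the unit arc
```

Unlike plain averaging, SLERP follows the arc between the two weight vectors, preserving their norm structure.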

Phase 3 — Evaluation Framework

  • ROUGE (rouge1, rouge2, rougeL)
  • BERTScore (semantic similarity)
  • Faithfulness + hallucination detection (NLI-based)
  • LLM-as-Judge scoring (0–10)
  • Multi-model side-by-side comparison
  • Knowledge gap detector → feeds Phase 4
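Gap detection can be as simple as thresholding per-category scores; a sketch, where the 0.30 floor mirrors the ROUGE-1 threshold used elsewhere in this card, and the category names and dict shape are assumptions:

```python
def detect_gaps(per_category_scores, threshold=0.30):
    """Flag categories scoring below the floor -- these feed Phase 4 fine-tuning."""
    return sorted(cat for cat, score in per_category_scores.items()
                  if score < threshold)

gaps = detect_gaps({"factual_recall": 0.21, "numerical": 0.18,
                    "code": 0.41, "reasoning": 0.33})
```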

Phase 4 — Efficient Fine-Tuning

  • QLoRA (4-bit NF4 quantization + LoRA adapters)
  • Response-only training (loss on assistant turns only)
  • Synthetic data generation per detected gap category
  • Delta adapter extraction (merge-ready weights)
  • Iterative improvement loop: eval → gap detect → generate → train → repeat
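Response-only training usually means masking prompt tokens out of the loss; a sketch using the Hugging Face Transformers convention of `-100` labels (the token IDs and split point below are made up):

```python
IGNORE_INDEX = -100  # Transformers convention: cross-entropy skips these positions

def mask_prompt_labels(input_ids, response_start):
    """Copy input_ids as labels, but mask everything before the assistant turn
    so the loss is computed only on the response tokens."""
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]

# Tokens 0..2 are the prompt; loss applies only to tokens 3..4 (the response).
labels = mask_prompt_labels([101, 7, 8, 9, 102], response_start=3)
```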

Phase 5 — MLOps

  • vLLM inference with PagedAttention (OpenAI-compatible API)
  • Throughput benchmarking
  • Dual tracking: Weights & Biases + MLflow
  • Auto-generated model cards
  • One-command HF Hub deployment
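Because the server speaks the OpenAI chat-completions protocol, any plain HTTP client works; a dependency-free sketch (the model name and port must match whatever the serve command was started with):

```python
import json
import urllib.request

def build_chat_request(prompt, model="./merged_model", max_tokens=128):
    """Payload for the OpenAI-compatible /v1/chat/completions route."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens}

def chat(prompt, base_url="http://localhost:8000/v1"):
    """Send one chat turn to a running vLLM server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(f"{base_url}/chat/completions", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:   # requires the server to be up
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```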

🚀 Quick Start

1. Install

git clone https://github.com/YOUR_USERNAME/llm-pipeline.git
cd llm-pipeline
pip install -r requirements.txt

2. Set environment variables

export HF_TOKEN="hf_..."          # Hugging Face token
export WANDB_API_KEY="..."        # W&B token (optional)

3. Run the full pipeline

# Full pipeline for reasoning models
python pipeline.py run reasoning

# With iterative improvement loop
python pipeline.py run code --loop --max-iter 3

# Custom merge strategy
python pipeline.py run medical --strategy breadcrumbs --top-k 3

📖 Usage — Individual Phases

Phase 1: Discovery

python -m phase1_discovery.discover run reasoning --top-k 5
python -m phase1_discovery.discover run code --perplexity  # adds perplexity probe
python -m phase1_discovery.discover run --all              # all categories

Phase 2: Merging

# TIES merge (recommended for union)
python -m phase2_merging.merge run ties \
  --model mistralai/Mistral-7B-v0.3 \
  --model teknium/OpenHermes-2.5-Mistral-7B \
  --base mistralai/Mistral-7B-v0.3

# SLERP interpolation
python -m phase2_merging.merge run slerp \
  --model model_a --model model_b --alpha 0.6

# Breadcrumbs (conservative / intersection)
python -m phase2_merging.merge run breadcrumbs \
  --model base --model ft_a --model ft_b --density 0.7

# Inspect architecture (DOM-tree view)
python -m phase2_merging.merge run ties \
  --introspect mistralai/Mistral-7B-v0.3

Phase 3: Evaluation

# Evaluate on SQuAD v2
python -m phase3_evaluation.evaluate run ./merged_model --dataset squad --n-samples 200

# Compare multiple models
python -m phase3_evaluation.evaluate run model_a \
  --compare model_b --compare model_c

# Disable LLM judge (faster)
python -m phase3_evaluation.evaluate run ./merged --no-judge

Phase 4: Fine-Tuning

# Fine-tune targeting specific gaps
python -m phase4_finetuning.finetune run \
  --base mistralai/Mistral-7B-v0.3 \
  --gap factual_recall --gap numerical \
  --n-syn 100 --output ./adapters/run1

# Use existing synthetic data
python -m phase4_finetuning.finetune run \
  --base mistralai/Mistral-7B-v0.3 \
  --data-path ./artifacts/data/synthetic_data.jsonl

# Iterative loop
python -m phase4_finetuning.finetune run --loop --max-iter 3

Phase 5: Inference & MLOps

# Start vLLM server (OpenAI-compatible)
python -m phase5_mlops.serve serve ./merged_model --port 8000

# Benchmark throughput
python -m phase5_mlops.serve serve ./merged_model --bench

# Track experiment
python -m phase5_mlops.serve track my-run \
  --model ./merged --strategy ties \
  --rouge1 0.42 --bertscore 0.71 --judge 7.3

# Deploy to HF Hub
python -m phase5_mlops.serve deploy ./merged_model \
  --repo your-username/my-merged-7b

# View leaderboard
python -m phase5_mlops.serve leaderboard

πŸ“ Project Structure

llm-pipeline/
├── pipeline.py                  # Master orchestrator
├── requirements.txt
├── configs/
│   └── settings.py              # All config: paths, scale, hyperparams
├── utils/
│   └── logger.py                # Centralized logging
├── phase1_discovery/
│   └── discover.py              # HF Hub crawler + ranking
├── phase2_merging/
│   └── merge.py                 # Merging + architecture introspection
├── phase3_evaluation/
│   └── evaluate.py              # Multi-metric eval + gap detection
├── phase4_finetuning/
│   └── finetune.py              # QLoRA + synthetic data + loop
├── phase5_mlops/
│   └── serve.py                 # vLLM + W&B + MLflow + HF deploy
└── artifacts/                   # Auto-created at runtime
    ├── models/
    ├── merges/
    ├── adapters/
    ├── evaluations/
    └── data/

βš™οΈ Configuration

Edit configs/settings.py to customize:

# Scale preset (currently: medium = 7B, single A100)
SCALE = "medium"

# Categories and keywords for discovery
HF_MODEL_CATEGORIES = {
    "code":      ["starcoder", "codellama", "deepseek-coder"],
    "reasoning": ["mistral", "llama", "qwen"],
    ...
}

# Fine-tuning defaults
FT_BASE_MODEL = "mistralai/Mistral-7B-v0.3"
FT_EPOCHS     = 3
FT_LR         = 2e-4

# vLLM
VLLM_GPU_MEMORY_UTIL = 0.90
VLLM_MAX_MODEL_LEN   = 4096

🧩 Supported Merge Strategies

Strategy         Type          Best For
slerp            Union         Two-model smooth interpolation
ties             Union         Multi-model, removes conflicting deltas
dare_ties        Union         Aggressive sparsification before TIES
task_arithmetic  Union         Adding task-specific capabilities
breadcrumbs      Intersection  Conservative, safety-preserving merge
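The core of TIES — elect a majority sign per parameter, then average only the agreeing deltas — can be sketched on toy lists (real implementations operate on tensors and trim small-magnitude deltas first):

```python
def ties_elect_sign(deltas):
    """Per-parameter sign election over a list of delta vectors (toy sketch)."""
    merged = []
    for column in zip(*deltas):                 # one parameter across all models
        pos = sum(d for d in column if d > 0)   # total positive mass
        neg = sum(-d for d in column if d < 0)  # total negative mass
        majority_positive = pos >= neg
        keep = [d for d in column
                if d != 0 and (d > 0) == majority_positive]
        merged.append(sum(keep) / len(keep) if keep else 0.0)
    return merged

# Parameter 0: two positive deltas outweigh one negative -> negative is dropped.
merged = ties_elect_sign([[0.5, -0.2], [0.3, 0.4], [-0.1, 0.2]])
```

Dropping sign-conflicting deltas is what lets TIES combine many fine-tunes without their updates cancelling each other out.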

📊 Evaluation Metrics

Metric         Tool                                Threshold
ROUGE-1/2/L    rouge-score                         ≥ 0.30
BERTScore F1   bert-score                          ≥ 0.50
Faithfulness   cross-encoder/nli-deberta-v3-small  ≥ 0.50
Hallucination  Heuristic + NLI                     < 10%
Judge Score    LLM-as-Judge                        ≥ 5.0/10
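A release gate built on these thresholds might look like the following (the metric key names are assumptions; only the numeric floors come from the table):

```python
THRESHOLDS = {"rouge1": 0.30, "bertscore_f1": 0.50,
              "faithfulness": 0.50, "judge": 5.0}

def passes(metrics, hallucination_rate):
    """True only if every metric clears its floor and hallucination stays under 10%."""
    return (all(metrics.get(name, 0.0) >= floor
                for name, floor in THRESHOLDS.items())
            and hallucination_rate < 0.10)

ok = passes({"rouge1": 0.42, "bertscore_f1": 0.71,
             "faithfulness": 0.80, "judge": 7.3}, hallucination_rate=0.04)
bad = passes({"rouge1": 0.42, "bertscore_f1": 0.71,
              "faithfulness": 0.80, "judge": 7.3}, hallucination_rate=0.15)
```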

🔄 Iterative Improvement Loop

┌─ Evaluate model ──────────────────────────────────┐
│   ROUGE / BERTScore / Judge / Faithfulness        │
└──────────────────┬────────────────────────────────┘
                   │ gaps detected?
                   ▼
┌─ Detect knowledge gaps ───────────────────────────┐
│   factual_recall / numerical / code / reasoning   │
└──────────────────┬────────────────────────────────┘
                   │
                   ▼
┌─ Generate synthetic data ─────────────────────────┐
│   LLM generates targeted QA pairs per gap         │
└──────────────────┬────────────────────────────────┘
                   │
                   ▼
┌─ QLoRA fine-tune ─────────────────────────────────┐
│   Response-only loss, 4-bit NF4, paged_adamw      │
└──────────────────┬────────────────────────────────┘
                   │
                   └──────────────► repeat until target ROUGE or max_iter
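The loop above can be sketched as a driver over four injected callables (all names here are hypothetical; the real orchestration lives in pipeline.py):

```python
def improvement_loop(evaluate, detect_gaps, generate_data, finetune,
                     target_rouge=0.40, max_iter=3):
    """Drive the eval -> gap-detect -> generate -> train cycle.

    Stops early when the ROUGE target is met or no gaps remain."""
    for i in range(max_iter):
        metrics = evaluate()
        if metrics["rouge1"] >= target_rouge:
            return i, metrics                  # target reached
        gaps = detect_gaps(metrics)
        if not gaps:
            return i, metrics                  # nothing left to fix
        finetune(generate_data(gaps))          # targeted QLoRA round
    return max_iter, evaluate()                # budget exhausted
```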

πŸ› οΈ Hardware Requirements

Scale          GPU               RAM   Notes
Small (1–3B)   Any CUDA GPU      16GB  CPU possible but slow
Medium (7B)    A100 / H100 40GB  32GB  Recommended
Large (13B+)   2× A100 80GB      64GB  Set tensor_parallel=2

📦 Key Dependencies

  • mergekit — merging backends (Phase 2)
  • rouge-score, bert-score — evaluation metrics (Phase 3)
  • vLLM — PagedAttention inference serving (Phase 5)
  • wandb, mlflow — experiment tracking (Phase 5)
  • huggingface_hub — model discovery and deployment

📄 License

Apache 2.0


πŸ™ Acknowledgements
