# Sparse Transformer: Experiment Suite + Triton Kernels

Comprehensive experiment infrastructure for the Chunked Sparse Backward Pass paper.

## Files

| File | Description |
|------|-------------|
| `triton_sparse.py` | Triton-fused sparse backward kernels (dW, dX, dBias) + Python-loop baseline + correctness tests + microbenchmark (see the sketch below the table) |
| `e2e_full.py` | End-to-end training benchmark: Dense vs. PyLoop vs. Triton at d_model ∈ {512, 1024, 2048} |
| `full_experiments.py` | 7-experiment ablation suite (baselines, predictor accuracy, chunk ablation, compute-matched, exploration, attention sparsification, sparsity sweep) |
| `analyze_results.py` | Publication figure generator (matplotlib) |
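
For orientation, the gradients these kernels produce can be written out in plain PyTorch. The snippet below is a minimal sketch, not the repo's API: it assumes the sparse layer computes `y = x @ w.T + bias` with a per-token mask over output units, and the function name `sparse_linear_backward` is hypothetical. The authoritative reference is the Python-loop baseline inside `triton_sparse.py`.

```python
import torch

def sparse_linear_backward(x, w, grad_out, mask):
    """Reference gradients for y = x @ w.T + bias where only masked output
    units are active. Illustrative sketch only; see triton_sparse.py for the
    repo's actual baseline and kernel semantics.

    x:        (N, d_in)    layer input
    w:        (d_out, d_in) weight matrix
    grad_out: (N, d_out)   upstream gradient dL/dy
    mask:     (N, d_out)   bool, True where an output unit was computed
    """
    g = grad_out * mask.to(grad_out.dtype)  # inactive units contribute no gradient
    dW = g.T @ x            # (d_out, d_in)
    dX = g @ w              # (N, d_in)
    dBias = g.sum(dim=0)    # (d_out,)
    return dW, dX, dBias

# Sanity check against autograd on small shapes
N, d_in, d_out = 8, 16, 32
x = torch.randn(N, d_in, requires_grad=True)
w = torch.randn(d_out, d_in, requires_grad=True)
bias = torch.randn(d_out, requires_grad=True)
mask = torch.rand(N, d_out) < 0.25
y = (x @ w.T + bias) * mask
y.sum().backward()
dW, dX, dBias = sparse_linear_backward(x, w, torch.ones(N, d_out), mask)
assert torch.allclose(dW, w.grad) and torch.allclose(dX, x.grad) and torch.allclose(dBias, bias.grad)
```

The fused Triton kernels are expected to produce the same three gradients without materializing the full masked gradient; `python triton_sparse.py` runs the repo's own correctness checks and microbenchmark.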

## Quick Start

```bash
pip install torch triton tiktoken matplotlib numpy

# Correctness test + microbenchmark
python triton_sparse.py

# End-to-end training (needs a ≥24 GB GPU for d_model = 2048)
python e2e_full.py

# Full ablation suite (7 experiments, ~4-6 hours on an A10G)
python full_experiments.py --experiment all --device cuda --steps 2000 --seeds "42,123,456"

# Single experiment
python full_experiments.py --experiment baselines --device cuda
```

## Results

See `RESULTS.md` for collected tables.
