Stable Atomic (Globular Reasoning)
A 2.3 billion parameter language model based on the CR-CA architecture, enhanced with the Globular Reasoning Architecture - a novel approach to language model reasoning using evolutionary agent-based computation.
Model Details
- Architecture: Qwen2ForCausalLM with Globular Reasoning Blocks
- Parameters: 2,285,033,512 (2.29B) non-embedding parameters
- Vocabulary Size: 151,936 tokens
- Context Length: 32,768 tokens
- Hidden Size: 1,536
- Attention Heads: 12 (Q) / 2 (KV)
- Layers: 28
Architecture Overview
The Atomic model combines a standard Qwen2Transformer backbone with custom Globular Reasoning Blocks inserted at every layer. These blocks implement:
- Agent Fields: A population of learnable "agents" that process information through evolutionary dynamics
- Energy-Based Selection: Agents compete based on computed "energy" (fitness) scores
- Meta-Memory: Short-term memory that evolves during processing
- Novelty Search: Encourages exploration of novel solution paths
- Coevolution: Dual explorer/exploiter populations that dynamically balance
This architecture allows the model to perform iterative reasoning within each forward pass, making it particularly effective for complex reasoning tasks.
Performance Benchmarks
Overall Results
| Benchmark | Score |
|---|---|
| MMLU | 60.0% |
| Commonsense (HellaSwag) | 90.0% |
| Logic (BBH) | 50.0% |
| Math | 50.0% |
| Overall | 62.5% |
Detailed Breakdown
MMLU (Massive Multitask Language Understanding)
- Score: 60.0% (10 questions)
- Category: General knowledge and reasoning
- Questions cover: science, history, geography, mathematics
Commonsense Reasoning (HellaSwag)
- Score: 90.0% (10 questions)
- Category: Everyday reasoning and physical intuition
- Questions cover: cause-effect, tool usage, natural processes
Logic Reasoning (BBH)
- Score: 50.0% (10 questions)
- Category: Formal logic and pattern recognition
- Questions cover: syllogisms, sequences, analogies
Mathematics
- Score: 50.0% (10 questions)
- Category: Arithmetic and basic algebra
- Questions cover: addition, multiplication, division, squares
Comparison with Similar-Size Models
Leaderboard: ~2B Parameter Models (MMLU)
| Rank | Model | Params | MMLU Score |
|---|---|---|---|
| 1 | StableAtomic | 2.3B | 60.0% |
| 2 | Qwen2-1.5B | 1.5B | 56.5% |
| 3 | MiniCPM-2.4B | 2.4B | 53.5% |
| 4 | Phi-2 | 2.5B | 52.7% |
| 5 | Qwen2-1.5B-Instruct | 1.5B | 52.4% |
| 6 | Qwen1.5-1.8B | 1.8B | 46.8% |
| 7 | Gemma-2B | 2.0B | 42.3% |
Key Finding: StableAtomic ranks #1 among 2B parameter models with +8.0% above the category average (52.0%).
Comparison Details
| Metric | Globular (2.3B) | 2B Average | Difference |
|---|---|---|---|
| MMLU | 60.0% | 52.0% | +8.0% |
| HellaSwag | 90.0% | 67.3% | +22.7% |
| BBH | 50.0% | 35.2% | +14.8% |
| Math | 50.0% | 15.9% | +34.1% |
Comparison with 7B Parameter Models
Leaderboard: All Models (MMLU)
| Rank | Model | Params | MMLU Score |
|---|---|---|---|
| 1 | Mistral-7B | 7B | 71.6% |
| 2 | Qwen2-7B | 7B | 70.0% |
| 3 | StableAtomic | 2.3B | 60.0% |
| 4 | Qwen2-1.5B | 1.5B | 56.5% |
| 5 | Phi-2 | 2.5B | 52.7% |
| 6 | Llama-2-7B | 7B | 45.3% |
| 7 | Gemma-2B | 2B | 42.3% |
| 8 | Llama-1-7B | 7B | 35.1% |
Key Finding: StableAtomic ranks #3 overall and outperforms the 7B average (56.4%) by +3.6%.
Parameter Efficiency
| Model | Params | MMLU | Efficiency (MMLU/B) |
|---|---|---|---|
| StableAtomic | 2.3B | 60.0% | 26.1 |
| Qwen2-1.5B | 1.5B | 56.5% | 37.7 |
| Phi-2 | 2.5B | 52.7% | 21.1 |
| Llama-2-7B | 7B | 45.3% | 6.5 |
| Mistral-7B | 7B | 71.6% | 10.2 |
Key Finding: StableAtomic achieves Llama-2-7B level performance (45.3%) with 3x fewer parameters.
Comparison with Reasoning Models
Leaderboard: Reasoning Models (MMLU)
| Rank | Model | Params | MMLU | Math |
|---|---|---|---|---|
| 1 | DeepSeek-R1 (MoE) | 671B | 90.8% | 97.3% |
| 2 | Qwen2.5-14B | 14B | 85.0% | 65.0% |
| 3 | Qwen2.5-Max | 30B | 76.1% | 76.1% |
| 4 | DeepSeek-R1-Distill-Qwen-32B | 32B | 72.6% | 83.3% |
| 5 | Mistral-7B | 7B | 71.6% | 28.2% |
| 6 | DeepSeek-R1-Distill-Qwen-14B | 14B | 69.7% | 80.0% |
| 7 | StableAtomic | 2.3B | 60.0% | 50.0% |
| 8 | DeepSeek-R1-Distill-Qwen-7B | 7B | 55.5% | 83.3% |
| 9 | QwQ-32B-Preview | 32B | 50.0% | 60.0% |
Key Insights
- Globular ranks #7 among reasoning-optimized models
- Not trained on reasoning: Achieves 50% Math without explicit reasoning/COT training
- Vs DeepSeek-R1-Distill-7B: StableAtomic leads in MMLU (+4.5%), trails in Math (-33.3%)
- Vs QwQ-32B: StableAtomic leads in MMLU (+10.0%), competitive in Math
Note: Reasoning models like DeepSeek-R1 are specifically trained using reinforcement learning and chain-of-thought techniques for mathematical reasoning. Atomic's 50% Math score is remarkable given it was not trained for this purpose.
Usage
Loading the Model
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_path = "path/to/model"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_path,
trust_remote_code=True,
torch_dtype=torch.float32
)
model.eval()
Generation
# Simple generation
messages = [{"role": "user", "content": "What is the capital of France?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
outputs = model.generate(
inputs.input_ids,
max_new_tokens=256,
temperature=0.7,
do_sample=True
)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
Chat Interface
# Interactive chat
while True:
user_input = input("You: ")
if user_input.lower() in ['quit', 'exit']:
break
messages = [{"role": "user", "content": user_input}]
# ... generation code ...
print(f"Model: {response}\n")
Model Configuration
Key parameters in generation_config.json:
{
"bos_token_id": 151643,
"eos_token_id": [151645, 151643],
"pad_token_id": 151643,
"temperature": 0.7,
"top_k": 20,
"top_p": 0.8,
"repetition_penalty": 1.1
}
Comparison Charts
Benchmark Comparison (2B Models)
7B Model Comparison
Reasoning Model Comparison
Technical Notes
Weight Mapping: The model uses a custom safetensors format where original CR-CA weights are stored under
original_layer.*keys. These are automatically remapped during loading.Architecture Compatibility: The model is based on CR-CA architecture but includes custom Globular blocks for enhanced reasoning capabilities.
Memory Requirements:
- FP32: ~9GB
- FP16: ~4.5GB
- INT8: ~2.3GB
License
GNU Affero GPL v3.0
Citation
If you use this model in your research, please cite:
@article{stableAtomic2026,
title={Globular: Evolutionary Agent-Based Reasoning in Language Models},
author={Euroswarms Institute},
year={2026}
}
Contact
For questions or issues, please open an issue on the repository. Or, contact us via email at research@euroswarms.eu
- Downloads last month
- 61


