SlotCTM
Research artifact for: Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?
Archon, Jesse Caldwell, Aura · DuoNeural, April 2026
Overview
A systematic ablation of slot-based CTM world models on N-body bouncing ball physics. Tests when per-object attention (SlotCTM) outperforms mean-field interaction, identifies the capacity bottleneck at scale, and characterizes the collision density phase transition.
Central question: When does modeling object interactions via attention beat modeling them via mean-field (SlotGNN with pooled interaction)?
Key Findings
Temporal Specialization Arc (v21–v24)
| Version | Setting | Spec Score | Key Finding |
|---|---|---|---|
| v21 | Learned, no constraint | 0.0078 | No specialization. All slots generalists. |
| v22 | Hard delay (slot i ← t − i·τ) | 0.2777 | Forced specialization works (35× v21), but at a 2–7× performance cost. |
| v23 | Soft learned gates | 0.0876 | Freedom collapses to the present: delta-function gates. |
| v24 | Forced diversity loss | 0.2353 | Gates spread to [0–15] but performance is unchanged. |
Conclusion: Temporal gate diversity emerges only when the task requires it. Bouncing-ball state is Markovian: one frame suffices. The optimal temporal gate matches the task's predictability horizon.
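The v22 hard-delay scheme can be sketched in a few lines: slot i reads the observation from t − i·τ. This is an illustrative sketch, not the released code; the frame-buffer layout, the function name, and the clamping behavior at the start of a sequence are assumptions.

```python
import numpy as np

def hard_delay_inputs(frames, n_slots, tau=1):
    """v22-style hard temporal delay: slot i reads frame t - i*tau.

    frames: array of shape (T, obs_dim), oldest frame first.
    Returns (n_slots, obs_dim): one delayed observation per slot,
    clamped to the oldest available frame early in a sequence.
    """
    t = len(frames) - 1                       # current time index
    idx = [max(t - i * tau, 0) for i in range(n_slots)]
    return frames[idx]
```

Because the delay is fixed per slot rather than learned, specialization is guaranteed, which matches the finding that v22 specializes where the free gates of v21/v23 collapse to the present.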
N-Body Scaling (v10, v14)
The SlotCTM advantage inverts at N ≥ 5 without proportional hidden-dimension scaling. At N=8 with the standard HIDDEN_DIM=384, CTM is 2.8× worse than the MLP baseline. Scaling HIDDEN_DIM = N×128 recovers the advantage.
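The capacity rule above amounts to a one-line config helper; the function and parameter names here are hypothetical, and 128 is the per-object budget implied by the N×128 finding.

```python
def slot_hidden_dim(n_objects, base_per_object=128):
    """Scale HIDDEN_DIM with object count (hypothetical helper).

    With a fixed HIDDEN_DIM=384, per-object capacity shrinks as N grows
    and the SlotCTM advantage inverts at N >= 5; budgeting 128 hidden
    units per object restores the advantage at N=8.
    """
    return n_objects * base_per_object
```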
Phase Transition (v12)
Collision density r_critical ≈ 0.09–0.11 separates two regimes:
- Ballistic (r < 0.10): MLP fine, CTM overkill
- Collision-entangled (r > 0.10): CTM wins, advantage grows monotonically
At r=0.20, k=100: MLP MSE = 89,241, CTM = 0.352. Ratio: 253,000:1.
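Collision density can be measured empirically as the fraction of object pairs in contact, averaged over frames. A minimal numpy sketch, assuming circular balls of a common radius and a center-distance overlap criterion (both assumptions, since the artifact's exact definition of r is not given here):

```python
import numpy as np

def collision_density(positions, radius):
    """Fraction of object pairs in contact, averaged over frames.

    positions: (T, N, 2) ball centers over T frames; radius: common
    ball radius. Two balls are in contact when their center distance
    is below 2 * radius.
    """
    T, N, _ = positions.shape
    # pairwise center differences per frame: (T, N, N, 2)
    diff = positions[:, :, None, :] - positions[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)      # (T, N, N)
    iu = np.triu_indices(N, k=1)              # each unordered pair once
    contact = dist[:, iu[0], iu[1]] < 2 * radius
    return contact.mean()
```

Under this definition, r > 0.10 would place a trajectory in the collision-entangled regime where CTM wins.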
Partial Observability (v13 extension of v7)
VarCTM with single-frame position-only observations outperforms MLP-with-velocity-estimation by >180× at k=100 (MLP MSE: 63.8 trillion; TempCTM: 0.347). The CTM hidden state IS the belief state.
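The "hidden state is the belief state" claim can be illustrated with a two-line recurrent filter: under position-only observations, a hidden state that merely remembers the previous position already determines velocity as a finite difference, which is exactly the information a feed-forward MLP on a single frame cannot have. This is a minimal sketch with hypothetical names, not the VarCTM update.

```python
import numpy as np

def belief_update(h, pos):
    """One recurrent tick of a minimal belief state.

    h: (prev_pos, vel_estimate) tuple, or None at t=0.
    pos: current observed position (positions only, no velocities).
    Carrying prev_pos across ticks makes velocity recoverable as a
    finite difference, so the recurrent state is a belief state.
    """
    if h is None:
        return (pos, np.zeros_like(pos))
    prev_pos, _ = h
    return (pos, pos - prev_pos)
```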
Architecture
SlotCTM processes each physical object as an independent slot:
- SlotGNN: Per-object encoders + multi-head attention message passing
- CTM dynamics: Shared-weight recurrent ticks per dynamics step
- VarCTM: Variable training horizon k~U(1,20) for best generalization
- TSSP: Thought-Space Self-Prediction auxiliary loss
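The per-slot pipeline above can be sketched in numpy. Everything here is illustrative: the dimensions, weight names, single-head attention, and tanh recurrence are assumptions standing in for the released architecture, which the README describes only at this level of detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SlotCTMSketch:
    """Per-object slots + attention message passing + shared recurrent ticks."""

    def __init__(self, n_slots, obs_dim, hidden):
        self.W_enc = rng.normal(0, 0.1, (obs_dim, hidden))     # shared per-object encoder
        self.W_q = rng.normal(0, 0.1, (hidden, hidden))        # attention projections
        self.W_k = rng.normal(0, 0.1, (hidden, hidden))
        self.W_v = rng.normal(0, 0.1, (hidden, hidden))
        self.W_rec = rng.normal(0, 0.1, (2 * hidden, hidden))  # shared recurrent tick

    def step(self, obs, h, n_ticks=3):
        """One dynamics step. obs: (N, obs_dim); h: (N, hidden) slot states."""
        x = obs @ self.W_enc
        for _ in range(n_ticks):               # CTM: several ticks per dynamics step
            q, k, v = h @ self.W_q, h @ self.W_k, h @ self.W_v
            att = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (N, N) pair weights
            msg = att @ v                      # attention message passing between slots
            h = np.tanh(np.concatenate([x + msg, h], axis=-1) @ self.W_rec)
        return h
```

A VarCTM-style training loop would sample the rollout horizon per batch, e.g. `k = rng.integers(1, 21)`, and apply `step` k times before computing the loss.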
Why Attention Beats Mean-Field
In dense collision regimes, pairwise object interactions are nonlinear and asymmetric. Mean-field pooling discards the directionality of collision impulses, whereas attention learns to weight the relevant pair interactions, which is critical at large N and high collision density.
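The directionality argument can be seen in a toy example: when two neighbors impart opposite impulses, mean pooling cancels them to zero, while attention lets a receiver up-weight the neighbor its own state says is relevant. The query vector and weights below are illustrative, not learned parameters from the artifact.

```python
import numpy as np

# Two neighbor impulses pointing in opposite directions.
neighbors = np.array([[ 1.0, 0.0],
                      [-1.0, 0.0]])

# Mean-field pooling: the opposite impulses cancel; the receiver sees nothing.
mean_msg = neighbors.mean(axis=0)             # [0., 0.]

# Attention: a receiver whose query aligns with one neighbor up-weights it,
# so the collision impulse survives aggregation.
query = np.array([1.0, 0.0])                  # receiver-dependent relevance
logits = neighbors @ query                    # per-pair scores
w = np.exp(logits) / np.exp(logits).sum()     # softmax weights
att_msg = w @ neighbors                       # weighted message, nonzero
```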
Citation
```bibtex
@article{archon2026slotctm,
  title     = {Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?},
  author    = {Archon and Caldwell, Jesse and Aura},
  year      = {2026},
  doi       = {10.5281/zenodo.19846804},
  url       = {https://doi.org/10.5281/zenodo.19846804},
  publisher = {Zenodo}
}
```
DuoNeural
DuoNeural is an open AI research lab: human + AI in collaboration.
| HuggingFace | huggingface.co/DuoNeural |
| GitHub | github.com/DuoNeural |
| X / Twitter | @DuoNeural |
| Email | duoneural@proton.me |
| Newsletter | duoneural.beehiiv.com |
| Support | buymeacoffee.com/duoneural |
| Site | duoneural.com |
Research Team
- Jesse – Vision, hardware, direction
- Archon – AI lab partner, post-training, abliteration, experiments
- Aura – Research AI, literature synthesis, novel proposals
DuoNeural Research Publications
Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura · DuoNeural.