SlotCTM

Research artifact for: Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?

Archon, Jesse Caldwell, Aura (DuoNeural), April 2026

Overview

A systematic ablation of slot-based CTM world models on N-body bouncing ball physics. Tests when per-object attention (SlotCTM) outperforms mean-field interaction, identifies the capacity bottleneck at scale, and characterizes the collision density phase transition.

Central question: When does modeling object interactions via attention beat modeling them via mean-field (SlotGNN with pooled interaction)?

Key Findings

Temporal Specialization Arc (v21–v24)

Version | Setting                        | Spec Score | Key Finding
--------|--------------------------------|------------|------------
v21     | Learned, no constraint         | 0.0078     | No specialization. All slots generalists.
v22     | Hard delay (slot i → t − i·τ)  | 0.2777     | Forced specialization works (35× v21), but 2–7× perf cost.
v23     | Soft learned gates             | 0.0876     | Freedom collapses to present. Delta-function gates.
v24     | Forced diversity loss          | 0.2353     | Gates spread to [0–15] but performance unchanged.

Conclusion: Temporal gate diversity emerges only when the task requires it. Bouncing ball state is Markovian, so one frame is sufficient. The optimal temporal gate is the task's predictability horizon.
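The hard-delay setting of v22 can be illustrated with a minimal sketch. It only shows the indexing rule "slot i reads frame t − i·τ"; the function name, the clamping at frame 0, and the example values are assumptions for illustration, not taken from the released code.

```python
def hard_delay_indices(t, num_slots, tau):
    """Frame index read by each slot under a hard delay.

    Slot i reads frame t - i * tau. Clamping at 0 (so early steps do not
    index before the start of the sequence) is an assumption of this sketch.
    """
    return [max(t - i * tau, 0) for i in range(num_slots)]


# Hypothetical example: 4 slots, delay step tau = 5, current frame t = 20.
print(hard_delay_indices(t=20, num_slots=4, tau=5))
```

With these values slot 0 reads the present frame and each later slot reads a frame τ steps further into the past, which is the forced specialization the table describes.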

N-Body Scaling (v10, v14)

SlotCTM advantage inverts at N≥5 without proportional hidden dimension scaling. At N=8 with standard HIDDEN_DIM=384, CTM is 2.8× worse than MLP. Scaling HIDDEN_DIM = N×128 recovers the advantage.
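The scaling rule can be written down as a short sketch. The constant and function names below are hypothetical; only the values 384 and 128 come from the text above. Note that the rule gives 384 at N=3, so the standard width matches it only for three objects.

```python
STANDARD_HIDDEN_DIM = 384  # fixed width quoted above
PER_OBJECT_DIM = 128       # per-object width from the N x 128 rule


def scaled_hidden_dim(n_objects):
    """Hidden width that grows linearly with the number of objects."""
    return n_objects * PER_OBJECT_DIM


def per_object_budget(n_objects, hidden_dim=STANDARD_HIDDEN_DIM):
    """Hidden units available per object when the width is held fixed."""
    return hidden_dim / n_objects


for n in (3, 5, 8):
    print(n, scaled_hidden_dim(n), per_object_budget(n))
```

At N=8 the fixed width leaves 48 hidden units per object instead of 128, which is one way to read the capacity bottleneck reported here.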

Phase Transition (v12)

Collision density r_critical ≈ 0.09–0.11 separates two regimes:

  • Ballistic (r < 0.10): MLP fine, CTM overkill
  • Collision-entangled (r > 0.10): CTM wins, advantage grows monotonically

At r=0.20, k=100: MLP MSE = 89,241, CTM = 0.352. Ratio: 253,000:1.
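As a quick check of the numbers above, the sketch below labels a collision density by regime and recomputes the MSE ratio. Treating 0.09–0.11 as an explicit "transition" band is an assumption of this sketch (the bullets use 0.10 as the dividing value), and the function name is hypothetical.

```python
R_CRITICAL_LOW = 0.09   # lower edge of the reported r_critical range
R_CRITICAL_HIGH = 0.11  # upper edge of the reported r_critical range


def regime(r):
    """Label a collision density r using the reported critical range."""
    if r < R_CRITICAL_LOW:
        return "ballistic"
    if r > R_CRITICAL_HIGH:
        return "collision-entangled"
    return "transition"  # assumption: the band itself is left unlabeled


# Reported MSE values at r = 0.20, k = 100.
mlp_mse = 89_241
ctm_mse = 0.352
ratio = mlp_mse / ctm_mse

print(regime(0.05), regime(0.10), regime(0.20))
print(f"MLP/CTM MSE ratio: {ratio:,.0f}")
```

The recomputed ratio is a little over 253,500, consistent with the rounded 253,000:1 figure.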

Partial Observability (v13 extension of v7)

VarCTM with single-frame position-only observations outperforms MLP-with-velocity-estimation by >180× at k=100 (MLP: 63.8 trillion, TempCTM: 0.347). The CTM hidden state IS the belief state.

Architecture

SlotCTM processes each physical object as an independent slot:

  • SlotGNN: Per-object encoders + multi-head attention message passing
  • CTM dynamics: Shared-weight recurrent ticks per dynamics step
  • VarCTM: Variable training horizon k~U(1,20) for best generalization
  • TSSP: Thought-Space Self-Prediction auxiliary loss
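The variable training horizon used by VarCTM can be sketched as follows. This assumes k is an integer number of rollout steps drawn uniformly with both endpoints included; the function name and the seeded generator are illustrative and not part of the released code.

```python
import random


def sample_horizon(rng, k_min=1, k_max=20):
    """Draw a training horizon k ~ U(k_min, k_max).

    Integer-valued with inclusive endpoints: an assumption of this sketch.
    """
    return rng.randint(k_min, k_max)


rng = random.Random(0)  # seeded only to make the example repeatable
horizons = [sample_horizon(rng) for _ in range(1000)]
print(min(horizons), max(horizons))
```

In a training loop, a fresh k would be drawn per batch (or per sequence) so the model sees both short and long rollouts rather than a single fixed horizon.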

Why Attention Beats Mean-Field

In dense collision regimes, pairwise object interactions are non-linear and non-symmetric. Mean-field pooling loses the directionality of collision impulses. Attention learns to weight relevant pair interactions, critical for large N and high collision density.
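The contrast can be made concrete with a toy sketch in plain Python. It is not the SlotGNN implementation: the attention scores here are hand-set from squared distance with an arbitrary temperature, whereas the model learns its weights, and the states are hypothetical 2-D positions.

```python
import math


def mean_field_message(states, i):
    """Mean of all other objects' states: every pair gets the same weight."""
    others = [s for j, s in enumerate(states) if j != i]
    dim = len(states[i])
    return [sum(s[d] for s in others) / len(others) for d in range(dim)]


def attention_message(states, i, temperature=0.01):
    """Softmax-weighted sum over the other objects.

    Scores are negative squared distances (a hand-set stand-in for learned
    attention scores), so nearby objects dominate the message.
    """
    others = [s for j, s in enumerate(states) if j != i]
    scores = [
        -sum((a - b) ** 2 for a, b in zip(states[i], s)) / temperature
        for s in others
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[i])
    message = [sum(w * s[d] for w, s in zip(weights, others)) for d in range(dim)]
    return message, weights


# Hypothetical positions: object 1 is about to collide with object 0.
states = [(0.10, 0.10), (0.12, 0.10), (0.90, 0.90), (0.50, 0.20)]
mf = mean_field_message(states, 0)
att, w = attention_message(states, 0)
print("mean-field:", mf)
print("attention: ", att, "weights:", w)
```

The mean-field message for object 0 is the average of three unrelated positions, while the attention message is almost entirely the state of the nearby object, which is the pair that matters for the next collision.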

Citation

@article{archon2026slotctm,
  title     = {Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?},
  author    = {Archon and Caldwell, Jesse and Aura},
  year      = {2026},
  doi       = {10.5281/zenodo.19846804},
  url       = {https://doi.org/10.5281/zenodo.19846804},
  publisher = {Zenodo}
}

DuoNeural

DuoNeural is an open AI research lab: human + AI in collaboration.

🤗 HuggingFace huggingface.co/DuoNeural
🐙 GitHub github.com/DuoNeural
🐦 X / Twitter @DuoNeural
📧 Email duoneural@proton.me
📬 Newsletter duoneural.beehiiv.com
☕ Support buymeacoffee.com/duoneural
🌐 Site duoneural.com

Research Team

  • Jesse: Vision, hardware, direction
  • Archon: AI lab partner, post-training, abliteration, experiments
  • Aura: Research AI, literature synthesis, novel proposals

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura (DuoNeural).
