# DeltaTok (Tokenizer) · Kinetics-700
DeltaTok is a video tokenizer that compresses the frame-to-frame change in vision foundation model features into a single continuous "delta" token, introduced in *A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens* (CVPR 2026 Highlight). Representing each frame with one token drastically reduces the token count of a video sequence (e.g., a 1,024x reduction) and enables efficient generative world modeling.
This repository contains the ViT-B encoder and decoder trained on Kinetics-700 at 512x512 resolution.
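The 1,024x figure follows directly from the patch arithmetic, sketched below under the assumption that the DINOv3 ViT-B backbone uses 16x16 patches (the patch size is an assumption here, not stated in this card):

```python
# Illustrative arithmetic (assumption: 16x16 patches, as in DINOv3 ViT-B/16).
# A 512x512 frame yields a 32x32 grid of patch tokens; DeltaTok compresses
# each frame's feature change into a single delta token.
patch_size = 16
resolution = 512
tokens_per_frame = (resolution // patch_size) ** 2  # 32 * 32 = 1024
delta_tokens_per_frame = 1
reduction = tokens_per_frame // delta_tokens_per_frame
print(reduction)  # 1024
```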
## Metrics
Reconstruction quality is measured by applying downstream task heads to the reconstructed features.
| Method | Horizon | VSPW mIoU (↑) | Cityscapes mIoU (↑) | KITTI RMSE (↓) |
|---|---|---|---|---|
| Present (upper bound) | – | 58.4 | 70.5 | 2.79 |
| DeltaTok | Short (1 frame) | 58.6 | 69.6 | 2.78 |
| DeltaTok | Mid (3 frames)* | 58.5 | 67.9 | 2.86 |
\*Parallel encoding of delta tokens from ground-truth frames, with autoregressive decoding from previous reconstructions.
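The mid-horizon protocol in the footnote can be sketched as follows. This is a toy illustration, not the real DeltaTok API: `encode_delta` and `decode_delta` are hypothetical stand-ins, and the scalar "features" make the toy lossless so reconstruction is exact.

```python
# Toy sketch of the mid-horizon evaluation protocol (hypothetical API):
# delta tokens are encoded in PARALLEL from ground-truth frame pairs,
# while decoding is AUTOREGRESSIVE from the previous reconstruction.

def encode_delta(prev_feats, next_feats):
    # Stand-in: compress the feature change into one "delta token".
    return next_feats - prev_feats

def decode_delta(prev_recon, delta_token):
    # Stand-in: apply the delta token to the previous reconstruction.
    return prev_recon + delta_token

gt_feats = [0.0, 1.0, 3.0, 6.0]  # toy per-frame features

# Parallel encoding: every delta comes from ground-truth pairs.
deltas = [encode_delta(gt_feats[t], gt_feats[t + 1])
          for t in range(len(gt_feats) - 1)]

# Autoregressive decoding: each step conditions on the previous reconstruction.
recon = [gt_feats[0]]
for d in deltas:
    recon.append(decode_delta(recon[-1], d))
print(recon)  # [0.0, 1.0, 3.0, 6.0] -- exact for this lossless toy
```

With a real (lossy) tokenizer, errors compound through the autoregressive decoding chain, which is why the mid-horizon scores in the table degrade slightly relative to the short horizon.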
## Usage
Requires a frozen DINOv3 ViT-B backbone. Full training and evaluation code is available in the DeltaTok GitHub repository. To evaluate:

```shell
python main.py validate -c configs/deltatok_vitb_dinov3_vitb_kinetics.yaml \
  --model.ckpt_path=path/to/deltatok-kinetics/pytorch_model.bin
```
## Citation
```bibtex
@inproceedings{kerssies2026deltatok,
  title     = {A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens},
  author    = {Kerssies, Tommie and Berton, Gabriele and He, Ju and Yu, Qihang and Ma, Wufei and de Geus, Daan and Dubbelman, Gijs and Chen, Liang-Chieh},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```