V2-VLNCE Models (R2R / RxR)
Overview
This repository provides pretrained checkpoints for Vision-and-Language Navigation in Continuous Environments (VLN-CE).
The models are built on the V2-VLNCE framework and are trained on standard VLN benchmarks, namely Room-to-Room (R2R) and Room-across-Room (RxR). They learn to follow natural language instructions and navigate 3D environments using vision-and-language representations.
This release is intended as a checkpoint dump for research use and reproducibility.
Model Variants
The following checkpoints are included:
VILBEV_r2r_release/ckpt.iter4900.pth
VILETP_r2r_release/ckpt.iter12550.pth
VILETP_rxr_release/ckpt.iter19100.pth
Usage
These checkpoints are stored as raw PyTorch weights.
To load and run the models, you need the original V2-VLNCE training codebase (see Code below).
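For a quick sanity check of a downloaded file, the weights can be opened directly with PyTorch. The sketch below is not the official loading pipeline: the local checkpoint path and the internal layout of the checkpoint dict (for example, whether it contains a "state_dict" key) are assumptions and may differ from what the V2-VLNCE code expects.

```python
import torch

# Minimal sketch (not the official loader): inspect a raw checkpoint file.
# The path assumes the VILETP_r2r_release checkpoint has been downloaded
# locally; adjust it to wherever you stored the file.
ckpt_path = "VILETP_r2r_release/ckpt.iter12550.pth"

# Load on CPU so no GPU is required just to inspect the weights.
checkpoint = torch.load(ckpt_path, map_location="cpu")

# The exact structure of the checkpoint (e.g. a "state_dict" entry,
# optimizer state, training config) is determined by the V2-VLNCE
# training code and is an assumption here; print the top-level keys
# to see what is actually stored.
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))
```

For actual training or evaluation, use the loading utilities provided in the V2-VLNCE repository linked below.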
Paper
Hugging Face: https://huggingface.co/papers/2507.08831
arXiv: https://arxiv.org/abs/2507.08831
Code
The full training and evaluation pipeline is available at:
https://github.com/realjoshqsun/V2-VLNCE
Citation
If you use these models, please cite the corresponding work:
@ARTICLE{11419772,
author={Sun, Josh Qixuan and Weng, Huaiyuan and Xing, Xiaoying and Yeum, Chul Min and Crowley, Mark},
journal={IEEE Robotics and Automation Letters},
title={View Invariant Learning for Vision-Language Navigation in Continuous Environments},
year={2026},
volume={11},
number={5},
pages={5861-5868},
doi={10.1109/LRA.2026.3669785}}