V2-VLNCE Models (R2R / RxR)

Overview

This repository provides pretrained checkpoints for Vision-and-Language Navigation in Continuous Environments (VLN-CE).

The models are built on the V2-VLNCE framework and trained on the standard VLN benchmarks Room-to-Room (R2R) and Room-Across-Room (RxR). They learn to follow natural-language instructions and navigate 3D environments using joint vision-and-language representations.

This release is intended as a checkpoint dump for research use and reproducibility.


Model Variants

The following checkpoints are included (a download sketch follows the list):

  • VILBEV_r2r_release/ckpt.iter4900.pth
  • VILETP_r2r_release/ckpt.iter12550.pth
  • VILETP_rxr_release/ckpt.iter19100.pth
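
Each of the files listed above can be fetched individually from the Hugging Face Hub. Below is a minimal download sketch using huggingface_hub; the repository id `joshalchemist/VIL` is an assumption based on this model card's page, so substitute the actual repo id if it differs.

```python
from huggingface_hub import hf_hub_download

# Fetch a single checkpoint file from the Hub.
# NOTE: the repo_id below is an assumption; replace it with the actual
# repository id of this model card if it differs.
ckpt_path = hf_hub_download(
    repo_id="joshalchemist/VIL",
    filename="VILETP_r2r_release/ckpt.iter12550.pth",
)
print(ckpt_path)  # local cache path of the downloaded checkpoint
```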

Usage

These checkpoints are stored as raw PyTorch weights (.pth files).

To load and run the models, you need the original V2-VLNCE training codebase (see the Code section below).
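
As a quick sanity check, a checkpoint can still be opened directly with PyTorch to inspect how the weights are stored. This is a minimal sketch only; the internal dictionary layout (for example, whether the weights sit under a "state_dict" key) is determined by the V2-VLNCE training code and is an assumption here.

```python
import torch

# Load the raw checkpoint on CPU for inspection.
# On newer PyTorch versions you may need weights_only=False if the file
# contains non-tensor Python objects (e.g. optimizer or config state).
ckpt = torch.load("VILETP_r2r_release/ckpt.iter12550.pth", map_location="cpu")

# Inspect the top-level structure to see how the weights are organized.
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])
```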

Paper

Hugging Face: https://huggingface.co/papers/2507.08831
arXiv: https://arxiv.org/abs/2507.08831

Code

The full training and evaluation pipeline is available at:

https://github.com/realjoshqsun/V2-VLNCE

Citation

If you use these models, please cite the corresponding work:

@ARTICLE{11419772,
  author={Sun, Josh Qixuan and Weng, Huaiyuan and Xing, Xiaoying and Yeum, Chul Min and Crowley, Mark},
  journal={IEEE Robotics and Automation Letters}, 
  title={View Invariant Learning for Vision-Language Navigation in Continuous Environments}, 
  year={2026},
  volume={11},
  number={5},
  pages={5861-5868},
  doi={10.1109/LRA.2026.3669785}}