pretraining dataset is Libri-Light, not LibriSpeech

by gaunernst - opened Jul 1, 2024

According to Table 2 of the paper (https://arxiv.org/pdf/2202.03555), data2vec audio large was pre-trained on Libri-Light.
The GitHub page (https://github.com/facebookresearch/fairseq/tree/main/examples/data2vec) also shows that the large variants were pre-trained on Libri-Light.

The datasets tag should be updated accordingly.
