
NbAiLab / nb-asr-beta-Parakeet-RNNT-XXL-1.1b-verbatim

Norwegian verbatim ASR checkpoint for the NB-ASR beta program

This repository hosts a Norwegian verbatim ASR checkpoint built from a Parakeet RNNT XXL 1.1B training run and packaged for NB-ASR beta evaluation.

Internal run reference: dgx-8gpu-eval4-20260404-1053

Attribution

This model is derived from NVIDIA Parakeet RNNT checkpoints and adapted by NbAiLab for Norwegian NB-ASR beta use.

  • Base model family: nvidia/parakeet-rnnt-1.1b
  • Original model provider: NVIDIA
  • Modifications by: NbAiLab / NB-ASR project (fine-tuning, packaging, and evaluation setup)
  • This repository license: CC-BY-4.0

When redistributing or referencing this model, keep attribution to both NVIDIA (base model source) and NbAiLab (derived checkpoint work).

Checkpoint source (local training artifact):

/nfs/datastore0/nb-asr-export/parakeet-runs/dgx-8gpu-eval4-20260404-1053/2026-04-04_08-53-58/checkpoints

Main file from that run:

  • dgx-8gpu-eval4-20260404-1053.nemo

What This Model Is For

  • Norwegian speech-to-text (verbatim output)
  • checkpoint validation and benchmarking
  • timestamped transcription workflows
  • downstream diarization + speaker-attributed transcript generation

Installation

pip install -U "nemo_toolkit[asr]" soundfile huggingface_hub

For GPU systems, install a CUDA-compatible PyTorch build first.

Environment Setup (Recommended)

Use a fresh environment to avoid mixed dependency stacks.

python -m venv .venv-nb-asr-beta
source .venv-nb-asr-beta/bin/activate
export PYTHONNOUSERSITE=1
python -m pip install -U pip setuptools wheel
python -m pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision torchaudio
python -m pip install "nemo_toolkit[asr]" soundfile huggingface_hub
python -m pip install --force-reinstall --no-deps "lightning==2.4.0" "pytorch-lightning==2.4.0"

If your machine only supports CUDA 12.4 drivers, use the cu124 index URL instead of cu128.

Verify GPU is active:

python - << 'PY'
import torch
print("torch:", torch.__version__)
print("torch cuda:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
PY

Canonical Load Method For This Repo

Use hf_hub_download(...) + EncDecRNNTBPEModel.restore_from(...) for this model.

Do not use:

  • nemo_asr.models.ASRModel.from_pretrained("NbAiLab/nb-asr-beta-Parakeet-RNNT-XXL-1.1b-verbatim")

That path may hit a NeMo cache-resolution issue with this repository layout and fail with:

FileNotFoundError: .../hf_hub_cache/.../model_config.yaml

cuDNN Initialization Troubleshooting

If you see:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

use this GPU-safe command:

python run_demo.py --audio audio/audio.wav --disable-cudnn

Quick Start: Transcription

Two GPU paths are supported:

  • Path A (default): standard cuDNN behavior.
  • Path B (optional fallback): disable cuDNN with --disable-cudnn if needed.

Path A: Default GPU Path

from huggingface_hub import hf_hub_download
from nemo.collections.asr.models import EncDecRNNTBPEModel

MODEL_ID = "NbAiLab/nb-asr-beta-Parakeet-RNNT-XXL-1.1b-verbatim"
MODEL_FILE = "dgx-8gpu-eval4-20260404-1053.nemo"
AUDIO = "audio/audio.wav"

nemo_file = hf_hub_download(repo_id=MODEL_ID, filename=MODEL_FILE)
asr_model = EncDecRNNTBPEModel.restore_from(restore_path=nemo_file)
results = asr_model.transcribe([AUDIO])

print(results[0].text)

Path B: GPU With cuDNN Disabled (Optional Fallback)

import torch
from huggingface_hub import hf_hub_download
from nemo.collections.asr.models import EncDecRNNTBPEModel

MODEL_ID = "NbAiLab/nb-asr-beta-Parakeet-RNNT-XXL-1.1b-verbatim"
MODEL_FILE = "dgx-8gpu-eval4-20260404-1053.nemo"
AUDIO = "audio/audio.wav"

torch.backends.cudnn.enabled = False

nemo_file = hf_hub_download(repo_id=MODEL_ID, filename=MODEL_FILE)
asr_model = EncDecRNNTBPEModel.restore_from(restore_path=nemo_file, map_location="cuda:0")
results = asr_model.transcribe([AUDIO])

print(results[0].text)

Timestamping (Word + Segment)

NeMo supports timestamps for Parakeet models, including RNNT.

from huggingface_hub import hf_hub_download
from nemo.collections.asr.models import EncDecRNNTBPEModel

MODEL_ID = "NbAiLab/nb-asr-beta-Parakeet-RNNT-XXL-1.1b-verbatim"
MODEL_FILE = "dgx-8gpu-eval4-20260404-1053.nemo"
AUDIO = "audio/audio.wav"

nemo_file = hf_hub_download(repo_id=MODEL_ID, filename=MODEL_FILE)
asr_model = EncDecRNNTBPEModel.restore_from(restore_path=nemo_file)

# return_hypotheses=True gives direct access to timestamp metadata
hyp = asr_model.transcribe([AUDIO], timestamps=True, return_hypotheses=True)[0]

print("TEXT:", hyp.text)

# 8x encoder subsampling times the preprocessor hop gives seconds per frame
time_stride = 8 * asr_model.cfg.preprocessor.window_stride

for w in hyp.timestamp.get("word", []):
    # Some models return seconds directly; others return frame offsets.
    if "start" in w and "end" in w:
        start = w["start"]
        end = w["end"]
    else:
        start = w.get("start_offset", 0.0) * time_stride
        end = w.get("end_offset", 0.0) * time_stride
    token = w.get("word", w.get("char", ""))
    print(f"{start} -> {end} : {token}")
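The word stamps printed above can be grouped into subtitle-style cues. A minimal sketch, assuming word dicts with `start`/`end` in seconds and a `word` field (the `demo` input below is synthetic data, not model output):

```python
def to_srt_time(t: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(t * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_gap=0.8, max_len=7.0):
    """Group word stamps into SRT cues, splitting on pauses or long cues."""
    cues, cur = [], []
    for w in words:
        if cur and (w["start"] - cur[-1]["end"] > max_gap
                    or w["end"] - cur[0]["start"] > max_len):
            cues.append(cur)
            cur = []
        cur.append(w)
    if cur:
        cues.append(cur)
    blocks = []
    for i, cue in enumerate(cues, 1):
        text = " ".join(w["word"] for w in cue)
        blocks.append(f"{i}\n{to_srt_time(cue[0]['start'])} --> "
                      f"{to_srt_time(cue[-1]['end'])}\n{text}\n")
    return "\n".join(blocks)


# Example with synthetic word stamps:
demo = [
    {"start": 0.0, "end": 0.4, "word": "hei"},
    {"start": 0.5, "end": 0.9, "word": "verden"},
    {"start": 3.0, "end": 3.5, "word": "takk"},
]
print(words_to_srt(demo))
```

The 0.8 s gap and 7 s cue-length thresholds are arbitrary starting points; tune them for your material.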

Speaker Diarization + ASR Merge

Recommended diarization model

For new projects, prefer:

  • nvidia/diar_streaming_sortformer_4spk-v2.1

It is the newer NVIDIA diarizer and supports direct Python usage with NeMo. NVIDIA also provides nvidia/diar_sortformer_4spk-v1 as an alternative.

Important licensing note

  • nvidia/parakeet-rnnt-1.1b and nvidia/parakeet-tdt-0.6b-v3 are published under CC-BY-4.0.
  • nvidia/diar_sortformer_4spk-v1 is CC-BY-NC-4.0 (non-commercial).
  • nvidia/diar_streaming_sortformer_4spk-v2.1 is under NVIDIA Open Model License.

If you redistribute a combined workflow or bundle model artifacts, verify that your intended usage complies with each model's license.

License Compatibility

This repository is part of the open nb-asr-beta group. Licensing is fixed as follows:

  • This ASR model repo: CC-BY-4.0
  • Diarization companion used in this README: nvidia/diar_streaming_sortformer_4spk-v2.1 (NVIDIA Open Model License)
  • nvidia/diar_sortformer_4spk-v1 is available from NVIDIA under CC-BY-NC-4.0 (non-commercial)

What this means for users

  • This repo's model artifacts are distributed under CC-BY-4.0.
  • Diarization usage follows the diarizer model's own license terms.
  • diar_streaming_sortformer_4spk-v2.1 avoids the non-commercial restriction of diar_sortformer_4spk-v1, but users must follow NVIDIA Open Model License obligations.

Why license this repo as CC-BY-4.0

This checkpoint is derived from NVIDIA Parakeet models published as CC-BY-4.0. The nb-asr-beta decision is to keep this repository under CC-BY-4.0 and provide explicit attribution to NVIDIA and NbAiLab.

This keeps distribution terms straightforward for downstream users and avoids NC restrictions at the ASR-model level.

End-to-end example (ASR words + diarization segments)

from dataclasses import dataclass
from typing import List, Dict, Any

from huggingface_hub import hf_hub_download
from nemo.collections.asr.models import EncDecRNNTBPEModel, SortformerEncLabelModel

ASR_MODEL_ID = "NbAiLab/nb-asr-beta-Parakeet-RNNT-XXL-1.1b-verbatim"
ASR_MODEL_FILE = "dgx-8gpu-eval4-20260404-1053.nemo"
DIAR_MODEL_ID = "nvidia/diar_streaming_sortformer_4spk-v2.1"
AUDIO = "audio/audio_all.wav"


@dataclass
class Segment:
    start: float
    end: float
    speaker: str


def to_float(x, default=0.0):
    try:
        return float(x)
    except Exception:
        return default


def parse_diar_segments(raw_segments: List[Any]) -> List[Segment]:
    out = []
    for s in raw_segments:
        # Covers common NeMo outputs:
        # - "start end speaker" strings
        # - tuple/list [start, end, speaker]
        # - object-style fields
        if isinstance(s, str):
            parts = s.strip().split()
            if len(parts) >= 3:
                start, end, speaker = parts[0], parts[1], parts[2]
            else:
                start, end, speaker = 0.0, 0.0, "speaker_unknown"
        elif isinstance(s, (tuple, list)) and len(s) >= 3:
            start, end, speaker = s[0], s[1], s[2]
        else:
            start = getattr(s, "start", 0.0)
            end = getattr(s, "end", 0.0)
            speaker = getattr(s, "speaker", "speaker_unknown")
        out.append(Segment(to_float(start), to_float(end), str(speaker)))
    return out


def overlap(a0, a1, b0, b1):
    return max(0.0, min(a1, b1) - max(a0, b0))


def attach_speakers(word_stamps: List[Dict[str, Any]], diar_segments: List[Segment]):
    enriched = []
    # Reads the module-level asr_model (assigned before this function is called);
    # 8x encoder subsampling times the preprocessor hop gives seconds per frame.
    time_stride = 8 * asr_model.cfg.preprocessor.window_stride

    for w in word_stamps:
        if "start" in w and "end" in w:
            ws = to_float(w.get("start", 0.0))
            we = to_float(w.get("end", ws))
        else:
            ws = to_float(w.get("start_offset", 0.0)) * time_stride
            we = to_float(w.get("end_offset", 0.0)) * time_stride
        word = w.get("word", w.get("char", "")).strip()

        best_spk = "speaker_unknown"
        best_ov = -1.0
        for seg in diar_segments:
            ov = overlap(ws, we, seg.start, seg.end)
            if ov > best_ov:
                best_ov = ov
                best_spk = seg.speaker

        enriched.append({"start": ws, "end": we, "word": word, "speaker": best_spk})
    return enriched


# 1) ASR with timestamps
asr_nemo_file = hf_hub_download(repo_id=ASR_MODEL_ID, filename=ASR_MODEL_FILE)
asr_model = EncDecRNNTBPEModel.restore_from(restore_path=asr_nemo_file)
hyp = asr_model.transcribe([AUDIO], timestamps=True, return_hypotheses=True)[0]
word_stamps = hyp.timestamp.get("word", [])

# 2) Diarization
diar_model = SortformerEncLabelModel.from_pretrained(DIAR_MODEL_ID)
diar_model.eval()
raw = diar_model.diarize(audio=[AUDIO], batch_size=1)[0]
segments = parse_diar_segments(raw)

# 3) Merge words with speaker labels
speaker_words = attach_speakers(word_stamps, segments)

for row in speaker_words:
    print(f"[{row['start']:.2f}-{row['end']:.2f}] {row['speaker']}: {row['word']}")
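The word-level rows printed above can be collapsed into speaker turns (consecutive words with the same label), similar in shape to the `speaker_turns` field the demo script writes. A minimal sketch with synthetic input:

```python
def words_to_turns(speaker_words):
    """Merge consecutive same-speaker words into speaker turns."""
    turns = []
    for w in speaker_words:
        if turns and turns[-1]["speaker"] == w["speaker"]:
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["end"] = w["end"]
            turns[-1]["text"] += " " + w["word"]
        else:
            turns.append({"start": w["start"], "end": w["end"],
                          "speaker": w["speaker"], "text": w["word"]})
    return turns


# Example with synthetic speaker-attributed words:
demo_words = [
    {"start": 0.0, "end": 0.4, "word": "hei", "speaker": "speaker_0"},
    {"start": 0.5, "end": 0.9, "word": "der", "speaker": "speaker_0"},
    {"start": 1.2, "end": 1.6, "word": "hallo", "speaker": "speaker_1"},
]
for t in words_to_turns(demo_words):
    print(f"[{t['start']:.2f}-{t['end']:.2f}] {t['speaker']}: {t['text']}")
```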

Included Files

  • audio/audio.wav (single-utterance example file)
  • audio/audio2.mp3 and audio/audio3.mp3 (source clips)
  • audio/audio2.wav and audio/audio3.wav (wav16-mono conversions)
  • audio/audio_all.wav (concatenated wav16-mono file for diarization demos)
  • audio/aimilliarden.wav (long-form example file)
  • inference package file: dgx-8gpu-eval4-20260404-1053.nemo
  • run_demo.py (CLI demo for ASR + timestamps + optional diarization)
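The wav16-mono files listed above were presumably produced with a standard tool such as ffmpeg or sox. For reference, a dependency-free sketch of the same conversion using only the Python standard library (linear-interpolation resampling; adequate for demos, but a proper resampler like ffmpeg's is preferable for quality):

```python
import struct
import wave


def to_wav16_mono(src_path, dst_path, target_rate=16000):
    """Convert a 16-bit PCM WAV to 16 kHz mono (downmix + linear interp)."""
    with wave.open(src_path, "rb") as src:
        assert src.getsampwidth() == 2, "sketch only handles 16-bit PCM input"
        n_ch = src.getnchannels()
        rate = src.getframerate()
        raw = src.readframes(src.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    # Downmix: average the channels of each interleaved frame.
    mono = [sum(samples[i:i + n_ch]) // n_ch
            for i in range(0, len(samples), n_ch)]
    # Resample by linear interpolation between neighbouring samples.
    ratio = rate / target_rate
    n_out = int(len(mono) * target_rate / rate)
    out = []
    for j in range(n_out):
        pos = j * ratio
        i = int(pos)
        frac = pos - i
        nxt = mono[min(i + 1, len(mono) - 1)]
        out.append(int(mono[i] * (1 - frac) + nxt * frac))
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(struct.pack(f"<{len(out)}h", *out))
```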

The audio files used in the examples are sourced from https://huggingface.co/datasets/NbAiLab/NST and are distributed under this repository's license.

Training-state checkpoint files (*.ckpt, optimizer/scheduler states, etc.) are not distributed in this HF repository.

One-Command Demo Script

Run ASR + timestamps with the included audio:

python run_demo.py --audio audio/audio.wav

Optional fallback (with the flag):

python run_demo.py --audio audio/audio.wav --disable-cudnn

Run ASR + timestamps + diarization and save JSON:

python run_demo.py \
  --audio audio/audio_all.wav \
  --disable-cudnn \
  --with-diarization \
  --output demo_output.json

Run long-form transcription example:

python run_demo.py \
  --audio audio/aimilliarden.wav \
  --disable-cudnn \
  --output demo_output_long.json

Run from local .nemo artifact (instead of HF pull):

python run_demo.py \
  --audio audio/audio_all.wav \
  --disable-cudnn \
  --asr-local-nemo dgx-8gpu-eval4-20260404-1053.nemo \
  --with-diarization

Readable output examples:

jq '.speaker_turns' demo_output.json
jq -r '.speaker_turns[] | "[\(.start)-\(.end)] \(.speaker): \(.text)"' demo_output.json

Intended Scope

This is a beta checkpoint for controlled NB-ASR evaluation and integration. It is not yet a final public production release.

Acknowledgements

This model is based on the NVIDIA NeMo Parakeet RNNT family and adapted by the NB-ASR project at the National Library.
