Parakeet TDT (0.6B v3): ExecuTorch Exports

This repository contains ExecuTorch .pte exports of the Hugging Face ASR model nvidia/parakeet-tdt-0.6b-v3 for:

  • XNNPACK (CPU): fp32 and INT4 weights (8da4w)
  • Metal (macOS): bf16

Each export directory contains:

  • model.pte: ExecuTorch program
  • tokenizer.model: SentencePiece tokenizer (needed by the runner)

Artifacts

| Variant | Backend | Precision / Quantization | Path |
| --- | --- | --- | --- |
| xnnpack-fp32 | XNNPACK | fp32 | xnnpack/fp32/model.pte |
| xnnpack-int4 | XNNPACK | 8da4w (int8 dynamic activations + int4 weights), group size 32 | xnnpack/int4/model.pte |
| metal-bf16 | Metal (MPS) | bf16 | metal/bf16/model.pte |
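
To give a feel for what "int4 weights, group size 32" means, here is a minimal pure-Python sketch of symmetric group-wise 4-bit quantization. It is an illustration under simplifying assumptions (symmetric scaling, no zero point, synthetic weights) and is not the torchao implementation used for the export; all function names are hypothetical.

```python
# Simplified illustration only; the real 8da4w kernels may differ in detail.
GROUP_SIZE = 32      # matches the group size listed for xnnpack-int4
QMIN, QMAX = -8, 7   # signed 4-bit integer range


def quantize_group(weights):
    """Symmetric int4 quantization of one group: returns (scale, codes)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    codes = [min(QMAX, max(QMIN, round(w / scale))) for w in weights]
    return scale, codes


def quantize_row(row, group_size=GROUP_SIZE):
    """Split a weight row into groups, each quantized with its own scale."""
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]


def dequantize_row(groups):
    """Reconstruct approximate float weights from (scale, codes) pairs."""
    return [scale * code for scale, codes in groups for code in codes]


# Synthetic weights (assumption: stand-in values, not taken from the model).
row = [((i * 37) % 64 - 32) / 100.0 for i in range(64)]
groups = quantize_row(row)
restored = dequantize_row(groups)
max_err = max(abs(a - b) for a, b in zip(row, restored))
```

Because every group of 32 weights carries its own scale, the rounding error in a group is bounded by half of that group's scale rather than by the range of the whole tensor.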

Tokenizers:

  • xnnpack/fp32/tokenizer.model
  • xnnpack/int4/tokenizer.model
  • metal/bf16/tokenizer.model
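
The layout above can be checked programmatically before invoking the runner. The following is a small sketch using only the standard library; the helper names and the idea of a local `repo_root` directory holding a copy of this repository are assumptions made for illustration.

```python
from pathlib import Path

# Variant name -> export directory, as listed in the Artifacts section.
VARIANTS = {
    "xnnpack-fp32": "xnnpack/fp32",
    "xnnpack-int4": "xnnpack/int4",
    "metal-bf16": "metal/bf16",
}


def artifact_paths(repo_root, variant):
    """Return the expected model and tokenizer paths for one variant."""
    base = Path(repo_root) / VARIANTS[variant]
    return {"model": base / "model.pte", "tokenizer": base / "tokenizer.model"}


def missing_artifacts(repo_root):
    """List every expected file that is absent under repo_root."""
    return [
        str(path)
        for variant in VARIANTS
        for path in artifact_paths(repo_root, variant).values()
        if not path.is_file()
    ]
```

An empty list from `missing_artifacts(".")` would indicate that all three variants are present with both files.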

How to run (C++ runner)

Build the Parakeet runner from an ExecuTorch checkout:

./install_executorch.sh
make parakeet-cpu
make parakeet-metal  # macOS only

Then run from the ExecuTorch repo root:

CPU / XNNPACK

./cmake-out/examples/models/parakeet/parakeet_runner \
  --model_path /path/to/model.pte \
  --audio_path /path/to/audio.wav \
  --tokenizer_path /path/to/tokenizer.model

Metal (macOS)

DYLD_LIBRARY_PATH=/usr/lib ./cmake-out/examples/models/parakeet/parakeet_runner \
  --model_path /path/to/model.pte \
  --audio_path /path/to/audio.wav \
  --tokenizer_path /path/to/tokenizer.model
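
For scripted use, the two invocations above could be wrapped in Python roughly as follows. This is a sketch, not part of ExecuTorch: the helper names are hypothetical, and since the runner's output format is not documented here, `transcribe` simply returns raw stdout.

```python
import os
import subprocess

# Runner location relative to the ExecuTorch repo root, as shown above.
RUNNER = "./cmake-out/examples/models/parakeet/parakeet_runner"


def build_runner_command(model_path, audio_path, tokenizer_path):
    """Assemble the argument list using the three flags shown above."""
    return [
        RUNNER,
        "--model_path", model_path,
        "--audio_path", audio_path,
        "--tokenizer_path", tokenizer_path,
    ]


def runner_env(backend, base_env=None):
    """Copy the environment; add DYLD_LIBRARY_PATH for the Metal backend."""
    env = dict(os.environ if base_env is None else base_env)
    if backend == "metal":
        env["DYLD_LIBRARY_PATH"] = "/usr/lib"
    return env


def transcribe(backend, model_path, audio_path, tokenizer_path):
    """Run the runner from the ExecuTorch repo root and return its stdout."""
    cmd = build_runner_command(model_path, audio_path, tokenizer_path)
    result = subprocess.run(
        cmd, env=runner_env(backend), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting issues when paths contain spaces.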

Export recipe (ExecuTorch)

Exports were produced with examples/models/parakeet/export_parakeet_tdt.py from ExecuTorch commit efe4f0cce36c4b95ac70eac969c0d55163f23c78 using:

  • PyTorch 2.11.0.dev20251222
  • TORCHAO_FORCE_SKIP_LOADING_SO_FILES=1 (export-only workaround for torchao C++ extension loading)

Commands:

# XNNPACK fp32
TORCHAO_FORCE_SKIP_LOADING_SO_FILES=1 python examples/models/parakeet/export_parakeet_tdt.py \
  --backend xnnpack \
  --output-dir ./parakeet_xnnpack_fp32

# XNNPACK int4 (8da4w)
TORCHAO_FORCE_SKIP_LOADING_SO_FILES=1 python examples/models/parakeet/export_parakeet_tdt.py \
  --backend xnnpack \
  --qlinear_encoder 8da4w --qlinear_encoder_group_size 32 \
  --qlinear 8da4w --qlinear_group_size 32 \
  --output-dir ./parakeet_xnnpack_int4

# Metal bf16
TORCHAO_FORCE_SKIP_LOADING_SO_FILES=1 python examples/models/parakeet/export_parakeet_tdt.py \
  --backend metal --dtype bf16 \
  --output-dir ./parakeet_metal_bf16
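
To regenerate all three exports in one pass, the commands above can be produced from a small table. The sketch below only rebuilds the command lines already shown; the dictionary keys and helper names are illustrative choices, and nothing is executed.

```python
import shlex

EXPORT_SCRIPT = "examples/models/parakeet/export_parakeet_tdt.py"
EXPORT_ENV = {"TORCHAO_FORCE_SKIP_LOADING_SO_FILES": "1"}

# Flags per variant, copied from the commands above.
VARIANT_FLAGS = {
    "xnnpack_fp32": ["--backend", "xnnpack"],
    "xnnpack_int4": [
        "--backend", "xnnpack",
        "--qlinear_encoder", "8da4w", "--qlinear_encoder_group_size", "32",
        "--qlinear", "8da4w", "--qlinear_group_size", "32",
    ],
    "metal_bf16": ["--backend", "metal", "--dtype", "bf16"],
}


def export_command(variant):
    """Argument list for one export; output goes to ./parakeet_<variant>."""
    return ["python", EXPORT_SCRIPT, *VARIANT_FLAGS[variant],
            "--output-dir", f"./parakeet_{variant}"]


def shell_line(variant):
    """Render the command as a single shell line with the env prefix."""
    prefix = " ".join(f"{key}={value}" for key, value in EXPORT_ENV.items())
    return f"{prefix} {shlex.join(export_command(variant))}"


for name in VARIANT_FLAGS:
    print(shell_line(name))
```

To actually run an export, the argument list could be handed to `subprocess.run` with `EXPORT_ENV` merged into the environment.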

Verification

All three variants were verified to transcribe the same test audio correctly, and the XNNPACK exports also match eager PyTorch output. The model output matches the reference transcript apart from casing and punctuation.

  • Audio sample: hf-internal-testing/librispeech_asr_dummy (clean, validation[0])
  • Reference transcript (dataset):
    • MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL
  • Model output:
    • mister Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.
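
The reference and the model output differ only in casing and punctuation. One way to compare them automatically is to normalize both strings first, as in the sketch below. This is an illustrative check and not necessarily how the verification above was performed; `normalize` is a hypothetical helper.

```python
import string


def normalize(text):
    """Uppercase, drop ASCII punctuation, and collapse whitespace.

    Note: this also strips apostrophes, which is fine for this sample
    but would alter contractions such as "don't".
    """
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.upper().split())


REFERENCE = (
    "MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES "
    "AND WE ARE GLAD TO WELCOME HIS GOSPEL"
)
MODEL_OUTPUT = (
    "mister Quilter is the apostle of the middle classes, "
    "and we are glad to welcome his gospel."
)

transcripts_match = normalize(REFERENCE) == normalize(MODEL_OUTPUT)
```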

License / Attribution

  • Upstream model license: CC-BY-4.0 (see nvidia/parakeet-tdt-0.6b-v3).
  • Please attribute NVIDIA when using these exported artifacts.