---
language:
- en
- ja
license: mit
pipeline_tag: automatic-speech-recognition
library_name: transformers
base_model: microsoft/VibeVoice-ASR
tags:
- automatic-speech-recognition
- audio-to-text
- speech-recognition
- vibevoice
- qwen2
- awq
- int4
---
# VibeVoice-ASR AWQ INT4

This repository contains a 4-bit AWQ quantized export of `microsoft/VibeVoice-ASR`.
## Quantization
- Method: AWQ
- Bits: 4
- Group size: 128
- Logical parameter count: 8,674,021,857
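As a rough back-of-the-envelope check (illustrative only: in this export just the Qwen2 decoder is quantized, and AWQ's per-group scales and zero points add overhead on top of this figure), packing the full logical parameter count at 4 bits per weight works out to about 4 GiB:

```python
# Rough storage estimate for 4-bit packing of the logical parameter count.
# Illustrative only: only the decoder is actually quantized in this export,
# and group-wise scales/zero points add extra bytes beyond this figure.
params = 8_674_021_857
bits_per_weight = 4

size_bytes = params * bits_per_weight // 8
print(f"{size_bytes / 2**30:.2f} GiB")  # → 4.04 GiB
```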
## Repository layout

This model is stored in a split VibeVoice layout:

- root directory: VibeVoice audio and non-decoder weights
- `decoder-awq/`: quantized Qwen2 decoder weights

Keep this layout intact when downloading or mirroring the repository.
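A minimal post-download sanity check could look like the sketch below. The helper name and the specific files checked are assumptions for illustration; the split layout itself is as described above.

```python
from pathlib import Path

def check_split_layout(root: str) -> bool:
    """Return True if the split VibeVoice layout looks intact.

    Hypothetical helper: checks that the root holds the top-level
    config and that the quantized Qwen2 decoder subdirectory exists.
    """
    root_dir = Path(root)
    return (root_dir / "config.json").is_file() and (root_dir / "decoder-awq").is_dir()
```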
## Metadata

The root `config.json` includes:

- `vibevoice_metadata`
- `vibevoice_decoder_model_path`
- `vibevoice_decoder_quantization`

These fields identify the split decoder path and preserve the logical source-model metadata.
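For example, the decoder directory can be resolved from the root config roughly like this (the field name is the one listed above; the helper itself and the relative-path convention are assumptions for illustration):

```python
import json
from pathlib import Path

def resolve_decoder_path(model_root: str) -> Path:
    """Resolve the split decoder directory from the root config.json.

    Hypothetical helper: assumes vibevoice_decoder_model_path is stored
    relative to the repository root.
    """
    config = json.loads((Path(model_root) / "config.json").read_text())
    return Path(model_root) / config["vibevoice_decoder_model_path"]
```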
## Validation

This AWQ export was validated against the full upstream VibeVoice-ASR model on short audio samples:
- Outputs remained valid JSON transcript arrays.
- Output similarity to the full model remained high on the tested samples.
## Serving note for vLLM 0.17.x

On current vLLM 0.17.x CUDA builds, this checkpoint is compatible with the faster Marlin-backed AWQ path:

- Prefer letting vLLM infer the backend from `config.json`.
- If you must set it explicitly, use `awq_marlin` rather than plain `awq`.

In local testing on an RTX A6000, forcing plain `awq` was substantially slower than letting vLLM auto-select the Marlin kernel.
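A minimal serving sketch, assuming vLLM's OpenAI-compatible CLI (the repository id is a placeholder; substitute the actual id of this repo):

```shell
# Preferred: let vLLM infer the AWQ backend from config.json.
vllm serve <this-repo-id>

# Only if you must pin the backend explicitly, prefer the Marlin kernel:
vllm serve <this-repo-id> --quantization awq_marlin
```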
## Upstream references
- Code: https://github.com/microsoft/VibeVoice
- Base model: https://huggingface.co/microsoft/VibeVoice-ASR
- Report: https://arxiv.org/pdf/2601.18184
## Notes
- This is a quantized derivative export, not the original upstream checkpoint.
- Base model licensing and usage terms follow the upstream VibeVoice-ASR release.
- Pure-VibeVoice compatibility patches for vLLM 0.17.x are included under `patches/vllm_0_17/`.