---
language:
- en
- ja
license: mit
pipeline_tag: automatic-speech-recognition
library_name: transformers
base_model: microsoft/VibeVoice-ASR
tags:
- automatic-speech-recognition
- audio-to-text
- speech-recognition
- vibevoice
- qwen2
- awq
- int4
---
# VibeVoice-ASR AWQ INT4

This repository contains a 4-bit AWQ quantized export of `microsoft/VibeVoice-ASR`.
## Quantization
- Method: AWQ
- Bits: 4
- Group size: 128
- Logical parameter count: 8,674,021,857
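As a rough back-of-the-envelope check (illustrative only: in this export just the Qwen2 decoder is quantized, and AWQ's per-group scales and zero points add overhead on top of this figure), packing the full logical parameter count at 4 bits per weight works out to about 4 GiB:

```python
# Rough storage estimate for 4-bit packing of the logical parameter count.
# Illustrative only: only the decoder is actually quantized in this export,
# and group-wise scales/zero points add extra bytes beyond this figure.
params = 8_674_021_857
bits_per_weight = 4

size_bytes = params * bits_per_weight // 8
print(f"{size_bytes / 2**30:.2f} GiB")  # → 4.04 GiB
```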
## Repository layout

This model is stored in a split VibeVoice layout:

- root directory: VibeVoice audio and non-decoder weights
- `decoder-awq/`: quantized Qwen2 decoder weights

Keep this layout intact when downloading or mirroring the repository.
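A minimal post-download sanity check could look like the sketch below. The helper name and the specific files checked are assumptions for illustration; the split layout itself is as described above.

```python
from pathlib import Path

def check_split_layout(root: str) -> bool:
    """Return True if the split VibeVoice layout looks intact.

    Hypothetical helper: checks that the root holds the top-level
    config and that the quantized Qwen2 decoder subdirectory exists.
    """
    root_dir = Path(root)
    return (root_dir / "config.json").is_file() and (root_dir / "decoder-awq").is_dir()
```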
## Metadata

The root `config.json` includes:

- `vibevoice_metadata`
- `vibevoice_decoder_model_path`
- `vibevoice_decoder_quantization`

These fields identify the split decoder path and preserve the logical source-model metadata.
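For example, the decoder directory can be resolved from the root config roughly like this (the field name is the one listed above; the helper itself and the relative-path convention are assumptions for illustration):

```python
import json
from pathlib import Path

def resolve_decoder_path(model_root: str) -> Path:
    """Resolve the split decoder directory from the root config.json.

    Hypothetical helper: assumes vibevoice_decoder_model_path is stored
    relative to the repository root.
    """
    config = json.loads((Path(model_root) / "config.json").read_text())
    return Path(model_root) / config["vibevoice_decoder_model_path"]
```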
## Validation

This AWQ export was validated against the full upstream VibeVoice-ASR model on short audio samples:
- Outputs remained valid JSON transcript arrays.
- Output similarity to the full model remained high on the tested samples.
## Serving note for vLLM 0.17.x

On current vLLM 0.17.x CUDA builds, this checkpoint is compatible with the faster Marlin-backed AWQ path:

- Prefer letting vLLM infer the backend from `config.json`.
- If you must set it explicitly, use `awq_marlin` rather than plain `awq`.

In local testing on an RTX A6000, forcing plain `awq` was substantially slower than letting vLLM auto-select the Marlin kernel.
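A minimal serving sketch, assuming vLLM's OpenAI-compatible CLI (the repository id is a placeholder; substitute the actual id of this repo):

```shell
# Preferred: let vLLM infer the AWQ backend from config.json.
vllm serve <this-repo-id>

# Only if you must pin the backend explicitly, prefer the Marlin kernel:
vllm serve <this-repo-id> --quantization awq_marlin
```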
## Upstream references
- Code: https://github.com/microsoft/VibeVoice
- Base model: https://huggingface.co/microsoft/VibeVoice-ASR
- Report: https://arxiv.org/pdf/2601.18184
## Notes
- This is a quantized derivative export, not the original upstream checkpoint.
- Base model licensing and usage terms follow the upstream VibeVoice-ASR release.
- Pure-VibeVoice compatibility patches for vLLM 0.17.x are included under `patches/vllm_0_17/`.