---
language:
  - en
  - ja
license: mit
pipeline_tag: automatic-speech-recognition
library_name: transformers
base_model: microsoft/VibeVoice-ASR
tags:
  - automatic-speech-recognition
  - audio-to-text
  - speech-recognition
  - vibevoice
  - qwen2
  - awq
  - int4
---

# VibeVoice-ASR AWQ INT4

This repository contains a 4-bit AWQ-quantized export of `microsoft/VibeVoice-ASR`.

## Quantization

- Method: AWQ
- Bits: 4
- Group size: 128
- Logical parameter count: 8,674,021,857
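As a back-of-envelope check on what these numbers imply for disk and memory, the sketch below estimates the packed-weight footprint. It assumes, as a simplification, that all logical parameters are quantized (in reality only the decoder linears are) and that each group of 128 weights carries one fp16 scale and one packed 4-bit zero point; the real footprint will differ.

```python
# Rough size estimate for INT4 AWQ weights with group size 128.
# Simplifying assumption: every logical parameter is quantized.
PARAMS = 8_674_021_857
BITS = 4
GROUP_SIZE = 128

packed_bytes = PARAMS * BITS // 8              # 4-bit packed weights
groups = PARAMS // GROUP_SIZE
# Per group: 2 B fp16 scale + ~0.5 B packed 4-bit zero point.
overhead_bytes = groups * (2 + 0.5)

total_gib = (packed_bytes + overhead_bytes) / 2**30
print(f"~{total_gib:.1f} GiB of quantized weights")
```

This is an order-of-magnitude estimate only; the actual repository size also includes unquantized audio-tower and embedding weights stored at full precision.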

## Repository layout

This model is stored in a split VibeVoice layout:

- root directory: VibeVoice audio and non-decoder weights
- `decoder-awq/`: quantized Qwen2 decoder weights

Keep this layout intact when downloading or mirroring the repository.
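One way to keep the layout intact is to fetch the whole repository in a single snapshot rather than downloading individual files. A sketch using the `huggingface-cli` tool (the repo id below is a placeholder; substitute this repository's actual id):

```shell
# Downloads root files and decoder-awq/ together, preserving the split layout.
huggingface-cli download your-org/VibeVoice-ASR-AWQ-INT4 \
  --local-dir VibeVoice-ASR-AWQ-INT4
```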

## Metadata

The root `config.json` includes:

- `vibevoice_metadata`
- `vibevoice_decoder_model_path`
- `vibevoice_decoder_quantization`

These fields identify the split decoder path and preserve the logical source-model metadata.
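For tooling that needs to resolve the split decoder programmatically, a minimal helper that reads these fields might look like the following. The field names come from the list above; the returned dictionary shape is my own choice, not part of the checkpoint format.

```python
import json
from pathlib import Path

def read_decoder_metadata(model_dir: str) -> dict:
    """Return the split-decoder fields from the root config.json."""
    cfg = json.loads(Path(model_dir, "config.json").read_text())
    return {
        "decoder_path": cfg["vibevoice_decoder_model_path"],
        "quantization": cfg["vibevoice_decoder_quantization"],
        "metadata": cfg["vibevoice_metadata"],
    }
```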

## Validation

This AWQ export was validated against the full upstream VibeVoice-ASR model on short audio samples:

- outputs remained valid JSON transcript arrays
- output similarity to the full model remained high on the tested samples
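A minimal check mirroring the first validation criterion: the model's raw output string should parse as a JSON array of transcript entries. The exact per-entry schema is an assumption here (the check only requires each entry to be an object); tighten the required keys to match the real output format.

```python
import json

def is_valid_transcript_array(raw: str) -> bool:
    """True if `raw` parses as a JSON array whose entries are objects."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, list) and all(isinstance(seg, dict) for seg in parsed)
```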

## Serving note for vLLM 0.17.x

On current vLLM 0.17.x CUDA builds, this checkpoint is compatible with the faster Marlin-backed AWQ path.

- prefer letting vLLM infer the backend from `config.json`
- if you must set it explicitly, use `awq_marlin` rather than plain `awq`

In local testing on an RTX A6000, forcing plain awq was substantially slower than letting vLLM auto-select the Marlin kernel.
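Both options above can be sketched as `vllm serve` invocations (the model path is a placeholder for wherever you downloaded the repository):

```shell
# Preferred: let vLLM infer the quantization backend from config.json.
vllm serve /path/to/VibeVoice-ASR-AWQ-INT4

# If pinning explicitly, choose the Marlin-backed kernel, not plain awq.
vllm serve /path/to/VibeVoice-ASR-AWQ-INT4 --quantization awq_marlin
```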

## Upstream references

- Base model: [microsoft/VibeVoice-ASR](https://huggingface.co/microsoft/VibeVoice-ASR)

## Notes

- This is a quantized derivative export, not the original upstream checkpoint.
- Base-model licensing and usage terms follow the upstream VibeVoice-ASR release.
- Pure-VibeVoice compatibility patches for vLLM 0.17.x are included under `patches/vllm_0_17/`.