
	---
	license: mit
	base_model:
	- ByteDance-Seed/Stable-DiffCoder-8B-Base
	pipeline_tag: text-generation
	library_name: transformers
	---

	# Stable-DiffCoder-8B-Instruct

	<div align="left" style="line-height: 1;">
	<a href="https://bytedance-seed.github.io/Stable-DiffCoder/" target="_blank" style="margin: 2px;">
	<img alt="Homepage" src="https://img.shields.io/badge/Stable--DiffCoder-Homepage-a468fe?color=a468fe&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
	</a>

	<a href="https://arxiv.org/abs/2601.15892" target="_blank" style="margin: 2px;">
	<img alt="Technical Report" src="https://img.shields.io/badge/arXiv-Technical%20Report-brightgreen?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
	</a>

	<a href="https://huggingface.co/ByteDance-Seed" target="_blank" style="margin: 2px;">
	<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-ByteDance%20Seed-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
	</a>

	<a href="https://github.com/ByteDance-Seed/Stable-DiffCoder/blob/master/LICENSE" style="margin: 2px;">
	<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?color=f5de53&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
	</a>
	</div>


	## Introduction
	We are thrilled to introduce Stable-DiffCoder, which is a strong code diffusion large language model. Built directly on the Seed-Coder architecture, data, and training pipeline, it introduces a block diffusion continual pretraining (CPT) stage with a tailored warmup and block-wise clipped noise schedule.

	Under identical architecture and data settings, we systematically analyze and design an efficient diffusion training pipeline that is stable and can potentially lift the model’s performance ceiling. With this recipe, Stable-DiffCoder demonstrates overall performance improvements over its autoregressive (AR) counterpart across a broad set of code benchmarks. In addition, any-order modeling improves structured code handling for editing and reasoning, and diffusion-based corruption aids learning for low-resource programming languages.

	Notably, with only CPT followed by supervised fine-tuning, Stable-DiffCoder further surpasses many strong ∼8B AR and diffusion-based code models. These results demonstrate that diffusion-based training can improve code modeling quality beyond what AR training alone can achieve, even under tightly controlled data and architecture constraints.

	<p align="center">
	<img width="100%" src="imgs/intro_performance.png">
	</p>

	This repo contains the Stable-DiffCoder-8B-Instruct model, which has the following features:
	- Type: Masked diffusion language model
	- Training Stage: Pretraining & Post-training
	- Data Source: Public datasets, synthetic data
	- Context Length: 8192


	## Model Downloads
	| Model Name | Length | Download | Notes |
	|---------------------------------------------------------|--------|------------------------------------|-----------------------|
	| Stable-DiffCoder-8B-Base | 8K | 🤗 [Model](https://huggingface.co/ByteDance-Seed/Stable-DiffCoder-8B-Base) | Pretrained on our model-centric code data. |
	| 👉 Stable-DiffCoder-8B-Instruct | 8K | 🤗 [Model](https://huggingface.co/ByteDance-Seed/Stable-DiffCoder-8B-Instruct) | Instruction-tuned for alignment with user intent. |

	## Requirements
	Inference works with the current `transformers` release (v5.3.0):
	```bash
	pip install transformers~=5.3.0
	```

	## Explanation of Inference Parameters
	- `steps`: Number of diffusion steps used for generation.
	- `gen_length`: Maximum length of the generated output.
	- `block_length`: Length of each diffusion block (default: 4).
	- `temperature`: Sampling temperature (default: 0.0).
	- `remasking`: Remasking strategy, either `'low_confidence'` or `'random'` (default: `'low_confidence'`). See [LLaDA](https://github.com/ML-GSAI/LLaDA) for the underlying principle.
	- `tokenizer`: Tokenizer used for text encoding and decoding.
	- `shift`: Whether to shift the output right by one position, as in autoregressive (AR) decoding (default: `False`).
	- `threshold`: Decoding threshold in the range 0 to 1.0 (default: `None`). A smaller value gives faster decoding. See [Fast-dLLM](https://github.com/NVlabs/Fast-dLLM) for the underlying principle.
	- `eos_id`: ID of the end-of-sequence token (default: `tokenizer.eos_token_id`).
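
To make the relationship between `steps`, `gen_length`, and `block_length` concrete, here is a minimal sketch of the budget arithmetic. It assumes the model follows the block-wise schedule of the LLaDA reference sampler (the output is split into `gen_length / block_length` blocks, and `steps` is divided evenly among them); this README does not confirm that, and `plan_schedule` is a hypothetical helper, not part of the model's API.

```python
def plan_schedule(steps: int, gen_length: int, block_length: int):
    """Split a diffusion step budget across blocks.

    Assumption: LLaDA-style block-wise decoding. This is an illustration,
    not the confirmed behavior of Stable-DiffCoder's `generate`.
    """
    if gen_length % block_length != 0:
        raise ValueError("gen_length must be a multiple of block_length")
    num_blocks = gen_length // block_length

    if steps % num_blocks != 0:
        raise ValueError("steps must be a multiple of the number of blocks")
    steps_per_block = steps // num_blocks

    # Spread the block's masked tokens as evenly as possible over its steps;
    # the first `remainder` steps commit one extra token.
    base, remainder = divmod(block_length, steps_per_block)
    tokens_per_step = [base + (1 if i < remainder else 0) for i in range(steps_per_block)]
    return num_blocks, steps_per_block, tokens_per_step


# Values used in the Quickstart call: one token is committed per step.
print(plan_schedule(steps=512, gen_length=512, block_length=4))  # (128, 4, [1, 1, 1, 1])

# A quarter of the steps: each block is decoded in a single step.
print(plan_schedule(steps=128, gen_length=512, block_length=4))  # (128, 1, [4])
```

Under this assumption, lowering `steps` trades quality for speed by committing more tokens per step within each block.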

	## Quickstart

	Here is a simple example demonstrating how to load the model and generate code.

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = 'ByteDance-Seed/Stable-DiffCoder-8B-Instruct'
	device = 'cuda'

	# trust_remote_code=True loads the custom modeling code shipped in the model repository.
	model = AutoModelForCausalLM.from_pretrained(
	    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
	).to(device).eval()
	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

	# Render the chat template to text, then tokenize it into a batch of size 1.
	messages = [{"role": "user", "content": 'Write a quick sort algorithm.'}]
	prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
	input_ids = torch.tensor(tokenizer(prompt)['input_ids']).to(device).unsqueeze(0)

	# Diffusion decoding; see "Explanation of Inference Parameters" for each argument.
	out = model.generate(
	    input_ids, steps=512, gen_length=512, block_length=4, temperature=0.,
	    remasking='low_confidence', tokenizer=tokenizer, shift=False,
	    threshold=None, eos_id=tokenizer.eos_token_id,
	)
	# Decode only the newly generated tokens, skipping the prompt.
	print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
	```
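
The `remasking` and `threshold` arguments in the call above decide which masked positions are committed at each denoising step. The toy sketch below illustrates the idea as described in the LLaDA and Fast-dLLM repositories: keep the most confident predictions and remask the rest, or, when a threshold is set, commit every position whose confidence clears it. `select_positions` and the confidence values are hypothetical and purely illustrative; the model's actual `generate` implementation may differ.

```python
import random


def select_positions(confidences, masked, num_to_commit,
                     remasking="low_confidence", threshold=None, rng=None):
    """Return the masked positions to commit in one denoising step (toy sketch).

    confidences: probability the model assigns to its prediction at each position
    masked: indices that are still masked
    """
    if remasking == "low_confidence":
        # Rank by model confidence; low-confidence positions stay masked.
        scores = {i: confidences[i] for i in masked}
    elif remasking == "random":
        # Rank by random scores, ignoring model confidence.
        rng = rng or random.Random(0)
        scores = {i: rng.random() for i in masked}
    else:
        raise ValueError(f"unknown remasking strategy: {remasking!r}")

    ranked = sorted(masked, key=scores.__getitem__, reverse=True)
    if threshold is not None:
        # Assumed Fast-dLLM-style rule: commit everything at or above the
        # threshold, and always at least the single best position.
        chosen = [i for i in ranked if scores[i] >= threshold]
        return sorted(chosen) if chosen else ranked[:1]
    return sorted(ranked[:num_to_commit])


confidences = [0.91, 0.35, 0.97, 0.60]
masked = [0, 1, 2, 3]
print(select_positions(confidences, masked, num_to_commit=1))                 # [2]
print(select_positions(confidences, masked, num_to_commit=1, threshold=0.9))  # [0, 2]
print(select_positions(confidences, masked, num_to_commit=1, threshold=0.5))  # [0, 2, 3]
```

In this sketch a smaller threshold commits more positions per step, which is consistent with the note that a smaller `threshold` gives faster decoding.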

	## Evaluation

	Stable-DiffCoder-8B-Instruct has been evaluated on a wide range of coding tasks, including code generation, code reasoning, and code editing, achieving stronger performance than a wide range of ∼8B autoregressive (AR) models and diffusion LLMs (DLLMs).

	- Compared with ∼8B AR models:

	| Model | HumanEval | MBPP | MHPP | BigCodeBench (Full) | BigCodeBench (Hard) | LiveCodeBench (v5) |
	|:-----------------------------:|:---------:|:----:|:----:|:-------------------:|:-------------------:|:-------------------------:|
	| CodeLlama-7B-Instruct | 40.9 | 54.0 | 6.7 | 25.7 | 4.1 | 3.6 |
	| DeepSeek-Coder-6.7B-Instruct | 74.4 | 74.9 | 20.0 | 43.8 | 15.5 | 9.6 |
	| CodeQwen1.5-7B-Chat | 83.5 | 77.7 | 17.6 | 43.6 | 15.5 | 3.0 |
	| Yi-Coder-9B-Chat | 82.3 | 82.0 | 26.7 | 49.0 | 17.6 | 17.5 |
	| Llama-3.1-8B-Instruct | 68.3 | 70.1 | 17.1 | 40.5 | 13.5 | 11.5 |
	| OpenCoder-8B-Instruct | 83.5 | 79.1 | 30.5 | 50.9 | 18.9 | 17.1 |
	| Qwen2.5-Coder-7B-Instruct | 88.4 | 83.5 | 26.7 | 48.8 | 20.3 | 17.3 |
	| Qwen3-8B | 84.8 | 77.0 | 32.8 | 51.7 | 23.0 | 23.5 |
	| Seed-Coder-8B-Instruct | 84.8 | 85.2 | 36.2 | 53.3 | 26.4 | 24.7 |
	| Stable-DiffCoder-8B-Instruct | 86.6 | 85.7 | 42.4 | 54.8 | 31.8 | 23.5 |

	- Compared with ∼8B DLLM models:

	| Model | HumanEval | HumanEval+ | MBPP | MBPP+ | BigCodeBench (Full) |
	|:-----------------------------:|:---------:|:---------:|:----:|:----:|:-------------------:|
	| LLaDA-8B-Instruct | 49.4 | - | 41.0 | - | 16.5 |
	| Dream-7B-Instruct | 63.4 | - | 68.3 | - | 10.6 |
	| LLaDA-MoE-7B-Instruct | 61.6 | - | 70.0 | - | 20.4 |
	| Fast-dLLMv2 | 43.9 | 40.2 | 50.0 | 41.3 | 49.0 |
	| DiffuCoder-7B-Instruct | 72.0 | 65.2 | 75.1 | 61.9 | 35.7 |
	| Dream-Coder-7B-Instruct | 82.9 | - | 79.6 | - | 37.1 |
	| SDAR-8B-Chat | 78.7 | - | 72.0 | - | - |
	| WeDLM-8B-Chat | 80.5 | 73.8 | 70.5 | - | - |
	| Stable-DiffCoder-8B-Instruct | 86.6 | 82.3 | 85.7 | 72.8 | 54.8 |

	For detailed benchmark performance, please refer to our [📑 Technical Report](https://github.com/ByteDance-Seed/Stable-DiffCoder/blob/master/Stable_DiffCoder.pdf).

	## License

	This project is licensed under the MIT License. See the [LICENSE file](https://github.com/ByteDance-Seed/Stable-DiffCoder/blob/master/LICENSE) for details.

	## Citation

	If you find our work helpful, please consider citing it:

	```
	@misc{fan2026stablediffcoderpushingfrontiercode,
	title={Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model},
	author={Chenghao Fan and Wen Heng and Bo Li and Sichen Liu and Yuxuan Song and Jing Su and Xiaoye Qu and Kai Shen and Wei Wei},
	year={2026},
	eprint={2601.15892},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2601.15892},
	}
	```