Instructions for using prithivMLmods/Muscae-Qwen3-UI-Code-4B with libraries, notebooks, and local apps. The sections below walk through each option.
- Libraries
- Transformers
How to use prithivMLmods/Muscae-Qwen3-UI-Code-4B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Muscae-Qwen3-UI-Code-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Muscae-Qwen3-UI-Code-4B")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Muscae-Qwen3-UI-Code-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
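For interactive use, the reply can also be streamed to the terminal as it is generated. A minimal sketch reusing the model, tokenizer, and inputs from the snippet above (TextStreamer ships with transformers; max_new_tokens is an arbitrary choice here):
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated, skipping the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=200)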
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use prithivMLmods/Muscae-Qwen3-UI-Code-4B with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/Muscae-Qwen3-UI-Code-4B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/Muscae-Qwen3-UI-Code-4B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
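Because the server exposes an OpenAI-compatible API, it can also be called from Python. A minimal sketch using the openai client (assumes pip install openai and the default port 8000; vLLM does not require a real API key, so a placeholder value is used):
from openai import OpenAI

# Point the OpenAI client at the local vLLM server (OpenAI-compatible endpoint).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="prithivMLmods/Muscae-Qwen3-UI-Code-4B",
    messages=[
        {"role": "user", "content": "Generate a responsive pricing card with Tailwind CSS."}
    ],
)
print(completion.choices[0].message.content)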
- SGLang
How to use prithivMLmods/Muscae-Qwen3-UI-Code-4B with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Muscae-Qwen3-UI-Code-4B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/Muscae-Qwen3-UI-Code-4B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Muscae-Qwen3-UI-Code-4B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/Muscae-Qwen3-UI-Code-4B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
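The same OpenAI-compatible endpoint can be called from Python as well. A minimal sketch using the requests package (assumes the SGLang server above is running on port 30000; the prompt is illustrative):
import requests

# Ask the local SGLang server for a small UI component via the OpenAI-compatible API.
response = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "prithivMLmods/Muscae-Qwen3-UI-Code-4B",
        "messages": [
            {"role": "user", "content": "Create a responsive navbar with Tailwind CSS."}
        ],
    },
    timeout=300,
)
print(response.json()["choices"][0]["message"]["content"])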
- Docker Model Runner
How to use prithivMLmods/Muscae-Qwen3-UI-Code-4B with Docker Model Runner:
docker model run hf.co/prithivMLmods/Muscae-Qwen3-UI-Code-4B
Muscae-Qwen3-UI-Code-4B
Muscae-Qwen3-UI-Code-4B is a web-UI-focused model fine-tuned from UIGEN-T3-4B-Preview (itself built on Qwen3-4B) for controlled Abliterated Reasoning and polished token probabilities, and it is intended exclusively for experimental use. It excels at modern web UI coding tasks, structured component generation, and layout-aware reasoning, making it well suited to frontend developers, UI engineers, and research prototypes exploring structured code generation.
GGUF: https://huggingface.co/prithivMLmods/Muscae-Qwen3-UI-Code-4B-GGUF
Key Features
- UI-Oriented Abliterated Reasoning: Controlled reasoning precision tailored for frontend development and code generation, with polished token distributions ensuring structured, maintainable output.
- Web UI Component Generation: Excels at generating responsive components, semantic HTML, and Tailwind-based layouts with reasoning-aware structure and minimal boilerplate.
- Layout-Aware Structured Logic: Understands UI state flows, component hierarchies, and responsive design patterns, producing logically consistent, production-ready UI code.
- Hybrid Reasoning for Code: Combines symbolic reasoning with probabilistic inference to deliver optimized component logic, conditional rendering, and event-driven UI behavior.
- Structured Output Mastery: Natively outputs in HTML, React, Markdown, JSON, and YAML, making it ideal for UI prototyping, design systems, and documentation generation (see the example prompt after this list).
- Optimized Lightweight Footprint: With a 4B parameter size, it is deployable on mid-range GPUs, offline workstations, or edge devices while retaining strong UI coding capabilities.
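As a quick illustration of the structured-output point above, a prompt like the following (a hypothetical example, not taken from the model card) steers generation toward a machine-readable format:
# Hypothetical chat prompt asking for design tokens as JSON rather than prose.
messages = [
    {"role": "system", "content": "You are a frontend coding assistant. Respond with valid JSON only."},
    {"role": "user", "content": "Produce design tokens (colors, spacing, font sizes) for a minimal dashboard theme as JSON."},
]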
Quickstart with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "prithivMLmods/Muscae-Qwen3-UI-Code-4B"
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "Generate a responsive landing page hero section with Tailwind and semantic HTML."
messages = [
{"role": "system", "content": "You are a frontend coding assistant skilled in UI generation, semantic HTML, and component structuring."},
{"role": "user", "content": prompt}
]
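# Render the chat messages with the model's chat template; tokenization happens in the next step.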
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
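# Generate up to 512 new tokens of UI code.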
generated_ids = model.generate(
**model_inputs,
max_new_tokens=512
)
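# Slice off the prompt tokens so only the newly generated text is decoded.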
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
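Because the expected response for this prompt is an HTML/Tailwind snippet, a convenient way to inspect it is to write it to a file and open it in a browser. A minimal follow-on sketch (assumes the response is raw markup; the filename is arbitrary):
# Save the generated markup for a quick browser preview.
with open("hero_section.html", "w", encoding="utf-8") as f:
    f.write(response)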
Intended Use
- Web UI coding and component generation
- Responsive layout and frontend architecture prototyping
- Semantic HTML, Tailwind, and React code generation
- Research and experimental projects on structured code synthesis
- Design-system-driven development workflows
Limitations
- Experimental model – not optimized for production-critical deployments
- Focused on UI coding – not suitable for general reasoning or creative writing
- May produce inconsistent results with very long prompts or cross-framework tasks
- Prioritizes structure and correctness over stylistic creativity or verbosity