# Carnice Qwen3.6 MoE 35B-A3B — Hermes-Focused Agentic Model
QLoRA fine-tune of Qwen3.6-35B-A3B (MoE, ~3B active parameters), optimized for agentic workflows and the Hermes Agent runtime. Two-stage training adapted from kai-os/Carnice-9b.
This is the successor to Carnice-MoE-35B-A3B (based on Qwen3.5), retrained on the newer Qwen3.6 base which brings improved agentic coding, extended context (262K native, up to 1M with RoPE scaling), and native multimodal support.
## Credits
Training methodology adapted from kai-os/Carnice-9b — same two-stage approach and datasets, applied to the larger MoE architecture. Key inspiration: training on actual Hermes Agent execution traces for native agentic behavior.
## Available Formats
| Format | Size | Location | Use Case |
|---|---|---|---|
| BF16 SafeTensors | 67 GB | Root | Full precision, Transformers / vLLM |
| FP8 Dynamic | 34 GB | fp8/ | vLLM optimized, ~2x faster inference |
| GGUF | 19-65 GB | GGUF repo | llama.cpp, Ollama, LM Studio |
## FP8 Usage (vLLM)

```shell
# Serve with FP8 quantization applied at load time; alternatively, clone the
# repo and point vLLM at the fp8/ subfolder to use the pre-quantized weights
vllm serve samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B --quantization fp8 --dtype auto
```
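Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal client sketch using only the standard library (the URL and port are vLLM's defaults; the payload shape is the standard OpenAI chat format):

```python
import json
import urllib.request

# vLLM exposes an OpenAI-compatible API; these are its default host and route
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B"

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Standard OpenAI-style chat payload for the served model."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """POST to the running vLLM server and return the reply text."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("Summarize the repo layout.")  # requires the server above to be running
```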
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.6-35B-A3B |
| Architecture | Mixture of Experts (MoE) |
| Total Parameters | ~35B |
| Active Parameters | ~3B per token |
| Native Context Length | 262,144 tokens |
| Thinking Modes | Thinking / Non-thinking (native Qwen3.6) |
## What Makes This Different
Unlike generic reasoning distillation, this model was trained on actual Hermes Agent execution traces — real conversations where an AI agent:
- Executes terminal commands and processes output
- Performs file editing operations
- Chains multi-step tool calls with results feeding back
- Uses browser-assisted workflows
- Makes decisions based on environmental feedback
This teaches the model the exact conversation patterns Hermes expects, rather than just generic reasoning.
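As an illustration, one such multi-step interaction rendered as chat messages might look like the following. This is a hypothetical sketch of the pattern (tool call, tool result fed back, grounded answer), not the actual Hermes Agent trace format:

```python
# Hypothetical sketch of an agentic execution trace as chat messages —
# illustrates the pattern (tool call -> tool result -> follow-up),
# not the actual Hermes Agent wire format.
trace = [
    {"role": "user", "content": "How much disk space is free on this machine?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"name": "terminal", "arguments": {"command": "df -h /"}}
        ],
    },
    # The environment feeds the command output back as a tool message...
    {"role": "tool", "content": "/dev/sda1  512G  210G  302G  41%"},
    # ...and the model grounds its final answer in that observation.
    {"role": "assistant", "content": "About 302 GB free (41% of 512 GB used)."},
]

roles = [m["role"] for m in trace]
```

Training on many traces of this shape teaches the turn-taking and feedback pattern directly, rather than leaving the model to infer it from generic instruction data.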
## Training Details

### Two-Stage Approach

#### Stage A — Reasoning Repair (1 epoch)
- Strengthens base model reasoning before agent-specific training
- Loss: 0.4281
| Dataset | Examples |
|---|---|
| bespokelabs/Bespoke-Stratos-17k | 16,710 |
| AI-MO/NuminaMath-CoT | 17,000 (capped) |
#### Stage B — Hermes Traces (2 epochs)
- Agent-specific behavioral training on real execution traces
- Loss: 0.3045
| Dataset | Examples |
|---|---|
| kai-os/carnice-glm5-hermes-traces | 1,627 (high quality) |
| open-thoughts/OpenThoughts-Agent-v1-SFT | 15,209 |
### Training Configuration
| Parameter | Stage A | Stage B |
|---|---|---|
| LoRA Rank | 64 | 64 |
| LoRA Alpha | 64 | 64 |
| LoRA Targets | q, k, v, o projections | q, k, v, o projections |
| Learning Rate | 2e-5 (linear) | 1e-5 (cosine) |
| Epochs | 1 | 2 |
| Effective Batch | 12 | 12 |
| Context Length | 4096 | 4096 |
| Precision | 4-bit QLoRA + BF16 adapters | Same |
| GPU | RTX PRO 6000 Blackwell (98GB) | Same |
| Total Training Time | ~55 hours (both stages) | |
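The table above maps onto a LoRA/QLoRA setup roughly like the following sketch, written as plain dicts rather than any specific trainer's config class; field names are illustrative, not tied to the actual training framework:

```python
# Sketch of the two stages' hyperparameters as plain config dicts.
# Field names are illustrative, not a specific framework's API.
LORA = {
    "r": 64,                 # LoRA rank
    "alpha": 64,             # LoRA alpha (scaling = alpha / r = 1.0)
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "load_in_4bit": True,    # 4-bit QLoRA base, BF16 adapters
}

STAGE_A = {**LORA, "learning_rate": 2e-5, "lr_scheduler": "linear",
           "epochs": 1, "effective_batch_size": 12, "max_seq_length": 4096}
STAGE_B = {**LORA, "learning_rate": 1e-5, "lr_scheduler": "cosine",
           "epochs": 2, "effective_batch_size": 12, "max_seq_length": 4096}
```

Note that alpha = rank keeps the adapter scaling factor at 1.0, a common choice for stable QLoRA runs.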
### Trainable Parameters

13,762,560 (~0.04% of 35.1B total)
## Usage

### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B")

messages = [{"role": "user", "content": "Explain the Riemann hypothesis in simple terms."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
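Qwen3-family chat templates let you toggle the thinking mode, either per call via `enable_thinking` in `apply_chat_template` or per turn via the `/no_think` soft switch in the prompt. Assuming Qwen3.6 keeps the Qwen3 convention, a sketch:

```python
# Toggling thinking mode — assumes Qwen3.6 keeps the Qwen3 chat-template
# convention (enable_thinking kwarg and the /no_think soft switch).

def build_messages(prompt: str, think: bool = True) -> list[dict]:
    """Append Qwen's soft switch to disable thinking for a single turn."""
    content = prompt if think else f"{prompt} /no_think"
    return [{"role": "user", "content": content}]

# With a loaded tokenizer, the template-level switch would look like:
# text = tokenizer.apply_chat_template(
#     build_messages("Plan the deployment steps."),
#     tokenize=False,
#     add_generation_prompt=True,
#     enable_thinking=False,  # skip the <think>...</think> block
# )
```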
### vLLM

```shell
vllm serve samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B --dtype auto --max-model-len 262144
```
### llama.cpp
For llama.cpp usage, see the GGUF repo.
## Acknowledgements
- kai-os — Carnice training methodology and Hermes traces dataset
- open-thoughts — Agent SFT dataset
- bespokelabs — Bespoke-Stratos reasoning dataset
- Unsloth — QLoRA training framework
- Qwen — Base model