Carnice Qwen3.6 MoE 35B-A3B — Hermes-Focused Agentic Model

QLoRA fine-tune of Qwen3.6-35B-A3B (MoE, 3B active parameters) optimized for agentic workflows and Hermes Agent runtime. Two-stage training adapted from kai-os/Carnice-9b.

This is the successor to Carnice-MoE-35B-A3B (based on Qwen3.5), retrained on the newer Qwen3.6 base which brings improved agentic coding, extended context (262K native, up to 1M with RoPE scaling), and native multimodal support.

Credits

Training methodology adapted from kai-os/Carnice-9b — same two-stage approach and datasets, applied to the larger MoE architecture. Key inspiration: training on actual Hermes Agent execution traces for native agentic behavior.

Available Formats

| Format | Size | Location | Use Case |
|---|---|---|---|
| BF16 SafeTensors | 67 GB | Root | Full precision, Transformers / vLLM |
| FP8 Dynamic | 34 GB | `fp8/` | vLLM optimized, ~2x faster inference |
| GGUF | 19-65 GB | GGUF repo | llama.cpp, Ollama, LM Studio |

FP8 Usage (vLLM)

# Serve the BF16 weights with on-the-fly FP8 quantization, or download the
# fp8/ subfolder locally and point vllm serve at that path instead
vllm serve samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B --quantization fp8 --dtype auto

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.6-35B-A3B |
| Architecture | Mixture of Experts (MoE) |
| Total Parameters | ~35B |
| Active Parameters | ~3B per token |
| Native Context Length | 262,144 tokens |
| Thinking Modes | Thinking / Non-thinking (native Qwen3.6) |
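Extending beyond the native 262K window relies on RoPE scaling. A hypothetical YaRN `rope_scaling` entry for `config.json` is sketched below as a Python dict; the factor of 4.0 is an assumption extrapolated from the "up to 1M" claim, not a documented value for this model.

```python
# Hypothetical YaRN rope_scaling entry (assumed, not from this model's config):
# factor 4.0 x 262,144 native positions = a ~1M-token window.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262_144,
}

extended_window = int(
    rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]
)
print(extended_window)  # 1048576
```

As with other Qwen-family models, static YaRN scaling degrades quality on short inputs, so it is usually enabled only when long-context inference is actually needed.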

What Makes This Different

Unlike generic reasoning distillation, this model was trained on actual Hermes Agent execution traces — real conversations where an AI agent:

  • Executes terminal commands and processes output
  • Performs file editing operations
  • Chains multi-step tool calls with results feeding back
  • Uses browser-assisted workflows
  • Makes decisions based on environmental feedback

This teaches the model the exact conversation patterns Hermes expects, rather than just generic reasoning.
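Concretely, such traces follow the familiar chat tool-calling shape: an assistant turn emits a tool call, the environment answers with a tool turn, and the assistant acts on the result. The sketch below uses the common OpenAI-style role convention; the exact Hermes trace schema is not documented here, so treat the field names as illustrative.

```python
# One simulated agent step: assistant issues a terminal tool call, the
# environment replies with a "tool" turn, and the assistant uses the result.
messages = [
    {"role": "user", "content": "How many Python files are in this repo?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "terminal", "arguments": '{"cmd": "ls *.py | wc -l"}'},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "12"},
    {"role": "assistant", "content": "There are 12 Python files in the repository."},
]

# During SFT the model learns to produce the assistant turns,
# conditioned on everything that came before them.
roles = [m["role"] for m in messages]
print(roles)  # ['user', 'assistant', 'tool', 'assistant']
```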

Training Details

Two-Stage Approach

Stage A — Reasoning Repair (1 epoch)

  • Strengthens base model reasoning before agent-specific training
  • Loss: 0.4281
| Dataset | Examples |
|---|---|
| bespokelabs/Bespoke-Stratos-17k | 16,710 |
| AI-MO/NuminaMath-CoT | 17,000 (capped) |

Stage B — Hermes Traces (2 epochs)

  • Agent-specific behavioral training on real execution traces
  • Loss: 0.3045
| Dataset | Examples |
|---|---|
| kai-os/carnice-glm5-hermes-traces | 1,627 (high quality) |
| open-thoughts/OpenThoughts-Agent-v1-SFT | 15,209 |
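Summing the counts listed in the two stage tables gives the total examples per stage:

```python
# Per-stage example counts, taken directly from the tables above.
stage_a = {
    "bespokelabs/Bespoke-Stratos-17k": 16_710,
    "AI-MO/NuminaMath-CoT": 17_000,
}
stage_b = {
    "kai-os/carnice-glm5-hermes-traces": 1_627,
    "open-thoughts/OpenThoughts-Agent-v1-SFT": 15_209,
}

print(sum(stage_a.values()))  # 33710 examples, seen once (1 epoch) in Stage A
print(sum(stage_b.values()))  # 16836 examples, each seen twice (2 epochs) in Stage B
```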

Training Configuration

| Parameter | Stage A | Stage B |
|---|---|---|
| LoRA Rank | 64 | 64 |
| LoRA Alpha | 64 | 64 |
| LoRA Targets | q, k, v, o projections | q, k, v, o projections |
| Learning Rate | 2e-5 (linear) | 1e-5 (cosine) |
| Epochs | 1 | 2 |
| Effective Batch | 12 | 12 |
| Context Length | 4096 | 4096 |
| Precision | 4-bit QLoRA + BF16 adapters | Same |
| GPU | RTX PRO 6000 Blackwell (98GB) | Same |
| Total Training Time | ~55 hours (both stages) | |
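The two learning-rate schedules differ in shape: Stage A decays linearly from its 2e-5 peak, while Stage B follows a cosine curve down from 1e-5. A minimal sketch of both (warmup is omitted, since the table does not specify one):

```python
import math

def linear_lr(step, total_steps, peak=2e-5):
    # Stage A: linear decay from peak to 0 over the run
    return peak * (1 - step / total_steps)

def cosine_lr(step, total_steps, peak=1e-5):
    # Stage B: cosine decay from peak to ~0 over the run
    return peak * 0.5 * (1 + math.cos(math.pi * step / total_steps))

total = 1000  # illustrative step count
print(linear_lr(0, total), linear_lr(total, total))  # peak at step 0, 0 at the end
print(cosine_lr(0, total), cosine_lr(total, total))  # peak at step 0, ~0 at the end
```

The cosine schedule spends proportionally more steps near the peak rate before decaying, which is a common choice for the later, smaller-LR stage.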

Trainable Parameters

13,762,560 (0.04% of 35.1B total)
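The quoted percentage checks out, and underlines how light QLoRA is on a model this size:

```python
trainable = 13_762_560  # LoRA adapter parameters
total = 35.1e9          # total model parameters

pct = 100 * trainable / total
print(f"{pct:.2f}% of parameters are trainable")  # 0.04%
```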

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B")

messages = [{"role": "user", "content": "Explain the Riemann hypothesis in simple terms."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for the non-thinking mode
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

vLLM

vllm serve samuelcardillo/Carnice-Qwen3.6-MoE-35B-A3B --dtype auto --max-model-len 262144

llama.cpp

For llama.cpp usage, see the GGUF repo.

Acknowledgements

  • kai-os — Carnice training methodology and Hermes traces dataset
  • open-thoughts — Agent SFT dataset
  • bespokelabs — Bespoke-Stratos reasoning dataset
  • Unsloth — QLoRA training framework
  • Qwen — Base model