It's trash. It scores much lower than the base model. But it was my first RL training run, and that made it monumental for me.
minpeter/calculator-agent-qwen3-0.6b: Accuracy: 15.19% (24/158)

minpeter/Qwen3-0.6B-Instruct: Accuracy: 27.22% (43/158)
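
To make the size of the drop concrete, here is a minimal sketch that recomputes the two accuracies from the raw counts above and derives the gap. The helper name `accuracy` and the variable names are only illustrative; this is not the evaluation script, which is not shown here.

```python
def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage."""
    return 100.0 * correct / total


# (correct, total) counts copied from the results above.
results = {
    "minpeter/calculator-agent-qwen3-0.6b": (24, 158),
    "minpeter/Qwen3-0.6B-Instruct": (43, 158),
}

rl_acc = accuracy(*results["minpeter/calculator-agent-qwen3-0.6b"])
base_acc = accuracy(*results["minpeter/Qwen3-0.6B-Instruct"])

# Absolute gap in percentage points, and the gap relative to the higher score.
drop_points = base_acc - rl_acc
relative_drop = 100.0 * drop_points / base_acc

for name, (correct, total) in results.items():
    print(f"{name}: {accuracy(correct, total):.2f}% ({correct}/{total})")
print(f"Drop: {drop_points:.2f} points ({relative_drop:.2f}% relative)")
```

On these counts the gap works out to about 12 percentage points, meaning the RL-trained model answers 19 fewer of the 158 problems correctly.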