Ablation destroys the model
Hi,
I just tried it, and it seems the ablation you did destroyed the model... on very many questions it starts to output random words (after about 500 tokens). I use it in instruct mode without thinking...
Does this need post-training after the ablation to correct it?
Hmm, the model has been tested and rated on the UGI leaderboard, and no such issues were mentioned by the tester:
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
And it shouldn't be due to the KL divergence either, because the KL divergence is only 0.0301.
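For reference, that kind of figure can be reproduced with llama.cpp's `llama-perplexity` tool, which can save per-token logits from one model and score another model's divergence against them. A rough sketch, assuming a recent llama.cpp build with the `--kl-divergence` options; `eval.txt` and `base-logits.kld` are placeholder file names, and the evaluation text is whatever you choose:

```shell
# 1) Save per-token logits from the original model over some evaluation text.
./llama.cpp/llama-perplexity -m models/Qwen3.5-27B-UD-Q6_K_XL.gguf \
    -f eval.txt --kl-divergence-base base-logits.kld

# 2) Replay the same text through the abliterated model and report the
#    KL divergence against the saved logits.
./llama.cpp/llama-perplexity -m models/Qwen3.5-27B-heretic-v2.Q6_K.gguf \
    -f eval.txt --kl-divergence-base base-logits.kld --kl-divergence
```

A mean KLD in the 0.03 range is generally considered a small perturbation, which is why random-token output would be surprising if the weights themselves were intact.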
However, I did hear reports that the original official Qwen3.5 template causes a variety of issues, so maybe that could be it?
Maybe try this one: https://huggingface.co/DavidAU/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking/blob/main/chat_template.jinja
Download these and replace the original chat_template.jinja with either one of the two (rename it to chat_template.jinja if it's not already named that), then test again; that might fix your issue(s).
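A minimal way to try a replacement template without touching the files in the model folder, assuming a recent llama.cpp build (which has a `--chat-template-file` flag) and the `huggingface-cli` tool; the local paths here are just examples:

```shell
# Fetch the alternative chat template from DavidAU's repo.
huggingface-cli download \
    DavidAU/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking \
    chat_template.jinja --local-dir ./templates

# Launch llama-server with that template instead of the one embedded in the GGUF.
./llama.cpp/llama-server --model models/Qwen3.5-27B-heretic-v2.Q6_K.gguf \
    --jinja --chat-template-file ./templates/chat_template.jinja \
    --port 16384
```

This keeps the downloaded GGUF untouched, so it's easy to A/B the two templates by swapping the `--chat-template-file` argument.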
mradermacher/Qwen3.5-27B-GGUF --> Q6_K
mradermacher/Qwen3.5-27B-heretic-v2-GGUF --> Q6_K
I just swapped the model files in the llama.cpp command... the original does not run into this problem; the abliterated version does.
mradermacher version of the original (works fine, but high censorship):

```shell
./llama.cpp/llama-server --model "models/Qwen3.5-27B-UD-Q6_K_XL.gguf" --mmproj "models/mmproj-F32.gguf" --alias "Qwen3.5 27B" --temp 0.7 --top-p 0.8 --min-p 0.00 --top-k 40 --port 16384 --host 0.0.0.0 --ctx-size 200000 --cache-type-k f16 --cache-type-v f16 --presence-penalty 2.0 --repeat-penalty 1.1 --jinja --no-context-shift --parallel 4 --cont-batching --chat-template-kwargs '{"enable_thinking":false}'
```
mradermacher version of your heretic-v2 (very often ends in random output tokens):

```shell
./llama.cpp/llama-server --model "models/Qwen3.5-27B-heretic-v2.Q6_K.gguf" --mmproj "models/Qwen3.5-27B-heretic-v2.mmproj-f16.gguf" --alias "Qwen3.5 heretic v2 27B" --temp 0.7 --top-p 0.8 --min-p 0.00 --top-k 40 --port 16384 --host 0.0.0.0 --ctx-size 200000 --cache-type-k f16 --cache-type-v f16 --presence-penalty 2.0 --repeat-penalty 1.1 --jinja --no-context-shift --parallel 4 --cont-batching --chat-template-kwargs '{"enable_thinking":false}'
```
.... update ....
What I did now is re-download the 22 GB of weights (maybe something went wrong during the download), and since re-downloading it I haven't seen the problem again...
Either it was the download or something else, but with the exact same test requests I always send to all models, I can no longer reproduce it...
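As a quick sanity check for future downloads, you can compare the local file's SHA-256 against the checksum shown on the file's Hugging Face page. A small sketch; the `expected` value is a placeholder, not the real hash:

```shell
file="models/Qwen3.5-27B-heretic-v2.Q6_K.gguf"  # path to the downloaded GGUF
expected="<sha256 from the model's file page>"   # placeholder: paste the published hash

# Hash the local file and compare against the published value.
actual=$(sha256sum "$file" | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
else
    echo "checksum MISMATCH: $actual"
fi
```

A mismatch here would point straight at a corrupted download rather than at the ablation itself.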
Sorry for the wrong report in this case...