Ablation destroys the model
Hi,
I just tried it, and it seems the ablation you did destroyed the model... on very many questions it starts to output random words (after about 500 tokens). I use it in instruct mode without thinking...
Does this need post-training after the ablation to correct it?
Hmm, the model has been tested and rated on the UGI leaderboard, and no such issues were mentioned by the tester:
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
And it shouldn't be due to the KL divergence either, because the KL divergence is only 0.0301.
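For reference, that kind of figure can be reproduced with llama.cpp's `llama-perplexity` tool, which can save per-token logits from one model and score another model's divergence against them. A rough sketch, assuming a recent llama.cpp build with the `--kl-divergence` options; `eval.txt` and `base-logits.kld` are placeholder file names, and the evaluation text is whatever you choose:

```shell
# 1) Save per-token logits from the original model over some evaluation text.
./llama.cpp/llama-perplexity -m models/Qwen3.5-27B-UD-Q6_K_XL.gguf \
    -f eval.txt --kl-divergence-base base-logits.kld

# 2) Replay the same text through the abliterated model and report the
#    KL divergence against the saved logits.
./llama.cpp/llama-perplexity -m models/Qwen3.5-27B-heretic-v2.Q6_K.gguf \
    -f eval.txt --kl-divergence-base base-logits.kld --kl-divergence
```

A mean KLD in the 0.03 range is generally considered a small perturbation, which is why random-token output would be surprising if the weights themselves were intact.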
However, I did hear reports that the original official Qwen3.5 template causes a variety of issues, so maybe that could be it?
Maybe try this one: https://huggingface.co/DavidAU/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking/blob/main/chat_template.jinja
Download these and replace the original chat_template.jinja with either one of the two (rename it to chat_template.jinja if it's not already named that), then test again; that might fix your issue(s).
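A minimal way to try a replacement template without touching the files in the model folder, assuming a recent llama.cpp build (which has a `--chat-template-file` flag) and the `huggingface-cli` tool; the local paths here are just examples:

```shell
# Fetch the alternative chat template from DavidAU's repo.
huggingface-cli download \
    DavidAU/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking \
    chat_template.jinja --local-dir ./templates

# Launch llama-server with that template instead of the one embedded in the GGUF.
./llama.cpp/llama-server --model models/Qwen3.5-27B-heretic-v2.Q6_K.gguf \
    --jinja --chat-template-file ./templates/chat_template.jinja \
    --port 16384
```

This keeps the downloaded GGUF untouched, so it's easy to A/B the two templates by swapping the `--chat-template-file` argument.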
mradermacher/Qwen3.5-27B-GGUF --> Q6_K
mradermacher/Qwen3.5-27B-heretic-v2-GGUF --> Q6_K
I just swapped the model files in the llama.cpp command... the original does not run into this problem; the abliterated version does.
mradermacher version of the original (works fine, but high censorship):

```shell
./llama.cpp/llama-server --model "models/Qwen3.5-27B-UD-Q6_K_XL.gguf" --mmproj "models/mmproj-F32.gguf" --alias "Qwen3.5 27B" --temp 0.7 --top-p 0.8 --min-p 0.00 --top-k 40 --port 16384 --host 0.0.0.0 --ctx-size 200000 --cache-type-k f16 --cache-type-v f16 --presence-penalty 2.0 --repeat-penalty 1.1 --jinja --no-context-shift --parallel 4 --cont-batching --chat-template-kwargs '{"enable_thinking":false}'
```
mradermacher version of your heretic-v2 (very often ends in random output tokens):

```shell
./llama.cpp/llama-server --model "models/Qwen3.5-27B-heretic-v2.Q6_K.gguf" --mmproj "models/Qwen3.5-27B-heretic-v2.mmproj-f16.gguf" --alias "Qwen3.5 heretic v2 27B" --temp 0.7 --top-p 0.8 --min-p 0.00 --top-k 40 --port 16384 --host 0.0.0.0 --ctx-size 200000 --cache-type-k f16 --cache-type-v f16 --presence-penalty 2.0 --repeat-penalty 1.1 --jinja --no-context-shift --parallel 4 --cont-batching --chat-template-kwargs '{"enable_thinking":false}'
```
.... update ....
What I did now is re-download the 22 GB of weights (maybe something went wrong during the download), and since re-downloading it I haven't seen the problem again...
Either it was the download or something else, but with the exact same test requests I always send to all models, I can no longer reproduce it...
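As a quick sanity check for future downloads, you can compare the local file's SHA-256 against the checksum shown on the file's Hugging Face page. A small sketch; the `expected` value is a placeholder, not the real hash:

```shell
file="models/Qwen3.5-27B-heretic-v2.Q6_K.gguf"  # path to the downloaded GGUF
expected="<sha256 from the model's file page>"   # placeholder: paste the published hash

# Hash the local file and compare against the published value.
actual=$(sha256sum "$file" | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
else
    echo "checksum MISMATCH: $actual"
fi
```

A mismatch here would point straight at a corrupted download rather than at the ablation itself.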
Sorry for the wrong report in this case...