That's a wonderful repo that I used as my starting point! The main problem with it is that it only supports models available in TransformerLens, and unfortunately there aren't many of those...
> Do these techniques train models while performing the modifications?
Depends on what you mean by training; they change the weights.
> Do these techniques train models while performing the modifications?
I'm not sure I understand, but there is an example of performing an abliteration on Gemma to make it never refuse an answer. It's about 10 lines of code.
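For intuition, here's a minimal sketch of the core step behind abliteration (my own illustration, not code from the repo or the blog post): estimate a "refusal direction" as the difference of mean activations on harmful vs. harmless prompts, then project that direction out of weights that write into the residual stream, so the model can no longer represent "refuse". The tensor shapes and random stand-in data are hypothetical.

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    # Estimate the refusal direction as the normalized difference of
    # mean residual-stream activations on the two prompt sets.
    d = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return d / d.norm()

def orthogonalize(weight: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    # W <- (I - d d^T) W: remove the component of the layer's output
    # along d, so this weight can no longer write in that direction.
    return weight - torch.outer(d, d) @ weight

# Toy usage: random tensors stand in for real activations and weights.
hidden = 64
harmful = torch.randn(32, hidden)
harmless = torch.randn(32, hidden)
d = refusal_direction(harmful, harmless)
W = torch.randn(hidden, hidden)        # e.g. an attention output projection
W_abliterated = orthogonalize(W, d)

# The modified weight's output is now orthogonal to the refusal direction.
assert torch.allclose(d @ W_abliterated, torch.zeros(hidden), atol=1e-4)
```

In practice you'd apply this to the relevant projection matrices of every layer, which is why the real thing fits in roughly 10 lines.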
> > Do these techniques train models while performing the modifications?
> Depends on what you mean by training; they change the weights.
What I wonder: is there a separate model, not the LLM, that gets trained only on how to modify LLMs?
I imagine a model that could learn something like: “if I remove this whole network here, then the LLM runs 50% faster, but drops 30% in accuracy for certain topics”, or “if I add these connections, the LLM will now be able to solve more complex mathematical problems”
So a model that is not an LLM, but is trained on how to modify LLMs for certain goals.
While the weights are open source and there is a paper about the methodology, the information I mentioned is considered proprietary, so DeepSeek refuses any requests to provide it.
https://huggingface.co/blog/mlabonne/abliteration#%E2%9A%96%...