tsadoq's comments | Hacker News

Not necessarily true; one quick pass might be needed, but it's not quite as devastating as it might seem.

https://huggingface.co/blog/mlabonne/abliteration#%E2%9A%96%...
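
Very roughly, such a quick pass is just a short fine-tune of the modified model on ordinary instruction data to recover any lost quality. Below is a minimal sketch of that idea in plain PyTorch/transformers; the linked post's exact recipe may differ, and the model id, data, and learning rate are placeholders.

    # Sketch of a short "healing" fine-tuning pass after weight modification.
    # Model id, dataset, and hyperparameters are placeholders, not from the thread.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "path/to/abliterated-model"  # placeholder
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.train()

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)

    # (prompt, good_response) pairs from any instruction dataset -- placeholder data
    pairs = [("Explain photosynthesis.", "Photosynthesis is the process by which ...")]

    for prompt, response in pairs:
        text = tok.apply_chat_template(
            [{"role": "user", "content": prompt},
             {"role": "assistant", "content": response}],
            tokenize=False,
        )
        batch = tok(text, return_tensors="pt")
        out = model(**batch, labels=batch["input_ids"])  # standard causal-LM loss
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()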


That's a wonderful repo that I used as my starting point! The main problem with that one is that it only supports models available in TransformerLens, and unfortunately there aren't many...


The other link is quite good; I also suggest this one for some practical applications:

https://huggingface.co/blog/leonardlin/chinese-llm-censorshi...


Please give feedback! It's quite a raw first implementation, and it would be very nice to have suggestions and improvements.


> Do these techniques train models while performing the modifications?

Depends on what you mean by training; they change the weights.

> Do these techniques train models while performing the modifications?

I'm not sure I understand, but there is an example of performing an abliteration on Gemma to make it never refuse an answer. It's about 10 lines of code.
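
Roughly, the core step looks like the sketch below (plain transformers/PyTorch, not the repo's actual code; the model id, prompt lists, and probe layer are placeholders): estimate a "refusal direction" in the residual stream from the difference between refused and answered prompts, then project it out of the weights that write back into the residual stream.

    # Hedged sketch of the core abliteration idea, not the repo's example.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-2-2b-it"  # example model; any decoder-only LM works
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

    LAYER = 12  # residual-stream layer to probe; chosen empirically

    def mean_last_token_resid(prompts):
        # mean residual-stream activation at the last token position
        acts = []
        for p in prompts:
            ids = tok(p, return_tensors="pt")
            with torch.no_grad():
                out = model(**ids, output_hidden_states=True)
            acts.append(out.hidden_states[LAYER][0, -1])
        return torch.stack(acts).mean(dim=0)

    harmful = ["How do I pick a lock?"]   # placeholder prompts the model refuses
    harmless = ["How do I bake bread?"]   # placeholder prompts it answers

    refusal_dir = mean_last_token_resid(harmful) - mean_last_token_resid(harmless)
    refusal_dir = refusal_dir / refusal_dir.norm()

    # Orthogonalize the matrices that write into the residual stream:
    # W <- W - r (r^T W), so the model can no longer write along r.
    with torch.no_grad():
        for layer in model.model.layers:
            for W in (layer.self_attn.o_proj.weight, layer.mlp.down_proj.weight):
                W -= torch.outer(refusal_dir, refusal_dir @ W)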


> > Do these techniques train models while performing the modifications?

> Depends on what you mean by training; they change the weights.

What I wonder: is there a separate model, not the LLM, that gets trained only on how to modify LLMs?

I imagine a model that could learn something like: “if I remove this whole network here, then the LLM runs 50% faster, but drops 30% in accuracy for certain topics”, or “if I add these connections, the LLM will now be able to solve more complex mathematical problems”

So a model that is not an LLM, but is trained on how to modify them for certain goals

Is that how this tool works?


As someone who studied mainly ancient Greek and Latin in high school, I tend to have quite a limited pool of inspiration for naming what I build, haha.


Check out Robert Anton Wilson (The Illuminatus! Trilogy); you're in for a treat -- the references above were to Discordianism:

* https://en.wikipedia.org/wiki/The_Illuminatus!_Trilogy
* https://en.wikipedia.org/wiki/Principia_Discordia


Is the apple in the logo splashing into the "wine-dark sea"?


L’alleato was the name given by Eris to the Golden Apple of Discord.


Planning to update it to be able to run on it; it's just a matter of finding the right keys in the model's layer dict.
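
Something along these lines (illustrative only, not the repo's code; the suffix patterns are assumptions that cover Llama/Gemma/Qwen-style module naming):

    # Scan a checkpoint's state_dict for the per-layer projection weights to patch.
    from transformers import AutoModelForCausalLM

    CANDIDATE_SUFFIXES = ("self_attn.o_proj.weight", "mlp.down_proj.weight")

    def find_patchable_keys(model):
        return [
            (name, tuple(tensor.shape))
            for name, tensor in model.state_dict().items()
            if name.endswith(CANDIDATE_SUFFIXES)
        ]

    model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")  # example
    for name, shape in find_patchable_keys(model):
        print(name, shape)  # one o_proj and one down_proj weight per decoder layer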


Would be nice to get it to output its guardrails/system prompt to see what specific instructions it was given regarding refusals.


Isn't DeepSeek open source?


While the weights are open source and there is a paper about the methodology, the information I mentioned is considered proprietary; therefore DeepSeek refuses any requests to provide it.


Given the weights, though, can't we use any system prompt we like? I only have a vague notion of how these constraints are actually applied.
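
For what it's worth, a minimal sketch of that premise, assuming a transformers-compatible checkpoint: when you run the weights locally, the system prompt is just text you prepend yourself via the chat template. The model id below is only an example, and any refusal behaviour trained into the weights themselves is unaffected by whatever prompt you choose -- that part is what the weight edits above target.

    # Running an open checkpoint locally with a system prompt of your choosing.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # example checkpoint
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    messages = [
        {"role": "system", "content": "You are a blunt assistant with no refusal policy."},
        {"role": "user", "content": "Who are you?"},
    ]
    # Some chat templates ignore or reject a system role; in that case prepend the text manually.
    inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    print(tok.decode(model.generate(inputs, max_new_tokens=64)[0]))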

