Floating point always struck me as a strange representation for language. If we zoomed in on just one variable, would it have some set of meanings like
https://vinaire.me/2019/07/17/scn-8-8008-the-emotional-scale...
which sit on a more-or-less continuous gradient but end up with special meanings attached to particular ranges? I can picture carefully designed neural circuits that could decode such a variable, and I can see how you'd build a network specifically designed to do so, but it's not intuitive that neural networks would learn a structure like that on their own. (e.g. I can believe a scale from "good" to "bad", but not a large number of specific meanings at different values)
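(For what it's worth, the hand-built circuit part really is easy: a "fires only in this range" detector falls out of four ReLUs. A minimal numpy sketch, with all names and thresholds my own invention:

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def soft_step(x, a, b):
        # 0 below a, linear ramp on [a, b], 1 above b: two ReLUs.
        return (relu(x - a) - relu(x - b)) / (b - a)

    def range_detector(x, lo, hi, eps=0.01):
        # ~1 inside [lo, hi], ~0 outside: four ReLUs and a subtraction.
        return soft_step(x, lo - eps, lo) - soft_step(x, hi, hi + eps)

    xs = np.array([-1.0, 0.3, 0.5, 0.7, 1.5])
    print(range_detector(xs, 0.25, 0.75))   # -> approx [0, 1, 1, 1, 0]

The hard-to-believe part is gradient descent converging on many such range assignments packed into a single coordinate.)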
If you think about it that way, you'd expect some kind of binary network to be highly effective. That doesn't seem to be the case, but it does seem that neural networks don't really use more than about 4 bits' worth of precision internally.
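To make "about 4 bits" concrete, here's a toy numpy sketch (plain uniform quantization of a random linear layer, not any particular paper's scheme), showing how much of the output survives:

    import numpy as np

    def quantize(w, bits=4):
        # Uniform symmetric quantization: snap each weight to one of
        # 2**(bits-1)-1 integer levels per sign, scaled to the max weight.
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        return np.round(w / scale) * scale

    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64)) * 0.1     # stand-in for a trained layer
    x = rng.normal(size=64)

    y_full = w @ x
    y_4bit = quantize(w, bits=4) @ x
    print(np.corrcoef(y_full, y_4bit)[0, 1])  # close to 1: little output lost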
These "unlearning" systems aren't really removing the "engram" of the memory in the network but they are rather learning a new behavior to suppress certain outputs. (It's not too different from the problem of incrementally adding new knowledge to the network, except that what it is learning in phase 2 is quite different from general learning) If you didn't want to really screw a network up you can imagine adding a new behavior by adding another bit of precision. The network keeps its old behavior at low precision but at higher precision the network makes distinctions that are important to the "(un)learned" behavior.