As the author of the original post above, let me say that if that's word salad, it's a Michelin-star salad. Just the right mix of lettuce and tomato, and the dressing is spot on :-)
Seriously, though, "differentiable hash tables" is an awesome way to look at them. I wish I'd heard it before.
Well, FWIW, I have implemented the Llama LLM (Llama 3, 8B to 70B, to be specific) in a non-Python language, so I do sort of know what I'm talking about.
I'm not the originator of the hash table analogy. I got it from here:
https://www.youtube.com/watch?v=iDulhoQ2pro
Hell, the vectors that are generated are called K/V, so yeah, it's a hash table.
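To make the analogy concrete, here's a toy sketch of what a "soft" K/V lookup might look like next to an exact-match one. This is not taken from any Llama implementation: the function name `soft_lookup`, the `temperature` parameter, and the tiny hand-written keys and values are all made up for illustration. Real attention uses learned projections for Q, K, and V and scales the scores by the square root of the key dimension.

```python
import math

def soft_lookup(query, keys, values, temperature=1.0):
    """Attention-style lookup: a softmax-weighted blend of the values.

    Hypothetical helper for illustration only; `temperature` is an
    assumed knob, not something the original post mentions.
    """
    # Score each key by its dot product with the query.
    scores = [sum(q * k for q, k in zip(query, key)) / temperature
              for key in keys]
    # Softmax, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Blend the value vectors using those weights.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 20.0]]

# A low temperature makes the softmax nearly one-hot, so the result
# approaches what an exact-match table would return for the first key.
sharp = soft_lookup([1.0, 0.0], keys, values, temperature=0.01)

# A query that matches both keys equally gets an even blend of both values.
blend = soft_lookup([1.0, 1.0], keys, values)
```

The point of the sketch: a regular hash table returns exactly one value or nothing, while this version returns a weighted mix of every value, and because the weights are smooth functions of the query and keys, the whole lookup is differentiable.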
And the idea that first order logic and other facets of intelligence can just be cleverly arranged lookup tables comes from here:
https://en.wikipedia.org/wiki/Fluid_Concepts_and_Creative_An...