Hacker News

I like this, but I think some crucial motivation is missing in steps 10.1-10.3 regarding what the query/key weights are and why they're needed.



They are like "continuous" databases. See slides 4-5 here [1] - this is from a talk I gave a while ago.

[1] https://drive.google.com/file/d/12uHo9QIfS-jBpVTs3lmQ3BEpxhD...
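A minimal sketch of the "continuous database" idea, under assumptions of my own (the key/value vectors are hypothetical toy numbers, not from the slides): a normal dict needs an exact key match, while attention scores the query against every key with a dot product, turns the scores into weights with softmax, and returns a weighted blend of all the values.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# A "hard" database: lookup only works if the query matches a key exactly.
hard_db = {"cat": 1.0, "dog": 2.0}

def soft_lookup(query, keys, values):
    """Blend all values, weighted by how well each key matches the query."""
    scores = [dot(query, k) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

# Hypothetical 2-d keys and values, chosen only for illustration.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 20.0]]

out, w = soft_lookup([5.0, 0.0], keys, values)
# weights ≈ [0.993, 0.007]: the query mostly "retrieves" the first value
```

The difference from `hard_db` is that there is never a "key not found": a query halfway between two keys (e.g. `[1.0, 1.0]`) just gets an even mix of both values, which is what makes the lookup differentiable and trainable.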


this post made sense to me https://teltam.github.io/posts/soft-dictionary-keys.html

It helps to think of kqv as a form of look up.
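To connect the lookup view back to the query/key weights: in self-attention the queries, keys and values all come from the same token embeddings, projected through three separate learned matrices (often written W_Q, W_K, W_V). A rough sketch, assuming tiny hand-picked matrices in place of learned ones and plain Python lists instead of a tensor library; the scaling by sqrt(d_k) follows the usual scaled dot-product attention.

```python
import math

def matmul(a, b):
    # a is n x d, b is d x m; returns n x m.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def attention(x, w_q, w_k, w_v):
    q = matmul(x, w_q)  # what each token is looking for
    k = matmul(x, w_k)  # what each token advertises about itself
    v = matmul(x, w_v)  # what each token hands over when selected
    d_k = len(k[0])
    # Every query is compared with every key, then scaled.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)  # each row sums to 1
    return matmul(weights, v), weights

# Three hypothetical 2-d token embeddings and hand-picked (not learned) weights.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[0.0, 1.0], [1.0, 0.0]]
w_v = [[1.0, 0.0], [0.0, 1.0]]

out, weights = attention(x, w_q, w_k, w_v)
```

Having separate W_Q and W_K is the point: without them a token's "what I'm searching for" and "what I'm indexed under" would be forced to be the same vector. For example, if W_Q were all zeros, every score would be 0 and each token would just average all the values, i.e. the lookup would carry no information.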


yes, same issue in all transformer tutorials


I suspect this is because most people (including the people writing these tutorials) don't have a strong grasp of this piece either.


The 2b1b video was the first to make it click for me


You mean 3b1b (three blue one brown)?


Ah that's right, miscounted the blues




