> maybe even feel inspired to make your own model by hand as well!

Other than as a learning exercise to satisfy your curiosity, what are you doing with this? I'm starting to get the feeling that anything complex with ML models is unreasonable for an at-home blog reader?




In nanoGPT, you pre-train a model on Shakespeare, and within 3 minutes it reaches roughly the fidelity of Lewis Carroll's Jabberwocky on the source material: it makes up lots of plausible-sounding archaic English words, learns the basics of English grammar, the layout of the plays, and so on. I was pretty amazed that it got that good in such a short period of time.

I think locally training a bunch of models to the fidelity of Shakespeare-from-Wish.com might tell you when you've hit on a winning architecture, and when to try scaling up.
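
If you want to try reproducing this, the character-level Shakespeare quick start from the nanoGPT README goes roughly like this (script names are taken from the repo's README and may have changed since, so double-check there):

    python data/shakespeare_char/prepare.py           # tokenize the tiny Shakespeare dataset
    python train.py config/train_shakespeare_char.py  # a few minutes on a single GPU
    python sample.py --out_dir=out-shakespeare-char   # generate faux-Shakespeare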


Author states in the first paragraph of their blog post:

"I've been wanting to understand transformers and attention better for awhile now—I'd read The Illustrated Transformer, but still didn't feel like I had an intuitive understanding of what the various pieces of attention were doing. What's the difference between q and k? And don't even get me started on v!"


lol three people said the same thing, I get that. That's why I said "other than learning and satisfying curiosity"...


It's an excellent learning exercise, not just to satisfy curiosity but to develop and deepen understanding.


I dunno, maybe they actually enjoy hacking on projects like this? Weird, I know.



