> maybe even feel inspired to make your own model by hand as well!
Other than a learning exercise to satisfy your curiosity, what are you doing with this? I'm starting to get the feeling that anything complex with ML models is unreasonable for an at-home blog reader?
In nanoGPT, you pre-train a model on Shakespeare, and in 3 minutes it reaches roughly the fidelity of Lewis Carroll's Jabberwocky with respect to the source material. It makes up lots of plausible-seeming old English words, learns the basics of English grammar, the layout of the plays, etc. I was pretty amazed that it got that good in such a short period of time.
I think locally training a bunch of models to the fidelity of Shakespeare-from-Wish.com might tell you when you've hit on a winning architecture, and when to try scaling up.
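For anyone curious what that quick local run looks like mechanically, here's a toy sketch of a character-level training loop in the same spirit. This is not nanoGPT's code: the `shakespeare.txt` path and the hyperparameters are assumptions, and where nanoGPT trains a small GPT, this sketch uses a trivial bigram table just to show the shape of the loop.

```python
# Toy sketch (not nanoGPT's actual code): character-level next-token training
# on an assumed local copy of the corpus, "shakespeare.txt".
import torch
import torch.nn as nn
import torch.nn.functional as F

text = open("shakespeare.txt").read()            # assumed local file
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

block_size, batch_size, vocab = 64, 32, len(chars)   # illustrative values

def get_batch():
    # random character windows; targets are the same windows shifted by one
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y

class Bigram(nn.Module):
    # one embedding table mapping current char -> logits over next char
    # (nanoGPT replaces this with a small multi-layer transformer)
    def __init__(self):
        super().__init__()
        self.table = nn.Embedding(vocab, vocab)
    def forward(self, x):
        return self.table(x)

model = Bigram()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
for step in range(2000):                          # a few minutes on CPU, roughly
    x, y = get_batch()
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, vocab), y.view(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# sample 200 characters from the trained table to eyeball the output quality
idx = torch.zeros((1, 1), dtype=torch.long)
for _ in range(200):
    probs = F.softmax(model(idx[:, -1:]).squeeze(1), dim=-1)
    idx = torch.cat([idx, torch.multinomial(probs, 1)], dim=1)
print("".join(itos[int(i)] for i in idx[0]))
```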
The author states in the first paragraph of their blog post:
"I've been wanting to understand transformers and attention better for awhile now—I'd read The Illustrated Transformer, but still didn't feel like I had an intuitive understanding of what the various pieces of attention were doing. What's the difference between q and k? And don't even get me started on v!"