> maybe even feel inspired to make your own model by hand as well!

Other than as a learning exercise to satisfy your curiosity, what are you doing with this? I'm starting to get the feeling that anything complex with ML models is unreasonable for an at-home blog reader?




In nanoGPT, you pre-train a model on Shakespeare, and within 3 minutes it reaches roughly the fidelity of Lewis Carroll's Jabberwocky on the source material: it makes up lots of plausible-sounding archaic English words, learns the basics of English grammar, the layout of the plays, and so on. I was pretty amazed that it got that good in such a short period of time.

I think locally training a bunch of models to the fidelity of Shakespeare-from-Wish.com might tell you when you've hit on a winning architecture, and when to try scaling up.
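
If you want to try reproducing this, the character-level Shakespeare quick start from the nanoGPT README goes roughly like this (script names are taken from the repo's README and may have changed since, so double-check there):

    python data/shakespeare_char/prepare.py           # tokenize the tiny Shakespeare dataset
    python train.py config/train_shakespeare_char.py  # a few minutes on a single GPU
    python sample.py --out_dir=out-shakespeare-char   # generate faux-Shakespeare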


Author states in the first paragraph of their blog post:

"I've been wanting to understand transformers and attention better for awhile now—I'd read The Illustrated Transformer, but still didn't feel like I had an intuitive understanding of what the various pieces of attention were doing. What's the difference between q and k? And don't even get me started on v!"


lol three people said the same thing, I get that. That's why I said "other than learning and satisfying curiosity"...


It's an excellent learning exercise, not just to satisfy curiosity but to develop and deepen understanding.


I dunno, maybe they actually enjoy hacking on projects like this? Weird, I know.



