It's discouraging that an LLM can accurately recall a book. That is, in a sense, overfitting: the model is supposed to be much smaller than its training set, having abstracted the training inputs rather than stored them verbatim.

Did they try this on obscure Bible excerpts, or just ones likely to be well known and quoted elsewhere? Well-known quotes would be reinforced by all the copies.
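
For anyone curious, a rough way to probe that distinction is to compare exact-match recall on a famous verse vs. an obscure one. This is only a sketch, not what the article does; the model name, verse choices, and the local kjv.json file are all placeholders.

    # Sketch: compare exact-match recall on a famous vs. an obscure verse.
    # Assumes an OPENAI_API_KEY in the environment and a local kjv.json file
    # mapping verse references to their exact KJV text (both placeholders).
    import json
    from openai import OpenAI

    client = OpenAI()

    def recite(reference):
        resp = client.chat.completions.create(
            model="gpt-4",  # placeholder model name
            messages=[{"role": "user",
                       "content": f"Quote {reference} from the King James Version, "
                                  f"verbatim, with no commentary."}],
            temperature=0,  # keep the output as deterministic as possible
        )
        return resp.choices[0].message.content.strip()

    kjv = json.load(open("kjv.json"))

    for ref in ["John 3:16", "Obadiah 1:5"]:  # famous vs. obscure (placeholder picks)
        print(ref, "exact match:", recite(ref) == kjv[ref])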

> Did they try this on obscure Bible excerpts, or just ones likely to be well known and quoted elsewhere?

The article contains examples of both.


The Bible probably appears in enough different training sources (not just complete copies, but also papers making some religious argument that quote a few verses to support their point) that the model should have most of it.


Does GPT now query the web in real time? If so, it should be able to reproduce anything searchable verbatim; it just needs to determine when verbatim quoting is appropriate for a given prompt.


Some services may layer this functionality on top (e.g., Bing), but in the article I'm making direct LLM calls without any external function calling.
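
For concreteness, a direct call of that kind looks roughly like the sketch below; the model name and verse are placeholders, not necessarily the article's setup.

    # Sketch of a direct chat-completion call: no retrieval, no function calling.
    # Assumes an OPENAI_API_KEY in the environment; model and verse are placeholders.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user",
                   "content": "Quote Ezekiel 25:17 from the King James Version, verbatim."}],
        temperature=0,  # keep the output as deterministic as possible
    )
    print(resp.choices[0].message.content)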


This is actually a good point.

Are verbatim recitation and not overfitting necessarily at odds?

