It's discouraging that an LLM can accurately recall a book; that is, in a sense, overfitting. The LLM is supposed to be much smaller than its training set, having in some sense abstracted the training inputs rather than memorized them.
Did they try this on obscure Bible excerpts, or just ones likely to be well known and quoted elsewhere? Well-known quotes would be reinforced by all the copies.
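The experiment being proposed could be sketched roughly like this: hold out the second half of a verse, prompt the model with the first half, and measure how much of the held-out text comes back word-for-word. This is only a sketch; `model_continue` is a hypothetical stand-in for a real LLM call, not any actual API.

```python
# Hedged sketch of a verbatim-recall test. Split a verse into a prompt
# half and a held-out half, then score how far the model's continuation
# matches the held-out text exactly. `model_continue` is a hypothetical
# stand-in for a real LLM call.

def verbatim_prefix_len(reference: str, continuation: str) -> int:
    """Length (in words) of the exact word-for-word match from the start."""
    n = 0
    for r, o in zip(reference.split(), continuation.split()):
        if r != o:
            break
        n += 1
    return n

def recall_score(verse: str, model_continue) -> float:
    """Fraction of the held-out half reproduced verbatim from its start."""
    words = verse.split()
    half = len(words) // 2
    prompt = " ".join(words[:half])
    held_out = " ".join(words[half:])
    continuation = model_continue(prompt)
    matched = verbatim_prefix_len(held_out, continuation)
    return matched / max(len(words) - half, 1)

# Toy stand-in for a model that has memorized the verse exactly:
verse = "In the beginning God created the heaven and the earth"
perfect = lambda prompt: "the heaven and the earth"
print(recall_score(verse, perfect))  # 1.0 for exact recall
```

Running this over a mix of famous and obscure verses, and comparing the scores, would show whether recall tracks how often a passage is quoted elsewhere.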
The Bible probably appears in enough different training sources (not just as whole copies, but also in the many papers that quote a few verses to make some religious argument) that the model should have most of it.
Does GPT now query the web in real time? If so, it should be able to reproduce anything searchable verbatim. It just needs to determine when verbatim quoting is appropriate given the prompt.