This discussion (the GP and your response) suggests that evaluating an AI's intelligence may require more than judging the content it generates: it should also weigh the citations and supporting work behind that content. I guess I'm suggesting that the field could benefit from a shift toward explainability-first models.