Gigapixel scans of test takers' retinas will be trivial in this fantasy world. Not that anyone wants to be in a cheating arms race, but it's been the case for centuries. The more easily accessible the rote information is (i.e. what the language models are decent at), the less important it will be on the test. Expect different scoring weights for different types of questions, and more novel questions that don't appear in the corpus of study material.
While I broadly agree with your core argument, I'd push back on the idea that LLMs are good at rote learning: they do a mediocre job of it even after a huge training run, while a simple index search is easier to build, far more compact, and much faster.
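To make the contrast concrete, here's a minimal sketch of the kind of index search I mean: an inverted index over a toy corpus (the documents and queries are made up for illustration). It gives exact recall of anything it was fed, with no training run at all.

```python
from collections import defaultdict

# Toy corpus of "study material" (hypothetical examples).
docs = {
    1: "the mitochondria is the powerhouse of the cell",
    2: "photosynthesis converts light energy into chemical energy",
    3: "the cell membrane regulates what enters the cell",
}

# Build an inverted index: token -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.split():
        index[token].add(doc_id)

def search(query):
    """Return ids of documents containing every query token."""
    tokens = query.split()
    if not tokens:
        return set()
    result = index.get(tokens[0], set()).copy()
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

print(sorted(search("cell membrane")))  # -> [3]: exact retrieval, zero training
```

A few dozen lines and a dictionary lookup per token is the whole machine; compare that to the compute budget of even a modest language model, and "good at rote learning" stops looking like the right description of LLMs.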