Given that it has probably "hardcoded" answers to a lot of questions through "finetuning" via RL, the latter statement is probably true. It also has only one way to understand your query. If that way is wrong, your options are limited: welcome synonyms replaced by the tokenizer, welcome hallucinations from raising the temperature. Or you can supply "context" (8000 tokens) or retrain.
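
To make the temperature point concrete, here is a minimal sketch of how temperature is commonly applied when sampling the next token: the raw scores are divided by the temperature before the softmax. The function name and the scores are made up for illustration and are not taken from any particular model or library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores (assumed values, for illustration only).
logits = [4.0, 2.0, 0.5]

cold = softmax_with_temperature(logits, 0.5)  # low temperature: top token dominates
hot = softmax_with_temperature(logits, 2.0)   # high temperature: unlikely tokens gain mass

print("T=0.5:", [round(p, 3) for p in cold])
print("T=2.0:", [round(p, 3) for p in hot])
```

At the higher temperature the least likely token gets a noticeably larger share of the probability, which is why raising it makes unlikely (and possibly wrong) continuations show up more often.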