The probe is trained on labelled data, to figure out how to read the LLM's internals. But this external system only interprets hidden states that are already present in the analysed network. That means the original LLM already contains the "knows/doesn't know" signal; it's just not surfaced in its output by default.
So no, current models can't do this on their own. You still need an external system for verifiability.
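To make the idea concrete, here's a minimal sketch of what such a probe does. Everything here is assumed for illustration: the "hidden states" are synthetic vectors with a planted knows/doesn't-know direction, standing in for activations extracted from a real LLM, and the probe is a simple mean-difference linear classifier rather than whatever the actual papers use.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 500  # hidden-state dimensionality and number of labelled examples (assumed)

# Plant a "knows/doesn't know" direction into synthetic hidden states,
# mimicking a signal the LLM already encodes but never outputs.
direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)            # 1 = model "knows", 0 = it doesn't
states = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, direction)

train_x, train_y = states[:400], labels[:400]
test_x, test_y = states[400:], labels[400:]

# The external system: a linear probe fit on labelled data.
# Here, the difference of class means defines the probe direction.
mu1 = train_x[train_y == 1].mean(axis=0)
mu0 = train_x[train_y == 0].mean(axis=0)
w = mu1 - mu0
b = -w @ (mu1 + mu0) / 2

# If this generalises to held-out states, the signal was already there;
# the probe only reads it out, it doesn't add information.
preds = (test_x @ w + b > 0).astype(int)
acc = (preds == test_y).mean()
print(f"held-out probe accuracy: {acc:.2f}")
```

The point of the held-out split is the whole argument: the probe can only score well on states it never trained on if the "knows/doesn't" information was encoded in those states to begin with.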