"It can't" is technically correct, and the paper you link explicitly states that it describes an _external_ system that uses _labeled data_.

So, no, current models can't. You always need an external system for verifiability.

It's trained on labelled data only to learn how to interpret the LLM. The external system is used solely to read hidden states already present in the analysed network, which means the original LLM already contains the "knows/doesn't know" signal; it just isn't exposed in the output by default.
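To make that concrete, here's a minimal sketch of such a probe (not the paper's actual setup; the model choice, prompts, and labels below are placeholder assumptions). The external classifier is a plain logistic regression over hidden-state vectors, so any predictive power it has must come from information the LLM itself encodes:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from sklearn.linear_model import LogisticRegression

  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
  model.eval()

  def hidden_state(prompt):
      # Last layer, last token: one vector summarising the model's internal state.
      with torch.no_grad():
          out = model(**tok(prompt, return_tensors="pt"))
      return out.hidden_states[-1][0, -1]

  # Hypothetical labelled data: 1 = model answers correctly, 0 = it doesn't.
  # A real probe would need many such examples.
  data = [("The capital of France is", 1),
          ("The 17th digit of pi is", 0)]
  X = torch.stack([hidden_state(p) for p, _ in data]).numpy()
  y = [label for _, label in data]

  # The "external system": a linear probe over hidden states. The labels
  # teach it where to look; the signal itself comes from the LLM.
  probe = LogisticRegression(max_iter=1000).fit(X, y)
  print(probe.predict_proba(X)[:, 1])  # P(model knows) per prompt

If a probe like this generalises to held-out prompts, the "knows/doesn't know" information was in the hidden states all along; the labelled data only teaches the probe where to read it out.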