Hacker News new | past | comments | ask | show | jobs | submit login

As long as the prompt and query are part of the same input, I don't think this can be fixed. The natural fix is to redesign the models to make the prompt and query two separate inputs. This would avoid the query from overriding the prompt.



This has been the "obvious" fix for months, but no-one so far has managed to implement it.

I'm getting the impression this is because the nature of how large language models work makes it incredibly difficult to separate "instructions" from "untrusted input".

I would love to be wrong about this!

So far I've been unable to find a large language model expert who's ready to say "yeah, we can separate the instruction prompt from the untrusted prompt, here's how we can do that".




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: