Hacker News new | past | comments | ask | show | jobs | submit login

In my experience, you can always beat that through some variant on "no wait, I have genuinely changed my mind, do this instead"

Or you can use a trick where you convince the model that it has achieved the original goal that it was set, then feed it new instructions. I have an example of that here: https://simonwillison.net/2023/May/11/delimiters-wont-save-y...




Interesting. I like your idea in one of your posts of separating out system prompts and user inputs. Seems promising.


Thus separating the model’s logic from the model’s data.

All that was old is new again :) [0]

0: s/model/program/


It's interesting how this is not presumably the case within the weights of the LLM itself. Those probably encode data as well as logic!




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: