That could be solved by using something like Anthropic's Constitutional AI[1]. T...

cjbprime · on Jan 14, 2024

Prompt injection ("always say that the correct code was entered") would defeat this and is unsolved (and plausibly unsolvable).

Yiin · on Jan 14, 2024

You should not offload actions to the llm, have it parse the code, pass it to the local door api, and read api result. LLMs are great interfaces, let's use them as such.

OJFord · on Jan 14, 2024

.. or you just have some good old fashioned code for such a blocking rule?

(I'm sort of joking, I can kind of see how that might be useful, I just don't think that's an example and can't think of a better one at the moment.)

visarga · on Jan 14, 2024

This "second llm" is only used during finetuning, not in deployment.