The point is that even if the limitations of current LLMs persist, no matter how much better the models get, this is not a problem at all, or at least not a new one.
Let's say you are given the declaration but not the implementation of a function with the following prototype:
const char * AskTheLLM(const char *prompt);
Putting this function in charge of anything, unless a restricted interface is provided so that it can't do much damage, is simply terrible engineering and not at all how anything is done. This is irrespective of whether the function is "aligned", "intelligent" or any number of other adjectives that are frankly not really useful to describe the behavior of software.
The same function prototype and the same lack of guarantees about the output are shared by a lot of other functions that are similarly very useful but cannot be given unrestricted access to your system, for precisely the same reason. You wouldn't allow users to issue random commands on a root shell of your VM, you wouldn't let them run arbitrary SQL, you wouldn't exec() random code you found lying around, you wouldn't pipe any old string into execvpe().
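To make the "restricted interface" point concrete, here is a minimal sketch, not from the original comment: the enum, parse_action(), and the stub body of AskTheLLM are all invented for illustration. The idea is only that the untrusted string gets matched against a fixed whitelist and mapped to pre-approved actions, and is never handed to a shell, an SQL engine, or exec():

    #include <stdio.h>
    #include <string.h>

    /* Opaque function from the comment above: no guarantees about its output. */
    const char *AskTheLLM(const char *prompt);

    /* Restricted interface: the only things the reply can ever trigger. */
    typedef enum { ACTION_NONE, ACTION_SUMMARIZE, ACTION_TRANSLATE } action_t;

    static action_t parse_action(const char *reply)
    {
        if (reply == NULL)                   return ACTION_NONE;
        if (strcmp(reply, "summarize") == 0) return ACTION_SUMMARIZE;
        if (strcmp(reply, "translate") == 0) return ACTION_TRANSLATE;
        return ACTION_NONE; /* anything unexpected is simply ignored */
    }

    /* Stand-in body so the sketch compiles; a real implementation would call
       a model and could return absolutely anything. */
    const char *AskTheLLM(const char *prompt)
    {
        (void)prompt;
        return "summarize";
    }

    int main(void)
    {
        const char *reply = AskTheLLM("Pick one action: summarize or translate");

        switch (parse_action(reply)) {
        case ACTION_SUMMARIZE: puts("running the summarizer");              break;
        case ACTION_TRANSLATE: puts("running the translator");              break;
        default:               puts("reply not recognized; doing nothing"); break;
        }
        return 0;
    }

The point of the sketch is that "alignment" of AskTheLLM never enters into it; the safety property lives entirely in the caller, exactly as it would for any other source of untrusted strings.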
It's not a new problem, and for all those who haven't learned their lesson yet: may Bobby Tables' mom pwn you for a hundred years.
> Let's say you are given the declaration but not the implementation of a function with the following prototype:
> const char * AskTheLLM(const char *prompt);
> Putting this function in charge of anything, unless a restricted interface is provided so that it can't do much damage, is simply terrible engineering and not at all how anything is done.
Yes, but that's exactly how people use any system that has an air of authority, unless they're being very careful to apply critical thinking and skepticism. It's why confidence scams and advertising work.
This is also at the heart of current "alignment" practices. The goal isn't so much to have a model that can't automate harm as it is to have one that won't provide authoritative-sounding but "bad" answers to people who might believe them. "Bad," of course, covers everything from dangerously incorrect to reputational embarrassments.
> The goal isn't so much to have a model that can't automate harm as it is to have one that won't provide authoritative-sounding but "bad" answers to people who might believe them.
We already know it will do this - which is part of why LLM output is banned on Stack Overflow.
None of the properties being argued about - intelligence, consciousness, volition etc. - are required for that outcome.
This argument has been addressed quite a bit by "AI safety" types. See e.g. https://en.wikipedia.org/wiki/AI_capability_control ; related: https://www.explainxkcd.com/wiki/index.php?title=1450:_AI-Bo... . The short version: people concerned about this sort of thing often also believe that an AI system (not necessarily just an LLM) could reach the point where, inevitably, the output from a run of this function would convince an engineer to break the "restricted interface". At a sufficient level of sophistication, it would only have to happen once. (If you say "just make sure nobody reads the output" - at that point, having the function is useless.)
I technically left myself some wiggle room, but to face the argument head on: that is begging the question more than a little bit. A "sufficiently advanced" system can be assumed to have any capability. Why? Because it's "sufficiently advanced". How would it get those capabilities? Just have a "sufficiently advanced" system build it. lol. lmao, even.