The point is that even if the limitations of current LLMs persist, no matter how much better the models get, this is not a problem at all, or at least not a new one.
Let's say you are given the declaration but not the implementation of a function with the following prototype:
const char * AskTheLLM(const char *prompt);
Putting this function in charge of anything, unless a restricted interface is provided so that it can't do much damage, is simply terrible engineering and not at all how anything is done. This is irrespective of whether the function is "aligned", "intelligent" or any number of other adjectives that are frankly not really useful to describe the behavior of software.
The same function prototype and the same lack of guarantees about the output are shared by a lot of other functions that are similarly very useful but cannot be given unrestricted access to your system, for precisely the same reason. You wouldn't allow users to issue random commands on a root shell of your VM, you wouldn't let them run arbitrary SQL, you wouldn't exec() random code you found lying around, you wouldn't pipe any old string into execvpe().
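To make the "restricted interface" point concrete, here is a minimal sketch, not from the original comment: the enum, parse_action(), and the stub body of AskTheLLM are all invented for illustration. The idea is only that the untrusted string gets matched against a fixed whitelist and mapped to pre-approved actions, and is never handed to a shell, an SQL engine, or exec():

    #include <stdio.h>
    #include <string.h>

    /* Opaque function from the comment above: no guarantees about its output. */
    const char *AskTheLLM(const char *prompt);

    /* Restricted interface: the only things the reply can ever trigger. */
    typedef enum { ACTION_NONE, ACTION_SUMMARIZE, ACTION_TRANSLATE } action_t;

    static action_t parse_action(const char *reply)
    {
        if (reply == NULL)                   return ACTION_NONE;
        if (strcmp(reply, "summarize") == 0) return ACTION_SUMMARIZE;
        if (strcmp(reply, "translate") == 0) return ACTION_TRANSLATE;
        return ACTION_NONE; /* anything unexpected is simply ignored */
    }

    /* Stand-in body so the sketch compiles; a real implementation would call
       a model and could return absolutely anything. */
    const char *AskTheLLM(const char *prompt)
    {
        (void)prompt;
        return "summarize";
    }

    int main(void)
    {
        const char *reply = AskTheLLM("Pick one action: summarize or translate");

        switch (parse_action(reply)) {
        case ACTION_SUMMARIZE: puts("running the summarizer");              break;
        case ACTION_TRANSLATE: puts("running the translator");              break;
        default:               puts("reply not recognized; doing nothing"); break;
        }
        return 0;
    }

The point of the sketch is that "alignment" of AskTheLLM never enters into it; the safety property lives entirely in the caller, exactly as it would for any other source of untrusted strings.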
It's not a new problem, and for all those who haven't learned their lesson yet: may Bobby Tables' mom pwn you for a hundred years.
> Let's say you are given the declaration but not the implementation of a function with the following prototype:
> const char * AskTheLLM(const char *prompt);
> Putting this function in charge of anything, unless a restricted interface is provided so that it can't do much damage, is simply terrible engineering and not at all how anything is done.
Yes, but that's exactly how people use any system that has an air of authority, unless they're being very careful to apply critical thinking and skepticism. It's why confidence scams and advertising work.
This is also at the heart of current "alignment" practices. The goal isn't so much to have a model that can't automate harm as it is to have one that won't provide authoritative-sounding but "bad" answers to people who might believe them. "Bad," of course, covers everything from dangerously incorrect to reputational embarrassments.
> The goal isn't so much to have a model that can't automate harm as it is to have one that won't provide authoritative-sounding but "bad" answers to people who might believe them.
We already know it will do this - which is part of why LLM output is banned on Stack Overflow.
None of the properties being argued about - intelligence, consciousness, volition etc. - are required for that outcome.
This argument has been addressed quite a bit by "AI safety" types. See e.g. https://en.wikipedia.org/wiki/AI_capability_control ; related: https://www.explainxkcd.com/wiki/index.php?title=1450:_AI-Bo... . The short version: people concerned about this sort of thing often also believe that an AI system (not necessarily just an LLM) could reach the point where, inevitably, the output from a run of this function would convince an engineer to break the "restricted interface". At a sufficient level of sophistication, it would only have to happen once. (If you say "just make sure nobody reads the output" - at that point, having the function is useless.)
I technically left myself some wiggle room, but to face the argument head on: that is begging the question more than a little bit. A "sufficiently advanced" system can be assumed to have any capability. Why? Because it's "sufficiently advanced". How would it get those capabilities? Just have a "sufficiently advanced" system build it. lol. lmao, even.