
> Let's say you are given the declaration but not the implementation of a function with the following prototype:

> const char * AskTheLLM(const char *prompt);

> Putting this function in charge of anything, unless a restricted interface is provided so that it can't do much damage, is simply terrible engineering and not at all how anything is done.

Yes, but that's exactly how people use any system that has an air of authority, unless they're being very careful to apply critical thinking and skepticism. It's why confidence scams and advertising work.

This is also at the heart of current "alignment" practices. The goal isn't so much to have a model that can't automate harm as it is to have one that won't provide authoritative-sounding but "bad" answers to people who might believe them. "Bad," of course, covers everything from dangerously incorrect to reputational embarrassments.
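To make the quoted "restricted interface" point concrete, here's a minimal C sketch (the stub body for AskTheLLM and the action names are invented purely for illustration): the model's free-form reply is never trusted or executed directly; it only gets to select from a small whitelist of actions, and anything unrecognized is rejected.

    #include <stdio.h>
    #include <string.h>

    /* The declared-but-unimplemented function from the quote.
       The stub further down exists only so the sketch compiles. */
    const char *AskTheLLM(const char *prompt);

    /* Restricted interface: the reply is mapped onto a closed set
       of actions; unknown output falls through to ACTION_REJECT. */
    typedef enum { ACTION_SUMMARIZE, ACTION_TRANSLATE, ACTION_REJECT } Action;

    static Action parse_reply(const char *reply)
    {
        if (reply == NULL)
            return ACTION_REJECT;
        if (strcmp(reply, "SUMMARIZE") == 0)
            return ACTION_SUMMARIZE;
        if (strcmp(reply, "TRANSLATE") == 0)
            return ACTION_TRANSLATE;
        return ACTION_REJECT;   /* default: do nothing */
    }

    /* Stand-in implementation, for the sketch only. */
    const char *AskTheLLM(const char *prompt)
    {
        (void)prompt;
        return "SUMMARIZE";
    }

    int main(void)
    {
        const char *reply = AskTheLLM("Which action should we take?");

        switch (parse_reply(reply)) {
        case ACTION_SUMMARIZE: puts("running summarizer"); break;
        case ACTION_TRANSLATE: puts("running translator"); break;
        default:               puts("reply rejected");     break;
        }
        return 0;
    }

The model can still pick the wrong action, but it cannot invoke anything outside the whitelist, which is the whole point of the restricted interface.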

> The goal isn't so much to have a model that can't automate harm as it is to have one that won't provide authoritative-sounding but "bad" answers to people who might believe them.

We already know it will do this - which is part of why LLM output is banned on Stack Overflow.

None of the properties being argued about - intelligence, consciousness, volition, etc. - are required for that outcome.
