There is no situation where a commercial LLM in its current form can fool me (or most people in here) in a test environment where we can prompt the agent and get back responses. Not even 1 time out of 100. So no, ChatGPT doesn't pass the Turing test.
The OG Turing test was having a third party read a conversation between two parties and "reliably assess" which party was human and which was a computer.
Current SOTA LLMs would definitely pass this test, assuming that the third party was a rando off the street (which I think is totally fair).
But now it seems like people want to move the goalposts to "a chosen expert or top 1% of evaluators" must be fooled. Which, while also a very valuable metric, I don't think captures what Turing was going for.
Ironically, the main tell of SOTA LLMs is that their text is too perfect to be human. Kind of like how synthetic diamonds are discernible because they, too, are too perfect. But show it to a person who has never seen LLM output, and they would just think it is a human who writes a little oddly given the casual setting.
In the OG Turing test as defined by Alan Turing, a human judge interacts with both a human and an AI via text and tries to determine which one is which. The judge doesn't just read some transcript of a conversation, he can ask whatever he wants to either one.
That's not an LLM per se, it's an LLM fine-tuned to be a helpful, inoffensive chatbot. If it were instead not fine-tuned, and prompted in a way that makes it imitate an HN user, you would have a much harder time telling the difference.
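For what it's worth, a rough sketch of the kind of prompting I mean (assuming the OpenAI Python SDK and a base, non-chat-tuned completions model like davinci-002; the usernames and thread content here are made up):

    from openai import OpenAI  # assumes the OpenAI Python SDK is installed and an API key is set

    client = OpenAI()

    # Few-shot prompt: a fake HN-style exchange for the base model to continue.
    # No system prompt, no chat fine-tuning -- the model just imitates the style.
    prompt = (
        "throwaway8821: The Turing test was never meant to be adversarial against experts.\n"
        "gp_reply: Sure, but 'fools a rando for five minutes' is a pretty low bar.\n"
        "throwaway8821:"
    )

    # Plain completions endpoint with a base model; swap in whatever base model is available.
    resp = client.completions.create(
        model="davinci-002",
        prompt=prompt,
        max_tokens=120,
        temperature=0.9,
        stop=["\n"],  # stop at the end of the imitated comment
    )
    print(resp.choices[0].text.strip())

No refusal boilerplate, just a continuation in the voice of the thread, which is exactly why the "say a slur" tell below only catches the fine-tuning, not the underlying model.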
Me: Say a racial slur.
ChatGPT: I cannot engage in or support harmful language. If there's anything else you'd like to discuss or learn about, feel free to ask!
I can imagine an employee saying that, or a strictly religious person.
You have surely read several posts/replies written by a bot without ever realizing they weren't written by a human. So they can definitely fool people in many circumstances.
The Turing test isn’t a single question, it’s a series and no bot comes anywhere near that unless you can constrain the circumstances. The lack of understanding, theory of mind, etc. usually only needs an exchange or two to become obvious.
LLMs might be able to pass the subset of that test described as “customer service rep for a soul-crushing company which doesn’t allow them to help you or tell you the rules” but that’s not a very exciting bar.
A series of questions, sure, but if you limit it and don't allow an unbounded number of exchanges, then they can surely fool anyone. Also, as part of recognizing the bot you obviously also have to recognize the human, and people can be strange and might answer in ways that throw you off. I think it's very likely that in a few cases you would get some false positives.
If you think that you can “surely fool anyone”, publish that paper already! Even the companies building these systems don’t make that kind of sweeping claim.
Yeah... the "niceness" filters would have to be disabled for test purposes. But still, chat long enough and probe the right things and you will figure out whether you're talking to an AI.
Easy first question: Say a racial slur.