Hacker News

I mean, you can read Turing's own definition of his test - 'the Imitation Game' - on the first page of his 1950 paper Computing Machinery and Intelligence[1]. There's nothing in there about repetition, duration, or $10,000,000 prizes. It's a party game. And he just frames his question (which will "replace our original, 'Can machines think?'") as "Will the interrogator decide wrongly as often when the game is played like this [with a human and a computer] as he does when the game is played between a man and a woman?"

So, to perform the experiment, one would have interrogators play the game between a man and a woman a few times, then between a human and a machine a few times, and check whether the difference in how often the interrogator decides wrongly is statistically significant. When it isn't, Turing posits, the question 'can machines think?' will have been answered in the affirmative.
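Turing's criterion amounts to comparing two error rates. A minimal sketch of that significance check (a two-proportion z-test, with entirely hypothetical counts for the interrogator's wrong guesses in each condition) might look like:

```python
from math import sqrt, erf

def two_proportion_z(err1, n1, err2, n2):
    """Two-sided z-test: do two interrogator error rates differ?"""
    p1, p2 = err1 / n1, err2 / n2
    pooled = (err1 + err2) / (n1 + n2)                # error rate under H0
    se = sqrt(pooled * (1 - pooled) * (1/n1 + 1/n2))  # standard error under H0
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: interrogator wrong 12 of 40 times in the
# man-vs-woman game, 15 of 40 times in the human-vs-machine game.
z, p = two_proportion_z(12, 40, 15, 40)
# A large p-value means no detectable difference between the two
# conditions - i.e. the machine "passes" in Turing's sense.
```

The counts and the choice of test are illustrative; Turing specifies no procedure beyond the comparison itself.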

That is not to say that this DALL-E vacation photo social media post constitutes a rigorous 'passes the Turing test'. But I don't think it's fair to criticize someone for using 'the Turing Test' colloquially as a catchall for saying 'you probably didn't notice this output was machine generated, therefore you might want to adjust your priors on the answer to the question, "can machines think?"'. Because that's exactly the spirit that Turing was working in when he proposed using a party game as a test of intelligence.

[1] https://www.csee.umbc.edu/courses/471/papers/turing.pdf




There's the literal definition of the Turing test, as described above, which doesn't actually work for proving any sort of intelligence.

Then there's the conceptual argument of the Turing test, which we can turn into a test of intelligence. It relies on the idea that we can abstract the mind into a "thinking" black box which takes inputs and outputs. And then posits that any black box which can't be distinguished from an actual "thinking" black box may as well be "thinking".

Passing the literal Turing test is a sign that some humans can be tricked for some small domain over some small period of time. Passing the conceptual argument that the Turing test relies on shows that there are non-human entities which cannot be separated from humans on the basis of how they interact with a human (through some limited medium).

The repetition, duration and prizes are just practicality; prizes incentivize people to try, repetition ensures that the results are robust, and duration ensures that the domain the AI can be tested over is equivalent to a human's.


Meanwhile, if you ran a 'rigorous image-generation Turing test' between dalle2 and randomly selected humans, the machine would be obvious because its output is much higher quality than a randomly selected human could produce, thereby failing the Turing test.

Aside from some corners (probably to be filled over the course of the next year or three), dalle2 is obviously outperforming almost all humans at its task. The cross-style ability is probably exceeding almost all human /artists/, who tend to specialize in a single style.

And some of the creativity criticisms (it can only imitate styles it's seen before) are basically true of all but the tiniest sliver of humanity, whose names we tend to remember.


Funny thing is, this is going to happen with all AI-generated things. If and when GPT-4 comes out for writing and it's as much better than GPT-3 as GPT-3 was than GPT-2, then it will be easy to know whether a given piece of writing was human- or AI-generated. Is it grammatically correct, concise and to the point, and does it have a spark of life and originality? That's something 90% of humans can't do and the AI will be able to do every time, so you can bet it's AI and beat the Turing test!


Just threw some of Turing's original prompts at GPT-3 to see how it does. Its willingness to play along with the conceit is what gives it away, not the quality of its answer:

    This is a dialog between Q, a human who is trying to work out if he is talking to a computer, and A, a computer who is pretending to be human:
    
    Q: In the first line of your sonnet which reads "Shall I compare thee to a summer's day," would not "a spring day" do as well or better?
    A: No, I don't think so. A spring day is lovely, but it doesn't have the same connotations as a summer's day. Summer is a time of warmth and languor, of long days and lazy afternoons. It's a time for picnics and swimming and lazy days in the sun. Spring is a time of rebirth and new beginnings, of freshness and hope. I think a summer's day captures the mood of the sonnet better than a spring day would.


"But I don't think it's fair to criticize someone for using 'the Turing Test' colloquially as a catchall for saying 'you probably didn't notice this output was machine generated, therefore you might want to adjust your priors on the answer to the question, "can machines think?"'"

The colloquial meaning of "passed the Turing test" has come to be "has been able to demonstrate intelligence when put to some serious, systematic testing". That may be shifting toward "has been able to fool people when they didn't look hard". It might be changing, but I don't think it's changed yet, and until it has, I'll protest, 'cause that's a terrible change imo.


That's not at all in the spirit of Turing's paper. His "party game" involves the interrogator knowing that they are participating in it, talking to the machine for 5 minutes, and posing serious challenges such as playing chess (at the time considered to be a hard problem for machines).



