Luciano Floridi
This year, the 18th Annual Loebner Prize for Artificial Intelligence was held at the University of Reading. Expectations were high, and heavily advertised too. Kevin Warwick, the organiser, seemed to believe that this might well be the time when machines would pass the Turing Test. Things went otherwise.
In “Computing Machinery and Intelligence”, Turing proposed the following test: a computer could be said to be intelligent if a judge could not distinguish its responses from those of a human during a text-based conversation. He mistakenly predicted that, by the end of the twentieth century, computers would have a 30 per cent chance of being mistaken for a human interlocutor during a five-minute text-based conversation.
Having been invited to play the role of judge, I was intrigued but highly sceptical. Rightly so, it turned out. As I had expected, and despite the brevity of my interactions, a couple of questions and answers were usually sufficient to confirm that the best machines are still not even close to resembling anything that might, even open-mindedly, be called vaguely intelligent. Here are some telling examples.
I started by asking: “If we shake hands, whose hand am I holding?” One interlocutor, the human, immediately answered, meta-linguistically, that the conversation should not have mentioned bodily interactions. Correct, but then that answer gave him away. Indeed, he later turned out to be Andrew Hodges, Turing’s biographer, who had been recruited on the spot to interact with me on the other side of the screen.
On the other hand, the computer, which turned out to be “Jabberwacky” (http://en.wikipedia.org/wiki/Jabberwacky), spectacularly failed to address the question. It spoke about something else: “We live in eternity. So, yeah, no. We don’t believe.” It was the usual, give-away, tiring, Eliza-ish strategy, which we have now seen implemented for decades. Boring trick.
The second question merely confirmed the first impression: “I have a jewellery box in my hand, how many CDs can I store in it?” Again, the human interlocutor provided some explanation, but the computer blew it badly. More Eliza.
The third question came at the end of the five minutes: “The four capitals of the UK are three, Manchester and Liverpool. What’s wrong with this sentence?” Once again, the computer went bananas.
If the Turing Test at Reading went less badly than it could have (some machines did manage to fool some judges a few times), this is because some of the judges were asking useless questions, like “are you a computer?” or “do you believe in God?” (these are real instances). Clearly, they (the judges, not the machines) had missed two essential points of the whole exercise.
First, questions should elicit answers that are as informative as possible, which means that one should try to maximise the amount of useful evidence obtainable from the received message. It is the same rule applied in the 20 questions game: each question must prompt an answer that can make a very significant difference to your state of information, and the bigger the difference the better. But in the examples above, either “yes” or “no” will leave you absolutely unenlightened as to who your interlocutor is, so that is a wasted bullet.
Second, questions must challenge the syntactic engine which is on the other side. So other questions such as “what have you been up to today?” or “what do you do for a living?” (again, two real examples) are rather useless too. Two documentaries by the BBC show both points being badly overlooked by some judges (http://news.bbc.co.uk/2/hi/technology/7666836.stm; www.bbc.co.uk/berkshire/content/articles/2008/10/12/turing_test_feature.shtml). If all the judges had followed this simple “vademecum for a TT judge”, their first question would probably have been sufficient to discriminate between the human and the machine. It certainly was for me.
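The first of these two points, informativeness, can be made concrete with a small calculation. What follows is only an illustrative sketch: the function names and all the answer probabilities are assumptions chosen for the example, not measurements from the Reading event. It computes how many bits of uncertainty about "human or machine?" a question is expected to remove, given how each kind of interlocutor is likely to answer.

```python
from math import log2


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def information_gain(prior_human, answers_if_human, answers_if_machine):
    """Expected reduction in the judge's uncertainty (in bits) after one answer.

    answers_if_human / answers_if_machine map each possible answer to the
    (assumed) probability that a human / a machine would give it.
    """
    prior = {"human": prior_human, "machine": 1.0 - prior_human}
    likelihood = {"human": answers_if_human, "machine": answers_if_machine}
    uncertainty_before = entropy(prior.values())

    uncertainty_after = 0.0
    for answer in set(answers_if_human) | set(answers_if_machine):
        # Joint probability of (interlocutor, answer) for each interlocutor.
        joint = {who: prior[who] * likelihood[who].get(answer, 0.0) for who in prior}
        p_answer = sum(joint.values())
        if p_answer == 0:
            continue
        posterior = [j / p_answer for j in joint.values()]
        uncertainty_after += p_answer * entropy(posterior)

    return uncertainty_before - uncertainty_after


# Hypothetical numbers: both humans and machines almost always deny being a computer.
wasted = information_gain(0.5, {"yes": 0.1, "no": 0.9}, {"yes": 0.1, "no": 0.9})

# Hypothetical numbers: a human usually handles the question, a chatbot usually evades it.
probing = information_gain(
    0.5,
    {"sensible": 0.9, "evasive": 0.1},
    {"sensible": 0.1, "evasive": 0.9},
)

print(f"'are you a computer?': {wasted:.2f} bits")
print(f"'whose hand am I holding?': {probing:.2f} bits")
```

Under these assumed probabilities, the first question yields no information at all, because human and machine answer it in the same way, while the second removes about half a bit of the judge's one bit of uncertainty in a single exchange.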
If you really need to test an artefact, the higher the stakes are, the tougher the procedure should be. We do not adopt the same standards when it comes to testing the safety of a house’s central heating system and the safety of a nuclear power station. Why (artificial) intelligent behaviour should be tested by the untrained, naïve and often uninformed “man in the street” remains a mystery to me, pace Turing’s suggestion. Unless that is the sort of “dude” you wish to fool.
The day ended with the announcement that no machine had passed the test. As usual, there was a winner of the Loebner Prize’s $3,000 consolation award for being the least disappointing machine. This was the programme Elbot, created by Fred Roberts. I agreed that it deserved it more than the others.
Luciano Floridi (www.philosophyofinformation.net) holds the Research Chair in Philosophy of Information at the University of Hertfordshire and is president of the International Association for Computing and Philosophy.