
Turing test and the behavioural criterion

Alan M. Turing
Era: First half of the 20th century · 1950
Region: Europe · United Kingdom
Discipline: Computing / AI

Explanation

The Turing Test was proposed by Alan Turing (1912-1954) in his seminal article "Computing Machinery and Intelligence", published in the journal Mind in 1950. Turing, a British mathematician, the father of modern theoretical computing, and a hero of the Enigma decryption effort during the Second World War, raised in this paper the question "Can machines think?" and, considering it ill-formulated, proposed replacing it with an operational criterion he called the imitation game: if a machine can converse with a human interrogator in a way indistinguishable from a human, then we should grant it the status of thinking.

The original test is set up as follows: the interrogator communicates by text (to eliminate physical clues) with two hidden interlocutors, one a human and one a machine. Both try to convince the interrogator that they are human. If the interrogator cannot reliably tell them apart, identifying the machine no better than chance would allow, the machine passes the test. Turing predicted that by the year 2000 machines would play the game well enough that an average interrogator would have no more than a 70 per cent chance of making the correct identification after five minutes of questioning.
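The protocol just described can be sketched as a minimal simulation. The responder functions and the judge below are hypothetical stand-ins, not anything Turing specified: the point is only to show the structure of a trial (random hidden assignment, questioning, a forced identification) and the passing condition (identification at chance level).

```python
import random

# Hypothetical stand-in responders: in a real test these would be a live
# person and a candidate machine answering over a text channel.
def human_responder(question):
    return "Honestly, it depends on the weather."

def machine_responder(question):
    # A "perfect" imitator for the purpose of this sketch.
    return "Honestly, it depends on the weather."

def run_trial(judge):
    """One round: the judge questions two hidden interlocutors, labelled
    A and B at random, and must name the label hiding the machine."""
    channels = {"A": human_responder, "B": machine_responder}
    if random.random() < 0.5:  # hide who is behind which label
        channels = {"A": machine_responder, "B": human_responder}
    transcript = {label: ask("What do you think of autumn?")
                  for label, ask in channels.items()}
    guess = judge(transcript)  # the judge names the machine's label
    truth = next(l for l, r in channels.items() if r is machine_responder)
    return guess == truth

def naive_judge(transcript):
    # Faced with indistinguishable answers, the judge can only guess.
    return random.choice(list(transcript))

trials = 10_000
rate = sum(run_trial(naive_judge) for _ in range(trials)) / trials
# Against a perfect imitator, correct identification hovers around
# chance (about 50%), which is exactly the passing condition.
print(f"identification rate: {rate:.2f}")
```

A stronger judge would replace `naive_judge` with actual questioning strategy; the machine passes whenever that strategy does no better than the guessing baseline shown here.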

The test has virtues: it is operational (it gives a concrete, non-speculative criterion); it is behavioural (it requires no speculation about internal states); and it is anthropocentric in a good sense (it uses human conversation, language, as the benchmark). But it has also been heavily criticised. John Searle, in his famous Chinese Room argument (1980), argued that a machine could pass the test without really understanding anything: it could simply manipulate symbols according to rules, with no genuine comprehension. What matters, Searle claimed, is semantics (understanding), not syntax (formal manipulation).
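Searle's point about syntax without semantics can be made concrete with a toy illustration (entirely hypothetical, and far cruder than the rule book he imagined): a responder that matches input strings to canned replies. Nothing in the program represents meaning, yet its output can look like understanding.

```python
# A toy "Chinese Room": a purely syntactic rule book. The program maps
# input symbol strings to output symbol strings; no part of it models
# what any of the words mean.
RULES = {
    "how are you?": "Fine, thanks. And you?",
    "do you understand me?": "Of course I understand you.",
}

def room(symbols: str) -> str:
    """Apply the rule book; fall back to a stock deflection."""
    return RULES.get(symbols.lower().strip(), "Interesting. Tell me more.")

print(room("Do you understand me?"))  # prints "Of course I understand you."
```

The reply asserts comprehension, but it is produced by a table lookup; on Searle's view, scaling the rule book up (even to the size of a modern language model, some would argue) changes the fluency, not the absence of understanding.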

Other critics point to limitations: the test covers only verbal abilities (it ignores bodily awareness, emotions, and creativity in non-linguistic domains); it is relatively easy to fool with conversational tricks; it does not distinguish understanding from skilful simulation; and it says nothing about phenomenal subjectivity (qualia). Defenders reply that Turing was proposing a pragmatic criterion, not a metaphysical definition of thought or consciousness.

In recent years, with the emergence of large language models (GPT, Claude, etc.), the test has returned to the centre of debate. Some hold that modern LLMs already pass reasonable versions of the Turing Test in conversation. Others argue that the test was always a low bar and that the genuinely important questions (do LLMs understand? are they conscious? do they have intentions?) are not answered by it. Proposals for more sophisticated tests have appeared (Marcus and colleagues).

For the theory of consciousness, the Turing Test has been at once provocative and limiting. Provocative, because it puts the question of mind in verifiable rather than merely speculative terms. Limiting, because it tends to reduce mind to observable behaviour, leaving out crucial aspects (subjectivity, phenomenal consciousness). Today, with the growing sophistication of AI systems, the debate over whether machines can really think, understand, or be conscious is more urgent than ever and requires finer criteria than the original test. Turing himself probably understood this: his article is philosophically far richer than its popular legacy suggests. As a historical starting point, and as a pivot for discussing what we mean by intelligence and artificial consciousness, the Turing Test remains an unavoidable reference.

Strengths

  • Operational reformulation that avoids sterile metaphysical discussions.
  • Philosophical foundation of AI and functionalism.
  • Replicable intersubjective criteria.
  • Anticipates and addresses classical objections.
  • Epistemological bridge with the problem of other minds.

Main critiques

  • Conflates behavioural simulation with phenomenal consciousness.
  • Centres evaluation on dialogue, ignoring embodied aspects.
  • Can give false positives (statistical systems without understanding).
  • Can give false negatives (non-verbal or alien consciousnesses).
  • Insufficient after advances in LLMs: the criterion needs refining.

Connections with other theories