=Paper= {{Paper |id=Vol-3007/2020-paper-2 |storemode=property |title=The Multimodal Turing Test for Realistic Humanoid Robots with Embodied Artificial Intelligence |pdfUrl=https://ceur-ws.org/Vol-3007/2020-paper-2.pdf |volume=Vol-3007 |authors=Carl Strathearn,Minhua Ma |dblpUrl=https://dblp.org/rec/conf/lifelike/StrathearnM21 }} ==The Multimodal Turing Test for Realistic Humanoid Robots with Embodied Artificial Intelligence== https://ceur-ws.org/Vol-3007/2020-paper-2.pdf
        The Multimodal Turing Test for Realistic Humanoid Robots with
                      Embodied Artificial Intelligence
                                                   Carl Strathearn 1 and Minhua Ma 2
   1 School of Computing and Digital Technologies, Staffordshire University, UK. Carl.Strathearn@research.staffs.ac.uk
                                     2 Provost, Falmouth University, UK. M.Ma@falmouth.ac.uk




Abstract

Alan Turing developed the Turing Test as a method to determine whether artificial intelligence (AI) can deceive human interrogators into believing it is sentient by competently answering questions at a confidence rate of 30%+. However, the Turing Test is concerned only with natural language processing (NLP) and neglects the significance of appearance, communication and movement. The theoretical proposition at the core of this paper, ‘can machines emulate human beings?’, is concerned with both functionality and materiality. Many scholars consider the creation of a realistic humanoid robot (RHR) that is perceptually indistinguishable from a human as the apex of humanity’s technological capabilities. Nevertheless, no comprehensive development framework exists for engineers to achieve higher modes of human emulation, and no current evaluation method is nuanced enough to detect the causal effects of the Uncanny Valley (UV). The Multimodal Turing Test (MTT) provides such a methodology and offers a foundation for creating higher levels of human likeness in RHRs for enhancing human-robot interaction (HRI).

Key Words: Turing Test, Humanoid Robots, Artificial Intelligence, Embodied Artificial Intelligence, HRI

1. Introduction

The Turing Test hypothetically evaluated computational AI using typesetters and pre-written scripts to emulate human thought (Turing, 1950). However, modern conversational AI systems function with greater accuracy and at a higher rate of processing than the analogue methods outlined in Turing’s paper. Landgrebe & Smith (2019) explain that, unlike the original Turing Test, the updated Turing Test for AI utilises two computer interfaces to replace the type-setter methodology. One computer system implements a conversational AI application and the other is controlled by a human agent concealed from the view of the human interrogator. The role of the human interrogator is to evaluate the authenticity and accuracy of the agents’ responses to determine which system is artificial and which is human. There are accounts of AI systems which claim to have passed the Turing Test. For example, Warwick & Shah (2015) and Aamoth (2014) advocate that a chatbot program named Eugene Goostman passed the 30% benchmark of the Turing Test, scoring a marginal 33% at the Royal Society AI competition in 2014. However, commentators such as Copeland (2014), Hern (2014) and Robbins (2014) contest the validity of this achievement, stating two significant flaws in the evaluation procedure. Firstly, human interrogators had prior knowledge that the AI system emulated a 13-year-old Ukrainian boy. This approach dissolves the integrity of the Turing Test, which states that the removal of all identifiers is vital in maintaining impartiality (Turing, 1950). Secondly, the creators of the Eugene Goostman chatbot hand-selected the human interrogators for the test, significantly increasing the probability of participant bias. Sample & Hern (2014) argue that claiming the Eugene Goostman chatbot passed the Turing Test is fundamentally absurd, as Turing’s prediction that in 50 years conversational AI could pass as a human was merely hypothetical, akin to a statistical survey or Gallup poll. Turing’s acumen is a methodology to explain how the human mind functions by developing a computer capable of proximal behaviour and intelligence, which includes verbal processing and the sensorimotor/robotic dimensions in which AI is systematically grounded (Sample & Hern, 2014).

In consideration, Harnad (2000) argues that the Turing Test is not a measure of how an AI system operates over five minutes; it is the system’s ability to simulate the human mind over a lifetime. According to Gehl (2013), a similar text-based chatbot named Cleverbot claimed to pass the Turing Test in 2011 at the Techniche festival in India, three years before the Eugene Goostman chatbot. However, Cleverbot did not receive the media coverage and scholarly attention of the Eugene Goostman program due to numerous irregularities in the results.

Aron (2011), Jacquet et al. (2019) and Mann (2014) argue that although Cleverbot claimed to exceed the 30% benchmark of the Turing Test, scoring an exceptional 59.3%, the human agents were themselves rated as human only 63.3% of the time. Thus, significant discrepancies in the results indicate fundamental flaws in the evaluation procedure and recruitment process.




        Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
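The 30% benchmark discussed above reduces to a simple proportion: the share of interrogators who judge the machine agent to be human. The following is a minimal sketch of that arithmetic, not a procedure taken from the cited competitions. The function names and the sample panel of verdicts are hypothetical; the two scores are the figures reported in the text.

```python
# Illustrative sketch only: helper names and the sample verdicts are hypothetical.
TURING_BENCHMARK = 0.30  # the 30% benchmark cited in the text


def pass_rate(verdicts):
    """Fraction of interrogators who judged the machine agent to be human.

    `verdicts` holds one boolean per interrogator:
    True means "I believe this agent is human".
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    return sum(verdicts) / len(verdicts)


def passes_benchmark(rate, benchmark=TURING_BENCHMARK):
    """True when the rate meets or exceeds the benchmark."""
    return rate >= benchmark


# Scores as reported in the text (not re-derived here).
reported_scores = {"Eugene Goostman": 0.33, "Cleverbot": 0.593}

for name, rate in reported_scores.items():
    margin_points = round((rate - TURING_BENCHMARK) * 100, 1)
    print(f"{name}: passes={passes_benchmark(rate)}, margin={margin_points} points")

# Hypothetical panel of 10 interrogators, 3 of whom were deceived.
sample_verdicts = [True, False, False, True, False,
                   False, True, False, False, False]
print(pass_rate(sample_verdicts), passes_benchmark(pass_rate(sample_verdicts)))
```

A bare proportion of this kind says nothing about how interrogators were recruited or what they knew beforehand, which is the substance of the objections raised against both results.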
However, Landgrebe & Smith (2019), Jacquet et al. (2019) and Pereira (2019) argue that although numerous chatbot systems claim to pass the Turing Test, the modernised tests are weak variations of Turing’s original proposition, which are not representative of Turing’s hypothesis and therefore do not qualify as certified passes. Fawaz (2019) and Wakefield (2019) explain that creating chatbots to pass the Turing Test is a developer’s pastime, as there is no serious scientific research in developing AI to pass the Turing Test. In support, Sharkey (2012) suggests that as Turing is long deceased, clarifying the terms and conditions of passing the Turing Test is impossible.

In RHR design, Mori’s (1970) UV accounts for the negative psychological stimulus propagated by RHRs upon observation: the more human-like artificial humans appear, the greater the potential for humans to feel repulsed by their appearance. However, per Burleigh (2013), there are considerable arguments against the scientific value of the UV theorem, as many scholars regard it as purely academic. Thus, the UV, like the Turing Test, remains a controversial topic in AI and robotics.

2. The Turing Test

Alan Turing (1950) formulated the Turing Test to determine if a machine agent could mislead human interrogators into believing answers provided by a computer are those of a human. If the machine convinces 30%+ of human interrogators into thinking it is sentient, the system passes the test, and the higher this percentage, the more humanistic the AI functions. Turing argues that if a machine agent is capable of exhibiting behaviour indistinguishable from that of a human, then the artificial mind functions in a manner akin to the human mind (it can think). However, Turing questions a machine’s ability to think, as ‘thinking’ is problematic to define, and thus proposes the Turing Test as a methodology to explore this concept.

Turing supposes that if a machine agent replaced either the male or the female agent in the imitation game and could operate with a level of intelligence proximal to the responses of a human, then it would replace his original hypothesis ‘can machines think?’ (Turing, 1950). In the Turing Test, the objective of the human interrogator is to identify which agent is the AI and which is the human by posing a series of questions and evaluating the authenticity of the responses. It is the agent’s role to deceive the human interrogator into believing that they are the opposite agent by providing type-written answers that simulate the responses of the other.

However, Turing applies constraints to the Turing Test to establish equilibrium between the agents. Firstly, Turing narrows the scope of interaction between the human interrogator and the human/machine agents to a single topic of conversation, to prevent the human interrogator asking questions outside the scope of the AI system’s capabilities, which may allude to the artificiality of the system. Similarly, Turing restricts the human interrogator’s ability to propose mathematical inquiries to the agents, as machines are capable of correctly answering complex equations consistently, unlike humans.

Secondly, Turing imposes a 15-30 second time delay between the responses of the human interrogator, as machine agents require time to formulate and respond to questions, unlike the human mind, to which responses are immediate. Thirdly, Turing limits the time-scale of the evaluation to 5 minutes to prevent the machine agent producing incorrect or repetitive responses, as the longer the interaction, the higher the potential for error. However, Turing considers the physical emulation of the human being as a distraction from the pursuit of intelligent machines (Turing, 1950, p.2). Although Turing is correct in stating that the appearance of a machine is not indicative of its intellectual capabilities, he neglects the capacity of the human body in tactile learning, socialisation and non-verbal communication, which are vital processes in social learning and communication.

2.1 Arguments and Limitations of the Turing Test

In Searle’s Chinese Room experiment, a human agent sat in the middle of a room is passed a series of random Chinese symbols from under a door. The agent uses an instruction manual to arrange the symbols to form coherent sentences. After a while, the agent becomes efficient in arranging the symbols into sentences and no longer requires the instruction manual. The instruction manual is removed, and interrogators who are fluent in written Chinese observe the agent arrange the symbols into sentences and state whether they think the agent is literate in Chinese or not.

In the experiment, the interrogators agree that the agent is fluent enough in Chinese to form coherent sentences using the symbols. However, the agent only understands the order of the symbols and not their meaning and therefore lacks the vital process of comprehension. Thus, the perception of the interrogators in Searle’s experiment is critical in understanding how humans interpret the appearance of intelligent behaviour, as in real-life conditions there are no visual distinctions between functional intelligence and comprehension, as visualised in Fig. 1.

Fig. 1: Visual representation of Searle’s Argument

Similarly, Cole (2019) and Warwick & Shah (2015) argue that the Turing Test is susceptible to human interference through a fundamental design flaw: an agent can invert the human perception of the nature of computing by remaining silent. Ghose (2016) explains that if an AI system does not answer questions when prompted, the human interrogators cannot distinguish between the silence of the AI and human responses; hence, the AI agent would pass as human by default. Thus, the expectation is for a computer system to respond to the actions of a human operator. If a computer does not perform tasks in the manner customary in HCI, this processual irregularity has the potential to influence human perception of the nature of the agent (Reynolds, 2016). In consideration, Hern (2019) and Landgrebe & Smith (2019) suggest that silence during the Turing Test is not uncommon and is typically the result of poor programming.

However, stricter policies regarding the time limit of agent responses are crucial in maintaining the integrity of the Turing Test and eradicating purposeful exploitation of this loophole. Whitby (1996) argues that AI developers and scholars have long misinterpreted the purpose of the Turing Test, as Alan Turing designed the ‘Imitation Game’ as a game and not a formal test. Whitby argues that Alan Turing never intended the imitation game as an evaluation of machine intelligence, but rather as a thought experiment for assessing a machine’s capacity to portray the behaviours of a human authentically. Whitby suggests that Turing’s paper is not an operational guide for AI, but a theoretical treatise to examine the sociological and scientific value of creating machines which can mislead human beings into believing they are human. However, Whitby explains that simulating human personalities and emotion in AI is damaging, as these attributes tend to mislead rather than progress the intellectual capacity of AI. Thus, the practical value of Turing’s hypothesis is not in creating machines with intelligence proximal to humans, known as artificial general intelligence (AGI), but in emulating the conditions of the human mind and behaviours using computers.

This concept is significant in HRI and HCI as it considers how humans interface and interact with technologies that simulate human intelligence, personalities and behaviour. Rapaport (2000) argues that the Turing Test is limited in its scope of evaluation as it only considers HCI via NLP. Stock-Homburg et al. (2020) describe the Handshake Turing Test (HTT) and, similarly, Karniel et al. (2010) the Turing Handshake Test (THT) as tests to determine if human interrogators can identify the differences between a human and an RHR by the act of a handshake (tactile HRI). However, this approach neglects the emulation of appearance, communication, AI and movement by focusing on secondary aspects such as touch and temperature. Ishiguro (2005) developed the Total Turing Test (TTT) for RHRs in HRI, formulated on Harnad’s (1992) TTT for human-computer interaction and Harnad’s (2000) Robot Turing Test (RTT), to comprehensively evaluate the appearance, behaviour and movement of RHRs against a human counterpart. Ishiguro’s (2005) TTT implements point of view (POV) cameras mounted on the heads of the human and RHR agents. Firstly, the agents conduct logistical tasks, and it is the role of the human interrogator to discern from observation which agent is the human and which is the RHR. Secondly, the human interrogator observes live ‘full body’ video streams of the agents for two seconds and decides which agent is the human and which is the RHR. Kasaki et al. (2016) cite that 70% of subjects identified the movements of RHRs as human. Ishiguro argues that the Turing Test evaluates the intellectual capabilities of a computer on the assumption that the human mind is divisible from the body.

Thus, the TTT evaluates embodied artificial intelligence (EAI) by combining intelligent behaviour with a robotic body for assessing the human likeness of robotic behaviour, appearance and movement. However, the TTT is susceptible to design flaws. Firstly, live video footage is inaccessible. Secondly, Marzano & Novembre (2017) argue that the 2-second evaluation window is too limited. Thirdly, according to Schweizer (1998) and Bringsjord et al. (2000), the TTT is not a comprehensive approach as it neglects the evaluation of NLP to robotic mouth articulation during HRI. Fourthly, Oppy (2003) stipulates that judging the authenticity of intelligent behaviour by manipulating objects is not indicative of a machine’s intellectual capacity. In consideration, Schweizer (1998) created the Truly Total Turing Test (TTTT) to remove telepresence from the TTT and evaluate automated RHRs with EAI. However, the TTTT lacks vital processes such as physical examination, movement, appearance, materiality, EAI and communication when operating as one robotic system.

3. The Multimodal Turing Test

Per the findings of the literature review, current evaluation methods used to determine degrees of human likeness in RHRs in HRI and HCI, such as the Turing Test, TTT, TTTT, RTT, THT and HTT, are too limited in their scope of evaluation, as they neglect the significance of amalgamating communication (speech and gesturing), movement, vision, aesthetics and conversational AI into a single system, which is not representative of the human condition. In consideration, this study lays the foundations of a comprehensive theoretical evaluation methodology named the Multimodal Turing Test (MTT) to determine if RHRs can attain a level of emulation perceptually indivisible from a human being (Houser, 2019). As cited in a recent article in the Guardian UK, the MTT is more holistic than the original Turing Test and previous evaluation methods in HRI by evaluating an RHR’s appearance, communication, movement and AI (Mathieson, 2019), as shown in Fig. 2.

Fig. 2: The four evaluation modes of the MTT

The MTT incorporates the examination structure of the 1950 Turing Test by employing human interrogators to evaluate the perceptual authenticity of RHRs. However, unlike the binary pass / fail system of the original Turing Test, the MTT provides engineers, designers and programmers with a developmental framework to benchmark progress up to and in advance of Turing’s 30% pass rate (Strathearn, 2019). Each stage of the MTT increases in complexity, which forms the hierarchy of human emulation shown in Fig. 3. As with Turing, it is not argued here that an RHR metamorphoses into an organic system by replicating the conditions of a human being. However, if an RHR can appear and function in a manner indistinguishable from a human being in real-world conditions, then that RHR is perceptually indivisible from a living human being (The World Economic Forum, 2019). Thus, equal consideration of the appearance and functionality of RHRs is essential to develop higher modes of human emulation.

Fig. 3: The Hierarchy of MTT: Level 1 (Appearance), Level 2 (Appearance & Movement), Level 3 (Appearance, Movement & Communication), Level 4 (Appearance, Movement, Communication & AI).

However, replicating the appearance and materiality of a human is more straightforward than simulating human movement due to the complexity of natural kinetic variance. Therefore, per Baudrillard’s (1994) order of simulacra, appearance forms the bedrock of the hierarchy of human emulation because it is the elementary form of simulation. Aesthetical appearance envelops a body to which movement is applied; as natural movement is more complicated to replicate than a still model, kinetics forms the second level of emulation. For speech to be a useful communication tool in RHRs, it requires both an authentic appearance and naturalistic movement. AI is the apex of human emulation, as the human mind is the most challenging element to simulate authentically due to its complexity. However, for AI to be a useful tool in RHRs, the emulated mind requires a human-like body and a method of communication for naturalistic HRI.

The four evaluation categories of the hierarchy of human emulation formulate a unified whole, which constitutes an RHR that can emulate (to degrees of likeness) a living human being, as reviewed in an article by Khatib (2019) which outlines the scope of the MTT. Furthermore, the MTT is an approach towards humanising forms of AI, as current robotic AI predominantly focuses on logical, linguistic and kinesthetic intelligence and neglects the interpersonal and intrapersonal intelligence needed to create higher modes of EAI. Interpersonal and intrapersonal AI is synergetic, incorporating various visual and audible stimuli such as facial expressions, vocal tonality, gesturing and emotive responsivity to humanise AI interaction. This approach enhances the capacity for natural communication and responsivity between humans and RHRs, founded on authentically assimilating natural human-human interaction (Barnfield, 2020).

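The cumulative structure of the hierarchy in Fig. 3 can be expressed compactly in code. The sketch below is illustrative only: the names `Mode`, `modes_at_level` and `highest_level_reached` are hypothetical, and the scoring rule (applying the 30% benchmark to each mode in turn and stopping at the first mode that falls short) is an assumption made for the example, not a rule stated by the MTT.

```python
from enum import IntEnum


class Mode(IntEnum):
    """The four evaluation modes, ordered as in the hierarchy of Fig. 3."""
    APPEARANCE = 1
    MOVEMENT = 2
    COMMUNICATION = 3
    AI = 4


def modes_at_level(level):
    """Modes evaluated at a given level; each level includes all lower ones."""
    return [mode for mode in Mode if mode <= level]


def highest_level_reached(pass_rates, benchmark=0.30):
    """Highest consecutive level whose mode meets the benchmark.

    ASSUMPTION: a level counts only if every level beneath it also counts,
    mirroring the idea that appearance underpins movement, and so on.
    `pass_rates` maps each Mode to the fraction of interrogators deceived.
    """
    level = 0
    for mode in Mode:
        if pass_rates.get(mode, 0.0) >= benchmark:
            level = int(mode)
        else:
            break
    return level


# Hypothetical results for one RHR: strong appearance and movement,
# weak communication, so the AI score does not raise the level.
example_rates = {
    Mode.APPEARANCE: 0.50,
    Mode.MOVEMENT: 0.35,
    Mode.COMMUNICATION: 0.10,
    Mode.AI: 0.90,
}
print([m.name for m in modes_at_level(3)])
print(highest_level_reached(example_rates))
```

Under this assumed rule, a high AI score cannot compensate for a failing lower level, which reflects the claim that the emulated mind requires a human-like body and a method of communication.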
Previous evaluation methods fall into the MTT’s categories of human emulation, but none are inclusive of all four stages of development. For example, the THT and HTT fall under movement (handgrip); the TTT falls under appearance, AI (via the Wizard of Oz (WOZ) method) and kinetics (robotic vision, aesthetics and movement); the Turing Test falls under AI; and the TTTT falls under appearance, movement and AI. However, developing an RHR as a complete system with components across all four categories of the hierarchy of human emulation (without consideration of the stages) will not achieve levels of human likeness indivisible from a human being. For example, comparing two RHR heads to determine which one is more visually authentic than the other is a viable methodology for evaluating and testing new components by increasing the realism of one robotic head over the other.

However, this approach is futile when comparing RHRs against a living human being to determine authenticity, as the distinctions in form and function are highly apparent, as exemplified in Fig. 4.

Fig. 4: RHRs developed in this study / Human Comparison. Left: RHR, Baudi. Middle: RHR, Euclid. Right: Human head

Therefore, a multimodal approach is required using a controlled evaluation methodology that combines features belonging to the same part of the body (subgroup), such as EAI, natural speech synthesis and a robotic jaw, tongue and lips. This evaluation procedure applies to other subgroups such as the eyes (sclera, pupil dilation, iris, eyelid, eyelashes, veins, eye movement, blink rate, skin, hair, aesthetics) and so on. This approach is similar to the functional constraints of the original Turing Test, which control the direction and flow of a conversation by narrowing it to a specified theme or topic of discussion. This technique permits the refinement of smaller intricate motor functions and aesthetics within the subgroups, as indicated in Fig. 5.

Fig. 5: Mouth comparison. Left: RHR. Right: Human

The MTT is a method for overcoming many of the design issues that are prevalent in RHRs, such as inaccurate eye emulation, poor aesthetical design and unnatural movement. Furthermore, according to the uncanny valley hypothesis, realistic humanoids instigate negative perceptual feedback in humans because they are void of variable organic nuances.

This consideration is vital in the development and progression of modern RHRs, as traditional methods of evaluation and design overlook the significance of replicating nuances such as pupil dilation, gestures and accurate lip movement. These nuances act as visual cues and signifiers of sentience when discerning the authenticity of an RHR.

Thus, when evaluating an RHR, all elements are interconnected to the perceptual whole. To achieve this, an imitation head structure and a cloaking device to cover empty areas around the developed feature offer a practical solution. This approach permits a more holistic evaluation than analysing individual facial features outside of the body (unified whole).

The Multimodal Turing Test: three orders of human emulation: The three orders of human emulation are a framework for developing RHRs that appear and function in a manner that is indistinguishable from the natural human being under the conditions and limitations of the MTT evaluation procedure.

1. Fragmentary Emulation: A unified subgroup that qualifies as perceptually indistinguishable in form and/or function when compared to a human.

2. Synchronised Emulation: A set of two or more subgroups that are perceptually indivisible in form and/or function from a living human being.

3. Absolute Emulation: A fully assembled human replicant consisting of all subgroups working as a unified whole to emulate the human form and function.

The total length of the MTT is 20 minutes, divided into four 5-minute evaluation sections covering appearance, movement, voice and AI, founded on the five-minute evaluation rule of the original Turing Test. The MTT has broader applications outside the field of RHRs and EAI, in realistic virtual humanoids (RVHs) with EAI for HCI. Developing higher modes of human likeness in RVHs is significant in EAI interface design for HCI and in exploring the UV in RVHs. Therefore, it is essential to provide evaluation conditions for assessing the perceptual authenticity of RVHs for the future progression of virtual humanoids towards a simulacrum indivisible from living humans.

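The timing rule and the three orders of human emulation described in this section lend themselves to a short worked example. The following sketch is hypothetical throughout. It assumes the four sections run back to back in the order listed, and it assumes that an order of emulation can be read off from a count of subgroups judged indistinguishable from a human; neither assumption is specified by the MTT itself, and all names are illustrative.

```python
from dataclasses import dataclass

SECTION_NAMES = ("appearance", "movement", "voice", "AI")
SECTION_MINUTES = 5  # the five-minute rule carried over from the Turing Test


@dataclass(frozen=True)
class Section:
    name: str
    start_min: int
    end_min: int


def build_schedule():
    """Lay the four sections end to end (ASSUMPTION: no breaks between them)."""
    return [
        Section(name, i * SECTION_MINUTES, (i + 1) * SECTION_MINUTES)
        for i, name in enumerate(SECTION_NAMES)
    ]


def order_of_emulation(passing_subgroups, total_subgroups):
    """Map a count of passing subgroups to one of the three orders.

    ASSUMPTION: counting subgroups is a simplification. The text defines
    the orders qualitatively and gives no numeric decision rule.
    """
    if total_subgroups < 1 or not 0 <= passing_subgroups <= total_subgroups:
        raise ValueError("counts must satisfy 0 <= passing <= total, total >= 1")
    if passing_subgroups == 0:
        return None
    if total_subgroups > 1 and passing_subgroups == total_subgroups:
        return "Absolute"
    if passing_subgroups >= 2:
        return "Synchronised"
    return "Fragmentary"


schedule = build_schedule()
for section in schedule:
    print(f"{section.start_min:>2}-{section.end_min:>2} min: {section.name}")
print("total minutes:", schedule[-1].end_min)

# Hypothetical RHR built from six subgroups (e.g. eyes, mouth, skin, ...).
print(order_of_emulation(1, 6), order_of_emulation(3, 6), order_of_emulation(6, 6))
```

Keeping the section length as a single constant makes it straightforward to test alternative timings without altering the rest of the sketch.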
4. The Multimodal Turing Test for RHRs                        4.2. Second Stage: Movement and Dexterity

The MTT is more comprehensive than the Turing Test,           The second stage of the MTT incorporates both
TTT, RTT, THT and HTT by systematically examining             movement and appearance; The human interrogator
appearance, functionality, AI and voice processing to         (A) selects an expression or gesture from a list of
provide a universal evaluation procedure for all types        commands, such as smile, frown, wave, open mouth.
of humanoid robots with varying degrees of human              The human interrogator (A) selects which agent
likeness. This multimodality requires several                 performs the command by addressing the agent and
constraints to ensure the integrity of the evaluation         saying aloud the command. As in the Turing Test, a
procedure. In Fig. 6, the human interrogator (A)              delay in the response time (5-10s) of the agents allows
evaluates the authenticity of agents (B) and (C) who are      time for NLP. Servomotor sounds must be triggered by
separated by a solid screen to minimise interference.         the human agent when performing physical movements
Significantly, both agents (B) and (C) inhabit the same       to reduce signifiers such as sound interference that may
physical environment and visual spectrum as the               allude to the mechanical nature of the RHR. It is
human interrogator for greater perceptual authenticity.       essential to assess tongue movement to match vowel
                                                              and consonant sound as the internal components are
                                                              exposed by the robotic mouth during verbal
                                                              communication. The accurate replication of acute
                                                              motor functions such as pupil dilation, breathing, facial
                                                              tics and blink rate must be considered in the second
                                                              stage. Furthermore, the complexity and level of
                                                              movement vary with the style of the humanoid
Fig. 6: MTT for RHRs Evaluation Environment. A: Human         robot; for instance, robotic heads do not require the
Interrogator. B: Human / RHR Agent C: Human / RHR Agent       evaluation of body movement such as hand gesturing.

4.1 First Stage: Appearance
                                                              However, evaluating hand gestures is essential for a
The first stage of the MTT requires human interrogator        ‘waist up’ robot design. Comparatively, a waist up
(A) to evaluate the appearance of agents (B) and (C).         robot does not require the evaluation of leg movement
Different subgroups contain different visual elements         and balance, unlike a full-body humanoid robot which
such as lips, hair, skin tone and wrinkles. Therefore,        needs the robot to stand and move the lower parts of its
imperfections in synthetic skin such as wrinkles, spots       body. Therefore, applying constraints to control the
and blemishes are essential as these defects are not          evaluation area for different styles of RHRs is
typically associated with RHRs. The first level               significant; for example, seating robotic heads and
examines the visual authenticity of the agents, such as       waist-up robots at a table during the evaluation
by comparing an area of natural skin of Agent (B) with the    procedure will reduce and concentrate the evaluation
corresponding synthetic skin area from Agent (C). The         area. This method is standard in HRI to conceal an
MTT is significant to the progression of RHRs as the          RHR's lower body and external mechanical
Turing Test does not provide a developmental                  components from the observer. If an RHR can pass the
framework due to the binary pass/fail system.                 first two stages of the MTT at a rate of 30%+, this is the
                                                              same as saying that, in real-world conditions, the RHR is
Thus, allowing engineers to gauge the authenticity of         visually indistinguishable from a living human being
specific facial/bodily areas individually, as a group, or     (without speaking or AI interaction).
as a complete form towards attaining the pass threshold
(emulation that is indivisible from a living human) is        4.3 Third Stage: Speech and Mouth Articulation
essential. It is crucial to evaluate Agent (B) against (C)
                                                              The third stage of the MTT evaluates an RHR's speech,
and then Agent (C) against (B) for a detailed and
                                                              lip dexterity and aesthetical appearance. It is not the
comprehensive analysis. For example, imagine Agent
                                                              objective of the MTT to develop a more human-
(B) is a robotic mouth and (C) a human mouth, and the
                                                              sounding robotic voice as this field is continually
human interrogator (A) identifies a visual irregularity
                                                              evolving outside of RHR design. However, the MTT
in the bottom lip of Agent (B) leading to the human
                                                              examines the compatibility and accuracy of speech
interrogator identifying Agent (B) as an RHR. This
                                                              synthesis with robotic mouth articulation. Speech
process applies to every item within a subgroup to
                                                              synthesis technologies are advancing rapidly and
pinpoint the precise location of the visual irregularity.
                                                              continually improving in human likeness, and the use
It is vital to assess the aesthetical quality of the inside
                                                              of current and future speech synthesis technologies in
of the robotic mouth during the first stage evaluation as
                                                              RHRs is significant towards total automation.
this area is exposed during operation.
Using NLP in the MTT is preferable to human speech          Therefore, a time limitation of 10 seconds is imposed
as it protects the integrity of the test environment by     and strictly monitored throughout the evaluation
seamlessly interchanging between the previous               procedure, with time added to the end of each session
evaluation stages. However, as speech synthesis is yet      if silence is excessive or exceeds the 10-second
to replicate human speech, implementing current             maximum. If an RHR can pass the third stage of the
speech synthesis is counterproductive when developing       MTT, then that system's autonomous speech processing
RHRs that are perceptually indivisible from humans.         and tonal expressions are proximal to natural human
Therefore, it is essential to outline an alternative        speech and mouth/lip movement, facial expressions
methodology of natural speech processing to overcome        and appearance. However, for an RHR to progress to
the current limitations of computerised speech              the final stage (AI) of the hierarchy of human
technologies. The WOZ approach permits a second             emulation, the system must be fully automated without
human agent (D) to speak in place of the robotic voice,     human control for the integration of speech and AI.
as demonstrated in Fig. 7. The speech of Agents (D)         Therefore, implementing the alternate speech
and (C) is relayed to the human interrogator (A) by         evaluation procedure is an acceptable method for
headphones to minimise the sound difference between         passing the third level of the MTT but not for
the speaker system and natural human voice.                 progressing onto the final level.

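The 10-second silence rule in Section 4.3 can be illustrated with a short sketch. The text states only that time is added to the end of a session when silence exceeds the 10-second maximum; it does not say how much time is added. The code below assumes that only the excess over 10 seconds is returned to the session, and the function names and sample silence durations are hypothetical.

```python
SILENCE_LIMIT_S = 10.0     # 10-second silence maximum stated in the text
SESSION_LENGTH_S = 5 * 60  # each evaluation section lasts 5 minutes


def session_extension(silences_s, limit_s=SILENCE_LIMIT_S):
    """Return the seconds to add to a session.

    Assumption (not specified in the paper): only the portion of each
    silence that exceeds the limit is added back to the session.
    """
    return sum(max(0.0, s - limit_s) for s in silences_s)


def adjusted_session_length(silences_s, base_s=SESSION_LENGTH_S):
    """Return the session length in seconds after silence compensation."""
    return base_s + session_extension(silences_s)


# Hypothetical silence durations (seconds) logged during one session.
silences = [4.0, 12.5, 9.0, 18.0]
print(session_extension(silences))        # 10.5
print(adjusted_session_length(silences))  # 310.5
```

Other compensation rules, such as adding back the full duration of any over-limit silence, would fit the wording of the text equally well.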
                                                            4.4 Final Stage: AI (Absolute Emulation)

                                                            The final stage of the MTT is inclusive of all four
                                                            elements: intelligence, movement, speech and
                                                            appearance. It is vital at this stage that all human
                                                            control is removed, permitting the RHR to function
                                                            autonomously and the AI to control the operations of
                                                            movement and speech. As EAI constitutes the
                                                            ‘personality’ of the RHR, developers need to create an
                                                            AI personality with interests and traits that
Fig. 7: Natural voice to a robot speech/mouth actuation
                                                            match the appearance, speech synthesis and movement
This approach permits the examination of human              of the RHR. Passing the final stage of the MTT would
speech using a robotic mouth system, allowing for a         answer the question: can machines emulate a human
more accurate comparative evaluation than current           being? Therefore, developing an EAI program to
speech synthesis. However, real-time human speech to        control and accurately trigger facial expressions, voice
lip synchronisation is less reliable than speech            tone, emotions and gestures is crucial in the final
synthesis due to the variability in pitch, volume,          evaluation. This method is the foundation for
frequency and tonality of human speech. Therefore, it       developing more sophisticated modes of interpersonal
is essential to configure the robotic mouth to function     AI for robots. Like the Turing Test, the final stage
with one human voice for optimum lip-synchronisation        evaluation focuses on a single topic of discussion
accuracy. Although the evaluation for natural human         selected by the human interrogator from a pre-
speech and computerised speech is different, the            established list of subjects. The final test lasts 5 minutes
procedure is identical. The human interrogator (A)          with the human interrogator (A) posing 2.5 minutes of
engages in an interactive game with agents (B+D) and        questioning to agents (B) and (C) on the selected topic.
(C). The objective of the game is for the human             As technology improves NLP, RHR and AI efficiency,
interrogator (A) to guess which animals agents              this time limit should be extended until the RHR can
(B+D) and (C) are thinking of by posing questions to        deceive a human interrogator indefinitely. At the end
each of them about the animal’s appearance, habitat,        of the evaluation procedure, the human interrogator (A)
movement and diet. The human interrogator (A) rates         chooses which agent, (B) or (C), is human (or is unsure)
and compares the authenticity of the voice and mouth        and provides a detailed account of the decision-making
articulation of Agents (B) and (C). This approach is vital for  process covering all evaluation categories. If 30%+ of
evaluating speech, as implementing a structured             test subjects misidentify or are unable to discern the
gamification methodology does not require deep              difference between the RHR and the human agent, then
learning or machine learning methods and permits the        the RHR has succeeded in passing the final stage of the
human interrogator to focus on speech quality rather        MTT. However, if an RHR does not pass all stages of
than correct or incorrect AI responses. Finally, time       the MTT, the data gathered during the test stages will
limitations on ‘silence’ are significant to upholding the   provide engineers with information concerning specific
integrity of the MTT, as suggested in an article on the     areas that emit irregular feedback through the layered
MTT and the Turing Test (Cole, 2019).                       evaluation process for revision or calibration.
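The pass criterion for the final stage in Section 4.4 can be written as a short calculation. The sketch below counts an interrogator as deceived when they either select the RHR as the human agent or report being unsure, and compares the resulting rate with the 30% benchmark. The verdict labels, function names and sample data are illustrative assumptions, not part of the published procedure.

```python
PASS_THRESHOLD = 0.30  # the 30%+ benchmark stated in the text


def deception_rate(verdicts, rhr_agent):
    """Return the fraction of interrogators who failed to identify the RHR.

    Each verdict is the agent the interrogator judged to be human:
    "B", "C" or "unsure" (labels assumed for illustration).
    A verdict counts as deceived if it names the RHR or is "unsure".
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    deceived = sum(1 for v in verdicts if v == rhr_agent or v == "unsure")
    return deceived / len(verdicts)


def passes_final_stage(verdicts, rhr_agent):
    """Return True if the deception rate meets the 30% threshold."""
    return deception_rate(verdicts, rhr_agent) >= PASS_THRESHOLD


# Hypothetical verdicts from ten interrogators; Agent B is the RHR.
verdicts = ["C", "C", "B", "unsure", "C", "C", "B", "C", "C", "C"]
print(deception_rate(verdicts, "B"))      # 0.3
print(passes_final_stage(verdicts, "B"))  # True
```

The same calculation could be applied to each earlier stage, or to each subgroup within a stage, to locate the areas that fall below the threshold.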
5. Conclusion                                                                Karniel, A & Avraham, G, Peles, Ba & Levy-Tzedek (2010). Turing-Like
                                                                             Handshake Test for Motor Intelligence. JoVE. 10.3791/2492.

The MTT is an essential evaluation method towards                            Kasaki, M. & Ishiguro, H. & Asada, M. & Osaka, M. & Fujikado, T.
achieving higher modes of human likeness in RHRs                             (2016). Cognitive neuroscience Robotics: Synthetic Approaches to human
                                                                             understanding. 10.1007/978-4-431-54595.
and EAI. As in other methods of evaluation, slight
                                                                             Khatib. H (2019) Just because they are robots? Retrieved:
miscalculations of an otherwise realistic-looking robot                      www.ameinfo.com/industry/technology/robots-treat-humanoids-racial-
can allude to the robot’s artificiality resulting in other                   gender-bias Acc: 15.09.19
high-quality components becoming part of that failure.                       Landgrebe. J Smith. B (2019) There is no AGI. Retrieved:
The objective of the MTT is to permit engineers to                           https://arxiv.org/abs/1906.05833. Acc: 25.02.20

work systematically and build up areas of the face and                       Mann. A (2014) The computer actually got an F on the Turing Test. Ret:
body to ensure all components are equal to those of a                      wired.com/2014/06/turing-test-not-so-fast/. Acc:9.2.20

human before expanding the fields and adding more                            Marzano, G & Novembre, A. (2016). Machines that Dream: A New
                                                                             Challenge in Behavioral-Basic Robotics. Procedia Computer Science. 104.
features towards creating a complete RHR that is                             146-151. 10.1016/j.procs.2017.01.089.
perceptually indivisible from a living human being.
                                                                             Mathieson. S (2019) Will androids ever be able to convince people they
                                                                             are           human?         Retrieved:          www.researchgate.net
References                                                                   /publication/341756234_Mr_Robot_Will_androids_ever_be_able_to_con
                                                                             vince_people_they_are_human_Guardian_Online. Acc 07.07.20
Aamoth. D (2014) The Fake Kid Who Passed the Turing Test.
Ret:time.com/2847900/eugene-goostman-turing-test/.Acc 5.2.20                 Mori. M (1970). The Uncanny Valley. Energy, Issue 7, pp.33-35. DOI:
                                                                             10.1109/MRA.2012.2192811
Aron. J (2011) AI tricks people into thinking it is human. Retrieved:
newscientist.com/article/dn20865-software-tricks-people-into-thinking-it-    Oppy, G. R., & Dowe, D. L. (2003). The Turing Test. Stanford
ishuman/#ixzz6Ez0JgQi1. Acc: 25.02.20                                        Encyclopedia of Philosophy, 1(online), 1 - 26.

Barnfield. N (2020) Face to Face With The Future of AI. Horizon              Pereira. D (2019) You should fear Super Stupidity, not ASI
Magazine. Riley Raven. DOI: https://www.staffs.ac.uk/alumni/ horizon-        Retrieved:towardsdatascience.com/you-should-fear-super-stupidity-not-
alumni-magazine pp.8-11                                                      super-intelligence-19f93a46fa4d. Acc.17.02.20

Baudrillard. J (1994). Simulacra and simulation. Trans: Ann Arbor :          Rapaport, W. (2000). How to Pass a Turing Test. Journal of Logic,
University of Michigan Press, ISBN-10: 0472065211.                           Language,     and      Information,   9(4),    467-490. Retrieved:
                                                                             www.jstor.org/stable/40180238. Acc: 24.04 2020
Bringsjord, S., Caporale, C., & Noel, R. (2000). The Total Turing Test, J-
LLI, 9(4), 397-418. DOI:www.jstor.org/stable/40180234                        Reynolds. E (2016) Does the Fifth Amendment ‘expose a serious flaw’ in
                                                                             Turing Test? Retrieved:www.wired.co.uk/article/major-flaw-turing-test-
Burleigh. T, Schoenherr. J, Lacroix. G (2013). Does the uncanny valley       silence. Acc: 25.02.20
exist?    Computers     in    Human      Behaviour.    29.   759-771.
DOI:10.1016/j.chb.2012.11.021.                                               Robbins. M (2014) A Machine Did not ‘Pass’ the Turing Test. Retrieved:
                                                                             https://www.vice.com/en_uk/article/gq8ddw/eugene-goostman-alan-
Cole. E (2019) What is New in Robotics? Retrieved:                           turing-test-kevin-warwick. Acc: 25.02.20
blog.robotiq.com/whats-new-in-robotics-06.12.2019. A:25.02.20
                                                                             Sample. I & Hern. A (2014) Scientists dispute if ‘Eugene Goostman’
Copeland. J (2014) Why Eugene Goostman Did Not Pass the Turing Test.         passed    Turing    Test.     Retrieved:   www.theguardian.com/techno
Retrieved:                      https://www.huffingtonpost.co.uk/jack-       logy/2014/jun/09/scientistsdisagree-over-whether-Turing-test-has-been-
copeland/turingtesteugenegoostman. Acc:25.02.20                              passed. Acc. 19.04.20

Fawaz.    A    (2019)    A    tangible    Turing     Test.    Retrieved:     Schweizer. P (1998). The Truly Total Turing Test. Minds Mach. 8, 2 (May
https://www.neowin.net/news/a-tangible-turing-testthe-loebner-prizeis-       1998), 263–272. DOI:10.1023/A:1008229619541
coming-to-swansea-this-weekend/. Acc: 21.02.20
                                                                             Searle, J (1980). Minds, brains, and programs. Behavioural and Brain
Gehl. R (2014). Teaching to the Turing Test with Cleverbot.                  Sciences 3 (3): 417-457, DOI: 10.1.1.83.5248.
Transformations: The Journal of Inclusive Scholarship and Pedagogy,
24(1-2), 56-66. Retrieved February 25, 2020.                                 Sharkey. N (2012) Alan Turing: The experiment that shaped AI. Retrieved:
                                                                             bbc.co.uk/news/technology-18475646. Acc:22.02.20
Ghose. T (2016) Robots Could Hack Turing Test by Keeping Silent.
Retrieved: www.scientificamerican.com/article/robots-could-hack-turing-      Stock-Homburg. R, Peters. J, Schneider. K, Prasad. V, Nukovic. L (2020)
test-by-keeping-silent/. Acc: 25.02.20                                       Evaluation of the HTT for anthropomorphic Robots, Int Conf HRI. DOI:
                                                                             10.1145/3371382.3378260
Harnad, S. (1992) The Turing Test Is Not A Trick: Turing
Indistinguishability Is A Scientific Criterion. SIGART Bulletin 3(4)         The World Economic Forum (2019) Can machines think? A new Turing
(October 1992) pp. 9 - 10.                                                   Test       may         have        the       answer.       Retrieved:
                                                                             www.weforum.org/agenda/2019/08/our-turing-test-for-androids-will-
Harnad, S. (2000) Minds, Machines and Turing, Journal of Logic,              judge-how-lifelike-humanoid-robots-can-be/. Acc: 14.03.20
Language and Information, vol 9: p.425.
https://doi.org/10.1023/A:1008315308862                                      Turing, A. (1950). Computing Machinery and Intelligence, Mind, (236),
                                                                             pp.433-460. doi.org/10.1093/mind/LIX.236.433.
Hern. A (2014) What is the Turing Test? Retrieved:
www.theguardian.com/technology/2014/jun/09/what-is-the-alan-turing-          Wakefield. J (2019) The hobbyists competing to make AI human.
test. Acc: 25.02.20                                                          Retrieved:www.bbc.co.uk/news/technology-49578503 Acc: 25.02.20

Houser. K (2019) Advanced Robotics Forced Scientist To Invent A New          Warwick, K., & Shah, H. (2015). Passing the Turing Test Does Not Mean
Turing Test. Retrieved: https://futurism.com/the-byte/scientists-invented-   the End of Humanity. Cognitive Computation, 8, 409-419. DOI:
new-turing-test. Acc: 18.04.20                                               10.1007/s12559-015-9372-6

Ishiguro, H. (2005). Android science: Toward a new cross-interdisciplinary   Whitby, B (1996) Reflections on AI: the legal, moral and ethical
framework. J-Comp-Sci, Corpus ID: 6105971                                    dimensions. Intellect, Oxford, UK. ISBN 9781871516685

Jacquet, B., Baratgin, J., & Jamet, F. (2019). Cooperation in Online Conversations. Journal of
Psychology, 10, 727. https://doi.org/10.3389/fpsyg.2019.00727