=Paper=
{{Paper
|id=Vol-3007/2020-paper-2
|storemode=property
|title=The Multimodal Turing Test for Realistic Humanoid Robots with Embodied Artificial Intelligence
|pdfUrl=https://ceur-ws.org/Vol-3007/2020-paper-2.pdf
|volume=Vol-3007
|authors=Carl Strathearn,Minhua Ma
|dblpUrl=https://dblp.org/rec/conf/lifelike/StrathearnM21
}}
==The Multimodal Turing Test for Realistic Humanoid Robots with Embodied Artificial Intelligence==
Carl Strathearn (1) and Minhua Ma (2)

(1) School of Computing and Digital Technologies, Staffordshire University, UK. Carl.Strathearn@research.staffs.ac.uk

(2) Provost, Falmouth University, UK. M.Ma@falmouth.ac.uk

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===Abstract===
Alan Turing developed the Turing Test as a method to determine whether artificial intelligence (AI) can deceive human interrogators into believing it is sentient by competently answering questions at a confidence rate of 30%+. However, the Turing Test is concerned with natural language processing (NLP) and neglects the significance of appearance, communication and movement. The theoretical proposition at the core of this paper, 'can machines emulate human beings?', is concerned with both functionality and materiality. Many scholars consider the creation of a realistic humanoid robot (RHR) that is perceptually indistinguishable from a human as the apex of humanity's technological capabilities. Nevertheless, no comprehensive development framework exists for engineers to achieve higher modes of human emulation, and no current evaluation method is nuanced enough to detect the causal effects of the Uncanny Valley (UV) effect. The Multimodal Turing Test (MTT) provides such a methodology and offers a foundation for creating higher levels of human likeness in RHRs for enhancing human-robot interaction (HRI).

Key Words: Turing Test, Humanoid Robots, Artificial Intelligence, Embodied Artificial Intelligence, HRI

===1. Introduction===
The Turing Test hypothetically evaluated computational AI using typesetters and pre-written scripts to emulate human thought (Turing, 1950). However, modern conversational AI systems function with greater accuracy at a higher rate of processing than the analogue methods outlined in Turing's paper. Landgrebe & Smith (2019) explain that, unlike the original Turing Test, the updated Turing Test for AI utilises two computer interfaces to replace the typesetter methodology. One computer system implements a conversational AI application and the other is controlled by a human agent concealed from the view of the human interrogator. The role of the human interrogator is to evaluate the authenticity and accuracy of the agents' responses to determine which system is artificial and which is human.

There are accounts of AI systems which claim to have passed the Turing Test. For example, Warwick & Shah (2015) and Aamoth (2014) advocate that a chatbot program named Eugene Goostman passed the 30% benchmark of the Turing Test, scoring a marginal 33% at the Royal Society AI competition in 2014. However, commentators such as Copeland (2014), Hern (2014) and Robbins (2014) contest the validity of this achievement, stating two significant flaws in the evaluation procedure. Firstly, human interrogators had prior knowledge that the AI system emulated a 13-year-old Ukrainian boy. This approach dissolves the integrity of the Turing Test, which states that the removal of all identifiers is vital in maintaining impartiality (Turing, 1950). Secondly, the creators of the Eugene Goostman chatbot hand-selected the human interrogators for the test, significantly increasing the probability of participant bias. Sample & Hern (2014) argue that claiming the Eugene Goostman chatbot passed the Turing Test is fundamentally absurd, as Turing's prediction that in 50 years conversational AI could pass as a human was merely hypothetical, akin to a statistical survey or Gallup poll. Turing's acumen is a methodology to explain how the human mind functions by developing a computer capable of proximal behaviour and intelligence, which includes verbal processing and sensorimotor/robotic dimensions in which AI is systematically grounded (Sample & Hern, 2014).

In consideration, Harnad (2000) argues that the Turing Test is not a measure of how an AI system operates over five minutes; it is the system's ability to simulate the human mind over a lifetime. According to Gehl (2013), a similar text-based chatbot named Cleverbot claimed to pass the Turing Test in 2011 at the Techniche festival in India, before the Eugene Goostman chatbot. However, Cleverbot did not receive the media coverage and scholarly attention of the Eugene Goostman program due to numerous irregularities in the results.

Aron (2011), Jacquet et al. (2019) and Mann (2014) argue that although Cleverbot claimed to exceed the 30% benchmark of the Turing Test, scoring an exceptional 59.3%, human interrogators rated human agents as AI at an even higher rate of 63.3%. Thus, significant discrepancies in the results indicate fundamental flaws in the evaluation procedure and recruitment process.

However, Landgrebe & Smith (2019), Jacquet et al. (2019) and Pereira (2019) argue that although numerous chatbot systems claim to pass the Turing Test, the modernised tests are weak variations of Turing's original proposition, which are not representative of Turing's hypothesis and therefore do not qualify as certified passes. Fawaz (2019) and Wakefield (2019) explain that creating chatbots to pass the Turing Test is a developer's pastime, as there is no serious scientific research in developing AI to pass the Turing Test. In support, Sharkey (2012) suggests that as Turing is long deceased, clarifying the terms and conditions of passing the Turing Test is impossible.

In RHR design, Mori's (1970) UV accounts for the negative psychological stimulus propagated by RHRs upon observation, as the more human-like artificial humans appear, the greater the potential for humans to feel repulsed by their appearance. However, per Burleigh (2013), there are considerable arguments against the scientific value of the UV theorem, as many scholars regard it as purely academic. Thus, the UV, like the Turing Test, remains a controversial topic in AI and robotics.

===2. The Turing Test===
Alan Turing (1950) formulated the Turing Test to determine if a machine agent could mislead human interrogators into believing answers provided by a computer are those of a human. If the machine convinces 30%+ of human interrogators into thinking it is sentient, the system passes the test, and the higher this percentage, the more humanistic the AI functions. Turing argues that if a machine agent is capable of exhibiting human behaviour indistinguishable from that of a human, then the artificial mind functions in a manner akin to the human mind (it can think). However, Turing questions a machine's ability to think, as 'thinking' is problematic to define, and thus proposes the Turing Test as a methodology to explore this concept.

Turing supposes that if a machine agent replaced either of the male or female agents in the imitation game and could operate with a level of intelligence proximal to the responses of a human, then it would replace his original hypothesis 'can machines think?' (Turing, 1950). In the Turing Test, the objective of the human interrogator is to identify which agent is AI and which is human by posing a series of questions to evaluate the authenticity of the responses. It is the agent's role to deceive the human interrogator into believing that they are the opposite agent by providing type-written answers that simulate the responses of the other.

However, Turing applies constraints to the Turing Test to establish equilibrium between the agents. Firstly, Turing narrows the scope of interaction between the human interrogator and the human/machine agents to a single topic of conversation, to prevent the human interrogator asking questions outside of the scope of the AI system's capabilities, which may allude to the artificiality of the system. Similarly, Turing restricts the human interrogator's ability to propose mathematical inquiries to the agents, as machines are capable of correctly answering complex equations consistently, unlike humans.

Secondly, Turing imposes a 15-30 second time delay between the responses of the human interrogator, as machine agents require time to formulate and respond to questions, unlike the human mind, to which responses are immediate. Thirdly, Turing limits the time-scale of the evaluation to 5 minutes to prevent the machine agent producing incorrect or repetitive responses, as the longer the interaction, the higher the potentiality for error. However, Turing considers the physical emulation of the human being as a distraction from the pursuit of intelligent machines (Turing, 1950, p.2). Although Turing is correct in stating that the appearance of a machine is not indicative of its intellectual capabilities, he neglects the capacity of the human body in tactile learning, socialisation and non-verbal communication, which are vital processes in social learning and communication.

====2.1 Arguments and Limitations of the Turing Test====
In Searle's experiment, a human agent sat in the middle of a room is passed a series of random Chinese symbols from under a door. The agent uses an instruction manual to arrange the symbols to form coherent sentences. After a while, the agent becomes efficient in arranging the symbols into sentences and no longer requires the instruction manual. The instruction manual is removed, and interrogators who are fluent in written Chinese observe the agent arrange the symbols into sentences and state whether they think the agent is literate in Chinese or not.

In the experiment, the interrogators agree that the agent is fluent enough in Chinese to form coherent sentences using the symbols. However, the agent only understands the order of the symbols and not their meaning and therefore lacks the vital process of comprehension. Thus, the perception of the interrogators in Searle's experiment is critical in understanding how humans interpret the appearance of intelligent behaviour, as in real-life conditions there are no visual distinctions between functional intelligence and comprehension, visualised in Fig. 1.

Fig. 1: Visual representation of Searle's Argument

Similarly, Cole (2019) and Warwick & Shah (2015) argue that the Turing Test is susceptible to human interference by a fundamental design flaw which inverts the human perception of the nature of computing by remaining silent. Ghose (2016) explains that if an AI system does not answer questions when prompted, the human interrogators cannot distinguish between the silence of the AI and human responses; hence, the AI agent would pass as human by default. Thus, it is the expectancy for a computer system to respond to the actions of a human operator. If a computer does not perform tasks in a manner accustomed in HCI, this processual irregularity has the potentiality to influence human perception of the nature of the agent (Reynolds, 2016). In consideration, Hern (2019) and Landgrebe & Smith (2019) suggest that silence during the Turing Test is not uncommon and typically the result of poor programming.

However, stricter policies regarding the time limit of agent responses are crucial in maintaining the integrity of the Turing Test to eradicate purposeful exploitation of this loophole. Whitby (1996) argues that AI developers and scholars have long misinterpreted the purpose of the Turing Test, as Alan Turing designed the 'Imitation Game' as a game and not a formal test. Whitby argues that Alan Turing never intended the imitation game as an evaluation of machine intelligence, but rather as a thought experiment for assessing a machine's capacity to portray the behaviours of a human authentically. Whitby suggests that Turing's paper is not an operational guide for AI, but a theoretical treatise to examine the sociological and scientific value of creating machines which can mislead human beings into believing they are human. However, Whitby explains that simulating human personalities and emotion in AI is damaging, as these attributes tend to be misleading rather than progress the intellectual capacity of AI. Thus, the practical value of Turing's hypothesis is not in creating machines with intelligence proximal to humans, known as artificial general intelligence (AGI), but in emulating the conditions of the human mind and behaviours using computers.

This concept is significant in HRI and HCI as it considers how humans interface and interact with technologies that simulate human intelligence, personalities and behaviour. Rapaport (2000) argues that the Turing Test is limited in its scope of evaluation as it only considers HCI via NLP. Stock-Homburg et al. (2020) describe the Handshake Turing Test (HTT) and, similarly, Karniel et al. (2010) the Turing Handshake Test (THT) as tests to determine if human interrogators can identify the differences between a human and an RHR by the act of a handshake (tactile HRI). However, this approach neglects the emulation of appearance, communication, AI and movement by focusing on secondary aspects such as touch and temperature. Ishiguro (2005) developed the Total Turing Test (TTT) for RHRs in HRI, formulated on Harnad's (1992) TTT for human-computer interaction and Harnad's (2000) Robot Turing Test (RTT), to comprehensively evaluate the appearance, behaviour and movement of RHRs against a human counterpart. Ishiguro's (2005) TTT implements point of view (POV) cameras mounted on the heads of the human and RHR agents. The agents conduct logistical tasks, and it is the role of the human interrogator to discern which agent is human and which is the RHR from observation. Secondly, the human interrogator observes live 'full body' video streams of the agents for two seconds and decides which agent is human and which is the RHR. Kasaki et al. (2016) cite that 70% of subjects identified the movements of RHRs as human. Ishiguro argues that the Turing Test evaluates the intellectual capabilities of a computer on the assumption that the human mind is divisible from the body.

Thus, the TTT evaluates embodied artificial intelligence (EAI) by combining intelligent behaviour with a robotic body for assessing the human likeness of robotic behaviour, appearance and movement. However, the TTT is susceptible to design flaws. Firstly, live video footage is inaccessible. Secondly, Marzano & Novembre (2017) argue that the 2-second evaluation window is too limited. Thirdly, according to Schweizer (1998) and Bringsjord et al. (2000), the TTT is not a comprehensive approach as it neglects the evaluation of NLP to robotic mouth articulation during HRI. Fourthly, Oppy (2003) stipulates that judging the authenticity of intelligent behaviour by manipulating objects is not indicative of a machine's intellectual capacity. In consideration, Schweizer (1998) created the Truly Total Turing Test (TTTT) to remove telepresence from the TTT and evaluate automated RHRs with EAI. However, the TTTT lacks vital processes such as physical examination, movement, appearance, materiality, EAI and communication when operating as one robotic system.

===3. The Multimodal Turing Test===
Per the findings of the literature review, current evaluation methods used to determine degrees of human likeness in RHRs in HRI and HCI, such as the Turing Test, TTT, TTTT, RTT, THT and HTT, are too limited in their scope of evaluation, as they neglect the significance of amalgamating communication (speech and gesturing), movement, vision, aesthetics and conversational AI into a single system, which is not representative of the human condition. In consideration, this study lays the foundations of a comprehensive theoretical evaluation methodology named the Multimodal Turing Test (MTT) to determine if RHRs can attain a level of emulation perceptually indivisible from a human being (Houser, 2019). As cited in a recent article in the Guardian UK, the MTT is more holistic than the original Turing Test and previous evaluation methods in HRI by evaluating an RHR's appearance, communication, movement and AI (Mathieson, 2019), shown in Fig. 2.

Fig. 2: The four evaluation modes of the MTT

The MTT incorporates the examination structure of the 1950 Turing Test by employing human interrogators to evaluate the perceptual authenticity of RHRs. However, unlike the binary pass/fail system of the original Turing Test, the MTT provides engineers, designers and programmers with a developmental framework to benchmark progress up to and in advance of Turing's 30% pass rate (Strathearn, 2019). Each stage of the MTT increases in complexity, which forms the hierarchy of human emulation shown in Fig. 3. Like Turing, it is not argued that an RHR metamorphoses into an organic system by replicating the conditions of a human being. However, if an RHR can appear and function in a manner indistinguishable from a human being in real-world conditions, then that RHR is perceptually indivisible from a living human being (World Economic Forum, 2019). Thus, equal consideration of the appearance and functionality of RHRs is essential to develop higher modes of human emulation.

Fig. 3: The Hierarchy of MTT: Level 1 (Appearance), Level 2 (Appearance & Movement), Level 3 (Appearance, Movement & Communication), Level 4 (Appearance, Movement, Communication & AI).

However, replicating the appearance and materiality of a human is more straightforward than simulating human movement due to the complexity of natural kinetic variance. Therefore, per Baudrillard's (1994) order of simulacra, appearance forms the bedrock of the hierarchy of human emulation because it is the elementary form of simulation. Aesthetical appearance envelops a body to which movement is applied; as natural movement is more complicated to replicate than a still model, kinetics forms the second level of emulation. For speech to be a useful communication tool in RHRs, it requires both an authentic appearance and naturalistic movement. AI is the apex of human emulation, as the human mind is the most challenging element to simulate authentically due to its complexity. However, for AI to be a useful tool in RHRs, the emulated mind requires a human-like body and a method of communication for naturalistic HRI.

The four evaluation categories of the hierarchy of human emulation formulate a unified whole, which constitutes an RHR that can emulate (to degrees of likeness) a living human being, as reviewed in an article by Khatib (2019) which outlines the scope of the MTT. Furthermore, the MTT is an approach towards humanising forms of AI, as current robotic AI predominantly focuses on logical, linguistical and kinesthetic intelligence and neglects interpersonal and intrapersonal intelligence to create higher modes of EAI. Interpersonal and intrapersonal AI is synergetic, incorporating various visual and audible stimuli such as facial expressions, vocal tonality, gesturing and emotive responsivity to humanise AI interaction. This approach enhances the capacity for natural communication and responsivity between humans and RHRs, founded on authentically assimilating natural human-human interaction (Barnfield, 2020).

Previous evaluation methods fall into the MTT's categories of human emulation, but none are inclusive of all four stages of development. For example, the THT and HTT fall under movement (handgrip); the TTT falls under appearance, AI (Wizard of Oz (WOZ) method) and kinetics (robotic vision, aesthetics and movement); the Turing Test falls under AI; and the TTTT falls under appearance, movement and AI. However, developing an RHR as a complete system with components across all four categories of the hierarchy of human emulation (without consideration of the stages) will not achieve levels of human likeness indivisible from a human being. For example, comparing two RHR heads to determine which one is more visually authentic than the other is a viable methodology for evaluating and testing new components by increasing the realism of one robotic head over the other.

However, this approach is futile when comparing RHRs against a living human being to determine authenticity, as the distinctions in form and function are highly apparent, as exemplified in Fig. 4.

Fig. 4: RHRs developed in this study / Human Comparison. Left: RHR, Baudi. Middle: RHR, Euclid. Right: Human head

Therefore, a multimodal approach is required using a controlled evaluation methodology by combining features that belong to the same body (subgroup), such as EAI, natural speech synthesis and a robotic jaw, tongue and lips. This evaluation procedure applies to other subgroups such as eyes (sclera, pupil dilation, iris, eyelid, eyelashes, veins, eye movement, blink rate, skin, hair, aesthetics) and so on. This approach is similar to the functional constraints of the original Turing Test, which control the direction and flow of a conversation by narrowing it to a specified theme or topic of discussion. This technique permits the refinement of smaller intricate motor functions and aesthetics within the subgroups, indicated in Fig. 5.

Fig. 5: Mouth comparison. Left: RHR. Right: Human

The MTT is a method for overcoming many of the design issues that are prevalent in RHRs, such as inaccurate eye emulation, poor aesthetical design and unnatural movement. Furthermore, according to the uncanny valley hypothesis, realistic humanoids instigate negative perceptual feedback in humans because they are void of variable organic nuances.

This consideration is vital in the development and progression of modern RHRs, as traditional methods of evaluation and design overlook the significance of replicating nuances such as pupil dilation, gestures and accurate lip movement. These facial expressions act as visual cues and signifiers of sentience when discerning the authenticity of an RHR. Thus, when evaluating an RHR, all elements are interconnected to the perceptual whole. To achieve this, an imitation head structure and cloaking device to cover empty areas around the developed feature is a practical method of resolving this issue. This approach permits a holistic evaluation compared to analysing individual facial features outside of the body (unified whole).

The Multimodal Turing Test: three orders of human emulation

The three orders of human emulation are a framework for developing RHRs that appear and function in a manner that is indistinguishable from the natural human being under the conditions and limitations of the MTT evaluation procedure.

# Fragmentary Emulation: A unified subgroup that qualifies as perceptually indistinguishable in form and/or function when compared to a human.
# Synchronised Emulation: A set of two or more subgroups that are perceptually indivisible in form and/or function from a living human being.
# Absolute Emulation: A fully assembled human replicant consisting of all subgroups working as a unified whole to emulate the human form and function.

The total length of the MTT is 20 minutes, divided into four 5-minute evaluation sections covering appearance, movement, voice and AI, founded on the five-minute evaluation rule of the original Turing Test. The MTT has broader applications outside the field of RHRs and EAI in realistic virtual humanoids (RVHs) with EAI for HCI. Developing higher modes of human likeness in RVHs is significant in EAI interface design for HCI and exploring the UV in RVHs. Therefore, it is essential to provide evaluation conditions for assessing the perceptual authenticity of RVHs for the future progression of virtual humanoids towards a simulacrum indivisible from living humans.

===4. The Multimodal Turing Test for RHRs===
The MTT is more comprehensive than the Turing Test, TTT, RTT, THT and HTT by systematically examining appearance, functionality, AI and voice processing to provide a universal evaluation procedure for all types of humanoid robots with varying degrees of human likeness. This multimodality requires several constraints to ensure the integrity of the evaluation procedure. In Fig. 6, the human interrogator (A) evaluates the authenticity of agents (B) and (C), who are separated by a solid screen to minimise interference. Significantly, both agents (B) and (C) inhabit the same physical environment and visual spectrum as the human interrogator for greater perceptual authenticity.

Fig. 6: MTT for RHRs Evaluation Environment. A: Human Interrogator. B: Human / RHR Agent. C: Human / RHR Agent

====4.1 First Stage: Appearance====
The first stage of the MTT requires human interrogator (A) to evaluate the appearance of agents (B) and (C). Different subgroups contain different visual elements such as lips, hair, skin tone and wrinkles. Therefore, imperfections in synthetic skin such as wrinkles, spots and blemishes are essential, as these defects are not typically associated with RHRs. The first level examines the visual authenticity of the agents, for example by comparing an area of natural skin of Agent (B) with the corresponding synthetic skin area from Agent (C). The MTT is significant to the progression of RHRs as the Turing Test does not provide a developmental framework due to the binary pass/fail system.

Thus, allowing engineers to gauge the authenticity of specific facial/bodily areas individually, as a group, or as a complete form towards attaining the pass threshold (emulation that is indivisible from a living human) is essential. It is crucial to evaluate Agent (B) against (C) and then Agent (C) against (B) for a detailed and comprehensive analysis. For example, imagine Agent (B) is a robotic mouth and (C) a human mouth, and the human interrogator (A) identifies a visual irregularity in the bottom lip of Agent (B), leading to the human interrogator identifying Agent (B) as an RHR. This process applies to every item within a subgroup to pinpoint the precise location of the visual irregularity. It is vital to assess the aesthetical quality of the inside of the robotic mouth during the first stage evaluation, as this area is exposed during operation.

====4.2 Second Stage: Movement and Dexterity====
The second stage of the MTT incorporates both movement and appearance. The human interrogator (A) selects an expression or gesture from a list of commands, such as smile, frown, wave, open mouth. The human interrogator (A) selects which agent performs the command by addressing the agent and saying the command aloud. As in the Turing Test, a delay in the response time (5-10s) of the agents allows time for NLP. Servomotor sounds must be triggered by the human agent when performing physical movements to reduce signifiers such as sound interference that may allude to the mechanical nature of the RHR. It is essential to assess tongue movement to match vowel and consonant sounds, as the internal components are exposed by the robotic mouth during verbal communication. The accurate replication of acute motor functions such as pupil dilation, breathing, facial tics and blink rate must be considered in the second stage. Furthermore, the complexity and level of movement vary with the style of the humanoid robot; for instance, robotic heads do not require the evaluation of body movement such as hand gesturing.

However, evaluating hand gestures is essential for a 'waist up' robot design. Comparatively, a waist-up robot does not require the evaluation of leg movement and balance, unlike a full-body humanoid robot, which needs to stand and move the lower parts of its body. Therefore, applying constraints to control the evaluation area for different styles of RHRs is significant; for example, seating robotic heads and waist-up robots at a table during the evaluation procedure will reduce and concentrate the evaluation area. This method is standard in HRI to conceal an RHR's lower body and external mechanical components from the observer. If an RHR can pass the first two stages of the MTT at a rate of 30%+, this is the same as saying that, in real-world conditions, the RHR is visually indistinguishable from a living human being (without speaking or AI interaction).

====4.3 Third Stage: Speech and Mouth Articulation====
The third stage of the MTT evaluates an RHR's speech, lip dexterity and aesthetical appearance. It is not the objective of the MTT to develop a more human-sounding robotic voice, as this field is continually evolving outside of RHR design. However, the MTT examines the compatibility and accuracy of speech synthesis with robotic mouth articulation. Speech synthesis technologies are advancing rapidly and continually improving in human likeness, and the use of current and future speech synthesis technologies in RHRs is significant towards total automation.

Using NLP in the MTT is preferable to human speech as it protects the integrity of the test environment by seamlessly interchanging between the previous evaluation stages. However, as speech synthesis is yet to replicate human speech, implementing current speech synthesis is counterproductive when developing RHRs that are perceptually indivisible from humans. Therefore, it is essential to outline an alternative methodology of natural speech processing to overcome the current limitations of computerised speech technologies. The WOZ approach permits a second human agent (D) to speak in place of the robotic voice, as demonstrated in Fig. 7. The speech of Agents (D) and (C) is relayed to the human interrogator (A) by headphones to minimise the sound difference between the speaker system and the natural human voice.

Fig. 7: Natural voice to a robot speech/mouth actuation

This approach permits the examination of human speech using a robotic mouth system, allowing for a more accurate comparative evaluation than current speech synthesis. However, real-time human speech to lip synchronisation is less reliable than speech synthesis due to the variability in pitch, volume, frequency and tonality of human speech. Therefore, it is essential to configure the robotic mouth to function with one human voice for optimum lip-synchronisation accuracy. Although the evaluation for natural human speech and computerised speech is different, the procedure is identical. The human interrogator (A) engages in an interactive game with agents (B+D) and (C). The objective of the game is for the human interrogator (A) to guess what animals agents (B+D) and (C) are thinking of by posing questions to each of them about the animal's appearance, habitat, movement and diet. The human interrogator (A) rates and compares the authenticity of the voice and mouth articulation of Agents (B) and (C). This approach is vital for evaluating speech, as implementing a structured gamification methodology does not require deep learning or machine learning methods and permits the human interrogator to focus on speech quality rather than correct or incorrect AI responses. Finally, time limitations on 'silence' are significant to upholding the integrity of the MTT, as suggested in an article on the MTT and the Turing Test (Cole, 2019).

Therefore, a time limitation of 10 seconds is imposed and strictly monitored throughout the evaluation procedure, with time added to the end of each session if silence is excessive or exceeds the 10-second maximum. If an RHR can pass the third stage of the MTT, then that system's autonomous speech processing and tonal expressions are proximal to natural human speech and mouth/lip movement, facial expressions and appearance. However, for an RHR to progress to the final stage (AI) of the hierarchy of human emulation, the system must be fully automated without human control for the integration of speech and AI. Therefore, implementing the alternate speech evaluation procedure is an acceptable method for passing the third level of the MTT but not for progressing onto the final level.

====4.4 Final Stage: AI (Absolute Emulation)====
The final stage of the MTT is inclusive of all four elements: intelligence, movement, speech and appearance. It is vital at this stage that all human control is removed, permitting the RHR to function autonomously and the AI to control the operations of movement and speech. As EAI constitutes the 'personality' of the RHR, developers need to create an AI personality with interests and traits that match the appearance, speech synthesis and movement of the RHR. Passing the final stage of the MTT would answer the question: can machines emulate a human being? Therefore, developing an EAI program to control and accurately trigger facial expressions, voice tone, emotions and gestures is crucial in the final evaluation. This method is the foundation for developing more sophisticated modes of interpersonal AI for robots. Like the Turing Test, the final stage evaluation focuses on a single topic of discussion selected by the human interrogator from a pre-established list of subjects. The final test lasts 5 minutes, with the human interrogator (A) posing 2.5 minutes of questioning to each of agents (B) and (C) on the selected topic.

As technology improves NLP, RHR and AI efficiency, this time limit should be extended until the RHR can deceive a human interrogator indefinitely. At the end of the evaluation procedure, the human interrogator (A) chooses which agent, (B) or (C), is human (or unsure) and provides a detailed account of the decision-making process covering all evaluation categories. If 30%+ of test subjects misidentify or are unable to discern the difference between the RHR and the human agent, then the RHR has succeeded in passing the final stage of the MTT. However, if an RHR does not pass all stages of the MTT, the data gathered during the test stages will provide engineers with information concerning the specific areas that emit irregular feedback through the layered evaluation process for revision or calibration.

===5. Conclusion===
The MTT is an essential evaluation method towards achieving higher modes of human likeness in RHRs and EAI because, as in other methods of evaluation, slight miscalculations in an otherwise realistic-looking robot can allude to the robot's artificiality, resulting in other high-quality components becoming part of that failure. The objective of the MTT is to permit engineers to work systematically and build up areas of the face and body to ensure all components are equal to that of a human before expanding the fields and adding more features towards creating a complete RHR that is perceptually indivisible from a living human being.

===References===
* Aamoth, D. (2014). The Fake Kid Who Passed the Turing Test. Retrieved: time.com/2847900/eugene-goostman-turing-test/. Acc: 5.2.20
* Aron, J. (2011). AI tricks people into thinking it is human. Retrieved: newscientist.com/article/dn20865-software-tricks-people-into-thinking-it-ishuman/#ixzz6Ez0JgQi1. Acc: 25.02.20
* Barnfield, N. (2020). Face to Face With The Future of AI. Horizon Magazine. Riley Raven.
* Karniel, A., Avraham, G., Peles, Ba. & Levy-Tzedek (2010). Turing-Like Handshake Test for Motor Intelligence. JoVE. 10.3791/2492.
* Kasaki, M., Ishiguro, H., Asada, M., Osaka, M. & Fujikado, T. (2016). Cognitive neuroscience Robotics: Synthetic Approaches to human understanding. 10.1007/978-4-431-54595.
* Khatib, H. (2019). Just because they are robots? Retrieved: www.ameinfo.com/industry/technology/robots-treat-humanoids-racial-gender-bias. Acc: 15.09.19
* Landgrebe, J. & Smith, B. (2019). There is no AGI. Retrieved: https://arxiv.org/abs/1906.05833. Acc: 25.02.20
* Mann, A. (2014). The computer actually got an F on the Turing Test. Retrieved: wired.com/2014/06/turing-test-not-so-fast/. Acc: 9.2.20
* Marzano, G. & Novembre, A. (2016). Machines that Dream: A New Challenge in Behavioral-Basic Robotics. Procedia Computer Science. 104. 146-151. 10.1016/j.procs.2017.01.089.
* Mathieson, S. (2019). Will androids ever be able to convince people they are human? Retrieved: www.researchgate.net/publication/341756234_Mr_Robot_Will_androids_ever_be_able_to_convince_people_they_are_human_Guardian_Online. Acc: 07.07.20
* Mori, M. (1970). The Uncanny Valley. Energy, Issue 7, pp.33-35. DOI: 10.1109/MRA.2012.2192811
* Oppy, G. R. & Dowe, D. L. (2003). The Turing Test. Stanford Encyclopedia of Philosophy, 1(online), 1-26.
* Pereira, D. (2019). You should fear Super Stupidity, not ASI
DOI: https://www.staffs.ac.uk/alumni/ horizon- Retrieved:towardsdatascience.com/you-should-fear-super-stupidity-not- alumni-magazine pp.8-11 super-intelligence-19f93a46fa4d. Acc.17.02.20 Baudrillard. J (1994). Simulacra and simulation. Trans: Ann Arbor : Rapaport, W. (2000). How to Pass a Turing Test. Journal of Logic, University of Michigan Press, ISBN-10: 0472065211. Language, and Information, 9(4), 467-490. Retrieved: www.jstor.org/stable/40180238. Acc: 24.04 2020 Bringsjord, S., Caporale, C., & Noel, R. (2000). The Total Turing Test, J- LLI, 9(4), 397-418. DOI:www.jstor.org/stable/40180234 Reynolds. E (2016) Does the Fifth Amendment ‘expose a serious flaw’ in Turing Test? Retrieved:www.wired.co.uk/article/major-flaw-turing-test- Burleigh. T, Schoenherr. J, Lacroix. G (2013). Does the uncanny valley silence. Acc:25.0 2.20 exist? Computers in Human Behaviour. 29. 759-771. DOI:10.1016/j.chb.2012.11.021. Robbins. M (2014) A Machine Did not ‘Pass’ the Turing Test. Retrieved: https://www.vice.com/en_uk/article/gq8ddw/eugene-goostman-alan- Cole. E (2019) What is New in Robotics? Retrieved: turing-test-kevin-warwick. Acc: 25.02.20 blog.robotiq.com/whats-new-in-robotics-06.12.2019. A:25.02.20 Sample. I & Hern. A (2014) Scientists dispute if ‘Eugene Goostman’ Copeland. J (2014) Why Eugene Goostman Did Not Pass the Turing Test. passed Turing Test. Retrieved: www.theguardian.com/techno Retrieved: https://www.huffingtonpost.co.uk/jack- logy/2014/jun/09/scientistsdisagree-over-whether-Turing-test-has-been- copeland/turingtesteugenegoostman. Acc:25.02.20 passed. Acc. 19.04.20 Fawaz. A (2019) A tangible Turing Test. Retrieved: Schweizer. P (1998). The Truly Total Turing Test. Minds Mach. 8, 2 (May https://www.neowin.net/news/a-tangible-turing-testthe-loebner-prizeis- 1998), 263–272. DOI:10.1023/A:1008229619541 coming-to-swansea-this-weekend/. Acc: 21.02.20 Searle, J (1980). Minds, brains, and programs. Behavioural and Brain Gehl. R (2014). 
Teaching to the Turing Test with Cleverbot. Sciences 3 (3): 417-457, DOI: 10.1.1.83.5248. Transformations: The Journal of Inclusive Scholarship and Pedagogy, 24(1-2), 56-66. Retrieved February 25, 2020. Sharkey. N (2012) Alan Turing: The experiment that shaped AI. Retrieved: bbc.co.uk/news/technology-18475646. Acc:22.02.20 Ghose. T (2016) Robots Could Hack Turing Test by Keeping Silent. Retrieved: www.scientificamerican.com/article/robots-could-hack-turing- Stock-Homburg. R, Peters. J, Schneider. K, Prasad. V, Nukovic. L (2020) test-by-keeping-silent/. Acc: 25.02.20 Evaluation of the HTT for anthropomorphic Robots, Int Conf HRI. DOI: 10.1145/3371382.3378260 Harnad, S. (1992) The Turing Test Is Not A Trick: Turing Indistinguishability Is A Scientific Criterion. SIGART Bulletin 3(4) The World Economic Forum (2019) Can machine’s think? A new Turing (October 1992) pp. 9 - 10. Test may have the answer. Retrieved: www.weforum.org/agenda/2019/08/our-turing-test-for-androids-will- Harnad, S. (2000) Minds Machine’s and Turing, Journal of Logic, judge-how-lifelike-humanoid-robots-can-be/. Acc: 14.03.20 Language and Information, vol 9: p.425. https://doi.or g/10.1023/A:100 8315308862 Turing, A. (1950). Computing Machinery and Intelligence, Mind, (236), pp.433-460. doi.org/10.1093/mind/LIX.236.433. Hern. A (2014) What is the Turing Test? Retrieved: www.theguardian.com/technology/2014/jun/09/what-is-the-alan-turing- Wakefield. J (2019) The hobbyists competing to make AI human. test. Acc: 25.02.20 Retrieved:www.bbc.co.uk/news/technology-49578503 Acc: 25.02.20 Houser. K (2019) Advanced Robotics Forced Scientist To Invent A New Warwick, K., & Shah, H. (2015). Passing the Turing Test Does Not Mean Turing Test. Retrieved: https://futurism.com/the-byte/scientists-invented- the End of Humanity. Cognitive Computation, 8, 409-419. DOI: new-turing-test. Acc: 18.04.20 10.1007/s12559-015-9372-6 Ishiguro, H. (2005). 
Android science: Toward a new cross-interdisciplinary Whitby, B (1996) Reflections on AI: the legal, moral and ethical framework. J-Comp-Sci, Corpus ID: 6105971Jacquet, B., Baratgin, J., & dimensions. Intellect, Oxford, UK. ISBN 9781871516685 Jamet, F. (2019). Cooperation in Online Conversations. Journal of Psychology, 10, 727. https://doi.org/10.3389/fpsyg.2019.00727
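
As an illustrative aside to Section 4.4, the final-stage pass criterion (30%+ of test subjects either misidentifying the RHR as the human or being unable to tell the two agents apart) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the `Verdict` labels, the function names and the integer `threshold_percent` parameter are hypothetical names introduced here for illustration and do not appear in the paper.

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    """Final choice of one human interrogator (A).

    The label names are hypothetical; the paper only describes the three outcomes.
    """
    IDENTIFIED_HUMAN = "identified_human"  # correctly picked the human agent
    MISIDENTIFIED = "misidentified"        # picked the RHR as the human
    UNSURE = "unsure"                      # could not discern a difference


def deceived_count(verdicts):
    """Count interrogators who misidentified the RHR or were unsure."""
    counts = Counter(verdicts)
    return counts[Verdict.MISIDENTIFIED] + counts[Verdict.UNSURE]


def passes_final_stage(verdicts, threshold_percent=30):
    """Return True if at least `threshold_percent` of interrogators were deceived.

    The 30% default follows the "30%+" benchmark stated in Section 4.4.
    Integer arithmetic is used so the boundary case (exactly 30%) is exact.
    """
    if not verdicts:
        raise ValueError("at least one interrogator verdict is required")
    return deceived_count(verdicts) * 100 >= threshold_percent * len(verdicts)


if __name__ == "__main__":
    # Hypothetical session: 10 interrogators, 3 of whom were deceived or unsure.
    session = (
        [Verdict.MISIDENTIFIED, Verdict.UNSURE, Verdict.MISIDENTIFIED]
        + [Verdict.IDENTIFIED_HUMAN] * 7
    )
    print(passes_final_stage(session))
```

Treating "unsure" as counting towards the threshold follows the wording of the criterion in the text; comparing scaled integers rather than a floating-point ratio is a design choice made here to avoid rounding effects at exactly 30%.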