<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Constructing Human-Robot Interaction with Standard Cognitive Architecture</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kristiina Jokinen</string-name>
          <email>kristiina.jokinen@aist.go.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AI Research Center, AIST Tokyo Waterfront</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper discusses how to extend cognitive models with an explicit interaction model. The work is based on the Standard Model of Cognitive Architecture, which is extended with an explicit model for (spoken) interactions following the Constructive Dialogue Modelling (CDM) approach. The goal is to study how to integrate a cognitively appropriate framework into an architecture which allows smooth communication in human-robot interactions, and the starting point is to model the construction of shared understanding of the dialogue context and the partner's intentions. Implementation of conversational interaction is considered important in the context of social robotics, which aims to understand and respond to the user's needs and affective state. The paper describes the integration of the architectures but does not report experimental work towards this goal.</p>
      </abstract>
      <kwd-group>
        <kwd>Human-robot interaction</kwd>
        <kwd>cognitive architecture</kwd>
        <kwd>constructive dialogue models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The robot agent’s communication capability can be regarded as one of the
fundamental enablements in cognitive robotics. Given the need for collaboration and
coordination of actions with the other partners as well as the agent’s self-motivated exploration
of the environment, it is necessary to be able to communicate one’s intentions, beliefs,
and desires, and for this, language is the most natural means due to its rich expressive
capabilities. In HRI, action possibilities for a human are determined by the dialogue
design and the models for processing inputs and generating responses, as well as by
the natural language capability which allows an intuitive way to interact with the
robot agent. In fact, considering the general notion of affordance, the robot’s language
capability can be said to afford intuitive interaction which is considered more usable
than simple command-based protocols [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Affordance was originally introduced by [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to explain human visual capability to
recognize objects, and it was transferred to interface design by [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and finally, to
human-computer/robot interaction by [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], to describe the capability of a (computer)
interface to readily suggest the appropriate way of behaviour. Affordance has also
been used for robot architectures [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ][
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] to model action possibilities for a user who
wishes to interact with a robot.
      </p>
      <p>
        In the case of social robots, the use of natural language takes the interaction to a
qualitatively different level and supports the robot’s autonomous agent-like behaviour
with dialogue features such as turn-taking, feedback, and creation of common ground.
Designing a natural dialogue interface is thus a more complex and technologically demanding
task than simply adding a speech modality to the interface, and it presupposes a
different frameset for the human user. In general, users tend to assign
anthropomorphic features to inanimate objects like personal computers even though the objects
are basically considered tools with no natural interaction capability [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Social
robots, however, have a dual character as a tool and as an interacting agent [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], so the
human-robot interaction starts to resemble human-human communicative situations.
      </p>
      <p>
        As argued in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], speech creates expectations for the system’s ability to conduct
natural language communication, and humanoid robots reinforce such expectations
with their human-like appearance, including aspects like personality [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and even
stereotypical roles and gender [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. The need for natural language interaction and
affordable interfaces thus involves dialogue modelling concerning language analysis
and interpretation, and the robot agent should also be able to understand multimodal
sensory information that it receives from its environment. Conversely, it should be
able to produce behaviour that matches requirements of a relevant and coherent
response, combining spoken language and multimodality (gestures, gaze, body posture).
An important aspect of this work is to design a dialogue architecture which supports
natural interaction and allows experimentation with various multimodal modules so as
to explore human experience with humanoid robots and address larger societal needs
to find new ways to improve the robot agents’ acceptance and usability in society.
      </p>
      <p>In this paper, the discussion focuses on the architecture that supports these
requirements. Section 2 briefly describes the Constructive Dialogue Model and the
Standard Model of Cognitive Architecture. Section 3 shows the intended integration
of the models, and Section 4 provides a short discussion of the topics and future work.
</p>
    </sec>
    <sec id="sec-2">
      <title>Architectures for Cognitive Robots</title>
      <sec id="sec-2-1">
        <title>Constructive Dialogue Model</title>
        <p>
          The Constructive Dialogue Model (CDM) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is a complementary architecture to
cognitive architectures (ACT-R, Soar, Standard Model [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]) which do not explicitly
concern dialogue communication. CDM can be implemented on top of the cognitive
perception-action modules as a component responsible for the higher-level reasoning
on verbal and multimodal communication. It is chosen because of its focus on natural
language dialogues and because of its links to cognitive aspects of interaction
(communicative enablements). Also, it has been used in robot applications [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ][
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
        </p>
        <p>
          CDM is a conceptual and operational framework which regards conversational
interactions as cooperative activities through which the interlocutors build common
ground (cf. similar approaches in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ][
          <xref ref-type="bibr" rid="ref20">20</xref>
          ][
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]). In CDM, the participants are regarded
as rational agents, engaged in cooperative activity within which they aim to achieve
their communicative goals using dialogue acts which convey information about their
intentions and task topics. The agents exchange new information on the relevant
topics in order to construct mutual understanding and coordinate their actions (see [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
for dialogue management issues in general).
        </p>
        <p>
          Figure 1 shows how conversations progress in a cyclic manner as the participants
produce utterances and check various enablements for communication [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] to maintain
interaction and monitor its progress. For instance, the agents must be in contact and
aware of the partner’s attempt to communicate, by paying attention to (multimodal)
signals that indicate their willingness to interact [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The agents must also perceive the
emitted vocal and visual signals as communicative signals, i.e. recognize them as having
been produced with an intention to convey meaning. The agents must also intend to
engage themselves in the communication, i.e. make an effort to understand the
partner’s message and intentions, and to produce their own reaction. Reaction encodes
new information about the agent’s current viewpoint in verbal or physical actions. It
changes the current state of the world and requires the agents to restart their reasoning
with the new situation. The cycle continues until the conversation is finished by the
agents mutually agreeing to stop, or for another reason.
        </p>
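        <p>The cyclic checking of enablements can be illustrated with a minimal sketch. It is not the CDM implementation: the names Signal, DialogueState, run_cycle and converse, and the boolean fields that stand in for contact and perception, are hypothetical simplifications introduced here only for illustration. The sketch checks Contact, Perception and Understanding in order, and lets a Reaction update the shared context so that the next cycle starts from the new situation.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from enum import Enum


class Enablement(Enum):
    CONTACT = "contact"
    PERCEPTION = "perception"
    UNDERSTANDING = "understanding"
    REACTION = "reaction"


@dataclass
class Signal:
    """Hypothetical multimodal input; every field is an illustrative assumption."""
    partner_present: bool   # Contact: is the partner attending to the agent?
    communicative: bool     # Perception: was the signal produced with intent?
    content: str            # Understanding: the message to interpret
    closing: bool = False   # True when the partner proposes to stop


@dataclass
class DialogueState:
    context: list = field(default_factory=list)  # shared context built so far
    finished: bool = False


def run_cycle(signal, state):
    """Check the enablements in order; return the first one that fails, or None."""
    if not signal.partner_present:
        return Enablement.CONTACT
    if not signal.communicative:
        return Enablement.PERCEPTION
    if not signal.content.strip():
        return Enablement.UNDERSTANDING
    # Reaction: new information changes the state, so reasoning restarts from it.
    state.context.append(signal.content)
    state.finished = signal.closing
    return None


def converse(signals):
    """Repeat the cycle until the conversation is closed or input runs out."""
    state = DialogueState()
    failures = []
    for signal in signals:
        failed = run_cycle(signal, state)
        if failed is not None:
            failures.append(failed)
        if state.finished:
            break
    return state, failures
```
        </preformat>
        <p>In this sketch a failed enablement is only recorded; a dialogue manager would presumably use it to choose a repair strategy, such as attracting the partner's attention or asking for clarification.</p>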
        <p>
          Rationality refers to the agent’s ability to make decisions and deliberate on
situationally appropriate actions (in AI, such agents have been called BDI agents), and it
also considers the agent’s affective state which influences the agent’s reasoning.
Emotions [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] are not explicitly represented in the architecture, since they are assumed
to be manifestations of the agent’s internal state: the levels of arousal and valence of
the agent’s affects are inherent to the agent’s general activity rather than computed by
a particular emotion component. In fact, emotional activity can be regarded as one of
the connection points of CDM to the cognitive architectures under the assumption that
the processing of input signals results in an internal state which determines the
emotional quality of the agent’s response.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Cognitive Robotics Models</title>
        <p>
          Two cognitive models are shown in Figure 2: ACT-R [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and the Standard Model
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] (we do not discuss the third main framework, Soar, here). The focus is to model
human behaviour based on the perception of the environment and the (motor) actions
that the agent can take as the result of its reasoning. Consequently, studies have dealt
with the visual and auditory systems and their functionality in the context of
short-term (working) and long-term (declarative) memory. Important research has also
focused on linguistic resources and knowledge representation for reasoning and
conceptual categorization tasks. For instance, the integrated knowledge representation
system Dual-PECCS [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] uses two different sorts of common-sense reasoning,
prototypical and exemplars-based categorization, to allow knowledge acquisition and
development of Conceptual Space representations for a variety of tasks.
        </p>
        <p>
          Knowledge representation is an important aspect of cognitive processing and has
been a topic of much debate (centralized or distributed processing, symbolic or
connectionist representation, procedural or declarative knowledge, etc.). From the
dialogue point of view, there is a need for uniform representation of the meaning in order
to allow higher-level modules to operate on meaningful chunks of the incoming
information. While the overall view of the cognitive models involves two memory
components (Working Memory and procedural/declarative Long-term Memory), it is
likely that some kind of hybrid knowledge representation is needed to toggle between
declarative and procedural knowledge in the system’s working memory. Moreover,
since the agent needs to ground its knowledge in the physical world [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], an interim
representation seems necessary to connect the concepts stored in the agent’s memory
to its dynamic perception of the world. In fact, in recent years, the ProxyType Theory
[
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] has been proposed to cater for heterogeneity in concept representations and to
address issues concerning the interaction between Long-term and Working Memory.
According to the theory, the process of proxification manages conceptual structures
into temporary constructs in Working Memory using heterogeneous representations:
activation of a concept category in Long-term Memory (which contains networks of
representations) results in the concept’s activation as token representation in Working
Memory (as a “proxy” for the concept). Concept categories are complex networks of
(neural) activations among the network elements, and they are constructed over time
via perceptual interactions of the agent with the environment which results in repeated
activations of relevant elements in the connected networks. (The network elements
are causally connected since activation of an element will cause the activation of the
connected elements in Long-term Memory, and the tokening of the concept category
in Working Memory.)
        </p>
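        <p>The idea of proxification can be sketched as follows. This is a strongly simplified illustration and not an implementation of the ProxyType Theory: the classes ConceptNetwork and WorkingMemory, the undirected links, and the unweighted spreading of activation are all assumptions made for the example. Activating one element in Long-term Memory activates the elements connected to it, and the resulting set is stored in Working Memory as a temporary token for the concept.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field


@dataclass
class ConceptNetwork:
    """Hypothetical Long-term Memory: elements linked to connected elements."""
    links: dict = field(default_factory=dict)  # element: set of neighbours

    def connect(self, a, b):
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def activate(self, start):
        """Spread activation from one element to everything connected to it."""
        active, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            for neighbour in self.links.get(current, ()):
                if neighbour not in active:
                    active.add(neighbour)
                    frontier.append(neighbour)
        return active


@dataclass
class WorkingMemory:
    tokens: dict = field(default_factory=dict)  # concept name: proxy token

    def proxify(self, name, network, cue):
        """Store the activated elements as a temporary proxy for the concept."""
        self.tokens[name] = frozenset(network.activate(cue))
        return self.tokens[name]
```
        </preformat>
        <p>For example, if "cup" is connected to "handle" and "drink", a perceptual cue "handle" would yield a proxy containing all three elements, while unconnected elements stay inactive.</p>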
        <p>
          The ultimate goal for a robot agent is to learn via experience and be able to adapt
to the dynamically changing world. The agent’s continuous learning of new concepts
and skills is an important part of interaction management and allows the agent to
coordinate action in order to adjust to its environment. However, the effectiveness of
interactive learning depends on the quality of the interactions. So far, the most frequently
used settings have included designer-controlled ways of interaction which are based
on linear learning and scripted interaction sequences. Free natural dialogue
interactions provide new challenges by leveraging deep interactive learning and building of
competences through interactions: besides the technically demanding aspects related
to recognition and processing of various multimodal signals, such interactions
presuppose understanding of the partner’s intentions and partly developed skills, as well
as social aspects of interaction. They require specific models for interaction through
which social behaviours are learned and saved as persistently growing experience of
the world. The goal also incorporates the issues studied in Theory of Mind [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] to
construct mutual understanding and a shared context, which have
also been some of the main issues in cooperative dialogue approaches.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Integrated Architecture</title>
      <p>While dialogue modelling also subscribes to the goals related to knowledge
representation and learning, the main focus is on the models of interaction management. Given
the cascaded model of communicative enablements as presented above with Contact,
Perception, Understanding, and Reaction, it is easy to see how to extend the Standard
Cognitive Architecture by the CDM dialogue component which deals with the
processing and management of interaction. The integrated architecture in Figure 3 shows
the basic requirements for spoken dialogue models and integration points with CDM.</p>
      <p>
        The integration points for the Perception and Motor control components of the
Standard Architecture are the CDM modules related to the enablements of Perception and
Reaction, whereas Long-Term Memory (both declarative and procedural) and
Working Memory correlate with the modules in CDM Understanding. The detailed CDM
Understanding modules encode the system’s procedural knowledge for the analysis of
the user’s behaviour and deciding what to do next, as well as modules for three types
of knowledge: the CDM Knowledge Base stores the robot’s (long-term) knowledge of
the domain, of the user, and of the world in general, while the CDM Memory stores
the system’s knowledge of the past dialogue events it has been engaged in, and the
CDM Context models the immediate dialogue context and the (short-term) state of
attention of the agent. It is worth noticing that the architecture makes an explicit
distinction between semantic and episodic memory: semantic memory is scattered
among the system components (e.g. language grammar belongs to the NLP Module
and planning rules to the Planning Modules), whereas the CDM Memory is episodic
and refers to a designated part of the agent’s knowledge where the previous sessions
with a particular user are saved, and from where chunks of knowledge are retrieved
for dialogue processing if the partner is identified as a returning user.
All knowledge sources are connected to the other processing modules via the Ontology,
which provides semantic links between the Knowledge Base entities and linguistic
concepts. As in Dual-PECCS [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], it is also possible to use other linguistic resources
to provide an interface between the linguistic and the conceptual knowledge.
      </p>
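      <p>The division of labour between the three knowledge sources can be sketched as simple data structures. The sketch is only an illustration under stated assumptions: the class names KnowledgeBase, EpisodicMemory and DialogueContext, the string-valued user identifier, and the rule that a returning user's last topic seeds the new context are hypothetical and are not taken from the architecture itself.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Long-term knowledge of the domain, the user, and the world."""
    facts: dict = field(default_factory=dict)


@dataclass
class EpisodicMemory:
    """Past dialogue sessions, stored per user."""
    sessions: dict = field(default_factory=dict)  # user id: list of sessions

    def save(self, user_id, session):
        self.sessions.setdefault(user_id, []).append(list(session))

    def recall(self, user_id):
        """Return earlier sessions, or an empty list for a new user."""
        return self.sessions.get(user_id, [])


@dataclass
class DialogueContext:
    """Immediate dialogue context and short-term focus of attention."""
    turns: list = field(default_factory=list)
    focus: str = ""


def start_session(user_id, memory):
    """Open a new context, seeding the focus from the last session if one exists."""
    context = DialogueContext()
    earlier = memory.recall(user_id)
    if earlier and earlier[-1]:
        context.focus = earlier[-1][-1]  # assumed rule: resume the last topic
    return context
```
      </preformat>
      <p>Keeping the episodic store separate from the context means that a new user simply starts with an empty focus, whereas a returning user can be greeted with reference to the previous session.</p>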
      <p>
        The double nature of the robot as an agent and as a computer system sets
requirements for the dialogue model. As an agent, the robot is perceived as a communicating
partner, and as a computer system, it has access to vast digital information which it
can also share with other agents through its connection to the Internet (IoT [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]). The
integrated architecture described above does not specifically deal with interactions in
the ubiquitous environment, but it is possible to include sensor information as input
through specific perception devices, and then visualise the data and process it as
normal (cf. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]). However, the ubiquitous environment can also drastically change the
knowledge available to the agent, e.g. a digital database is modified according to new
data, and in these cases, the robot agent needs to possess procedural knowledge of
how to cope with unexpected, unspecified, or underspecified situations. This paper
does not discuss these issues but emphasises that probabilistic modelling of
knowledge together with the agent’s capability to learn are crucial in their realisation.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Future Work</title>
      <p>
        In the field of Human-Robot Interaction (HRI), one of the important and much
discussed topics is the notion of Uncanny Valley [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], whereby the robot’s human-like
appearance is correlated with the acceptance of the robot as an interactive partner. At
one end, we have robot agents which look and behave like human agents, while at the
other end, the interaction partner is clearly a non-human agent which may exhibit
different levels of human-likeness. The hypothesis states that the acceptance of agent
applications increases when going from less human-like agents towards close to
human-level behaviour, but there is a sudden drop in the acceptance when the robot
agent reaches almost the same level of behaviour as the human. The Uncanny Valley
phenomenon has since been shown to appear as a result of a mismatch in cognitive
categorization [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] of what is considered similar to but not exactly the same as the
prototypical conversing agent (namely the human). A contradiction between typical
members of a class and entities which deviate from them usually causes
discomfort, fear and resistance, which explains why talking robots create similar reactions.
In HRI, such cognitive mismatches are commonly triggered by the robot’s appearance,
but also by its capability to interact with humans.
      </p>
      <p>The proposed architecture is considered a valuable first step towards achieving natural
dialogue interactions in HRI and making the robot behave in a more natural manner. There
are several aspects that can be further specified to experimentally validate the
architecture and make the contributions visible, especially with respect to the integration of
cognitive architectures into the CDM dialogue model. On the theoretical side,
appropriate knowledge representation and integration of multimodal input (gestures,
eye-gaze) will be elaborated, and the ProxyType theory of concept representation will be
investigated further. On the practical robotics side, the questions of how to develop
socially competent robots and use novel AI technology to alleviate problems in
modern society will be explored.</p>
      <p>Future work will proceed using a top-down and bottom-up (TDBU) methodology:
this aims to combine the theoretical model of interaction (top-down) with the
automatic recognition techniques and data analysis (bottom-up). The TDBU methodology
uses novel technology to provide an objective basis for detecting and segmenting
elements in the interaction flow, while the theoretical views of human observations
and annotations are used for the interpretation and parameter setting. Speech
recognizers, parsers, eye-trackers, movement detectors, etc. are used to segment signals and
provide bottom-up knowledge to trace gaze, face, and body, while the theoretical
view of dialogue modelling and communicatively important signals are used to
explore meaningful correlations and regularities in the (big) data. Deep learning
techniques and statistical correlations are used to develop such models.</p>
      <p>To explore how social robots can assist humans in various every-day tasks, the
work will continue in an interdisciplinary manner using experimental methods from
cognitive and social sciences to study user experience and engagement in social
human-robot situations. Experimental design can consist of different types of robot
agents (“personalities”) and of strategic profiling with respect to such issues as active
narration vs. passive guidance, use of gestures, amount of feedback, etc. to compare
the user’s experience and engagement with the robot agent in the selected scenarios.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>The author wishes to thank colleagues for discussions on ontologies and dialogue
modelling, and Antonio Lieto for discussions related to Dual-PECCS. The study is
based on the results obtained from a project commissioned by the New Energy and
Industrial Technology Development Organization (NEDO).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Allwood</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1976</year>
          ).
          <article-title>Linguistic Communication as Action and Coordination</article-title>
          . Gothenburg Monographs in Linguistics,
          <volume>2</volume>
          , Göteborg.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bothell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Byrne</surname>
            ,
            <given-names>M. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douglass</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lebiere</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>An integrated theory of the mind</article-title>
          .
          <source>Psychological review</source>
          ,
          <volume>111</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1036</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Barrett</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haviland-Jones</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          (
          <year>2008</year>
          , Eds.) Handbook of Emotions. Guilford Press, New York.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>H. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaefer</surname>
            ,
            <given-names>E. F.</given-names>
          </string-name>
          (
          <year>1987</year>
          ).
          <article-title>Collaborating on contributions to conversation</article-title>
          .
          <source>Language and Cognitive Processes</source>
          ,
          <volume>2</volume>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Feldman</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rimé</surname>
            <given-names>B.</given-names>
          </string-name>
          (
          <year>1991</year>
          ).
          <source>Fundamentals of Nonverbal Behavior</source>
          . Cambridge Univ. Press
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          (
          <year>1979</year>
          ).
          <article-title>The Ecological Approach to Visual Perception</article-title>
          . Houghton Mifflin.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Harnad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1990</year>
          ).
          <article-title>The symbol grounding problem</article-title>
          .
          <source>Physica D</source>
          <volume>42</volume>
          :
          <fpage>335</fpage>
          -
          <lpage>346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jokinen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Constructive Dialogue Modelling - Speech Interaction with Rational Agents</article-title>
          . John Wiley, Chichester, UK.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jokinen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Dialogue Models for Socially Intelligent Robots</article-title>
          .
          <source>The 10th International Conference on Social Robots</source>
          , Qingdao, China.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Jokinen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McTear</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Spoken Dialogue Systems</article-title>
          . Morgan and Claypool.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Jokinen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nishimura</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watanabe</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nishimura</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Human-Robot Dialogues for Explaining Activities</article-title>
          .
          <source>Proceedings of IWSDS-2018</source>
          , Singapore.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jokinen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilcock</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Multimodal Open-Domain Conversations with the Nao Robot</article-title>
          . In:
          <source>Natural Interaction with Robots, Knowbots and Smartphones: Putting Spoken Dialogue Systems into Practice</source>
          , pages
          <fpage>213</fpage>
          -
          <lpage>224</lpage>
          . Springer, New York.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Laird</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lebiere</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenbloom</surname>
            ,
            <given-names>P.S.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>A Standard Model of the Mind: Toward a Common Computational Framework Across Artificial Intelligence, Cognitive Science, Neuroscience, and Robotics</article-title>
          .
          <source>AI Magazine</source>
          <volume>38</volume>
          (
          <issue>4</issue>
          ):
          <fpage>13</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lieto</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radicioni</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rho</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Dual PECCS: a cognitive system for conceptual representation and categorization</article-title>
          .
          <source>Journal of Experimental &amp; Theoretical Artificial Intelligence</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Marin-Urias</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sisbot</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pandey</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tadakuma</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alami</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Towards shared attention through geometric reasoning for human robot interaction</article-title>
          .
          In:
          <source>Humanoids 2009. The 9th IEEE-RAS International Conference on Humanoid Robots</source>
          , pp.
          <fpage>331</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>A Bayesian explanation of the 'Uncanny Valley' effect and related psychological phenomena</article-title>
          . http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3499759/
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Moratz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tenbrink</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Affordance-Based Human-Robot Interaction</article-title>
          .
          <source>Towards Affordance-Based Robot Control. Lecture Notes in Computer Science</source>
          <volume>4760</volume>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Mori</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>1970</year>
          ).
          <article-title>The Uncanny Valley</article-title>
          .
          <source>Energy</source>
          ,
          <volume>7</volume>
          (
          <issue>4</issue>
          ),
          <fpage>33</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Norman</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          (
          <year>1988</year>
          ).
          <source>The Psychology of Everyday Things</source>
          . Basic Books: New York.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Nooraei</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rich</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>A Real-Time Architecture for Embodied Conversational Agents: Beyond Turn-Taking</article-title>
          .
          <source>The 7th Int. Conf. on Advances in Computer-Human Interactions</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Okada</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aran</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gatica-Perez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Modeling Dyadic and Group Impressions with Intermodal and Interpersonal Features</article-title>
          .
          <source>ACM Transactions on Multimedia Computing, Communications, and Applications</source>
          <volume>15</volume>
          (
          <issue>1s</issue>
          ), Article 13 (January 2019).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Prinz</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <source>Furnishing the Mind: Concepts and Their Perceptual Basis</source>
          . MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Reeves</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nass</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>1996</year>
          ).
          <source>The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places</source>
          . New York: Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>I.G.</given-names>
          </string-name>
          (Ed.) (
          <year>2012</year>
          ).
          <source>The Internet of Things 2012: New Horizons</source>
          . IERC-Internet of Things European Research Cluster, Halifax, U.K.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Tay</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jung</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>When stereotypes meet robots: The double-edge sword of robot gender and personality in human-robot interaction</article-title>
          .
          <source>Computers in Human Behavior</source>
          , Vol
          <volume>38</volume>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Traum</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>Discourse obligations in dialogue processing</article-title>
          .
          <source>Proceedings of the 32nd Annual Meeting of ACL</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . Morristown, NJ, USA.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Wilcock</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jokinen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Multilingual WikiTalk: Wikipedia-based talking robots that switch languages</article-title>
          .
          <source>Proceedings of the SIGDIAL 2015 Conference</source>
          , pp.
          <fpage>162</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Wimmer</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1983</year>
          ).
          <article-title>Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception</article-title>
          .
          <source>Cognition</source>
          <volume>13</volume>
          :
          <fpage>103</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Yoshida</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nishimura</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jokinen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Biomechanics for understanding movements in daily activities</article-title>
          .
          <source>Proceedings of the LREC Workshop “Language and Body in Real Life”</source>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>