<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards A Dual Process Approach to Computational Explanation in Human-Robot Social Interaction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Agnese Augello</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ignazio Infantino</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Lieto</string-name>
          <email>lieto@di.unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Umberto Maniscalco</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Pilato</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filippo Vella</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica, University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of High Performance Computing and Networking, National Research Council, ICAR-CNR</institution>
          ,
          <addr-line>Palermo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The capacity of AI systems to explain their decisions represents nowadays a huge challenge for both academia and industry (consider, e.g., the autonomous car sector). In this paper we sketch a preliminary proposal suggesting the adoption of a dual process approach to computational explanation. Our proposal is instantiated in the field of Human-Robot Social Interaction, namely in a gesture recognition task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>The capability of AI systems to provide an explanation of the reasons guiding their decisions represents a crucial
challenge and research objective in the current fields of
Artificial Intelligence (AI) and Computational Cognitive Science
[Langley et al., 2017]. Current AI systems, in fact, despite the
enormous progress achieved in specific fields, mostly fail
to provide a transparent account of the reasons determining
their behavior (in cases of both successful and unsuccessful
output). This is due to the fact that the adoption of current
Machine Learning and Deep Learning techniques faces the
classical problem of opacity in neural networks; furthermore,
this problem explodes with the current techniques¹. In our
opinion, a possible way to deal with this problem is based on
a dual process approach.</p>
      <p>
The dual process theory of mind [Evans and Frankish,
2009; Stanovich and West, 2000; Kahneman, 2011] is an
experimentally grounded theory proposed in the field of the
psychology of reasoning. It suggests that our cognition is
governed by two types of interacting cognitive systems,
called respectively system(s) 1 and system(s) 2. Systems
of type 1, also referred to as S1, operate with rapid,
automatic, associative processes of reasoning. They are
phylogenetically older and execute processes in a parallel and fast
way. Type 2 systems, also referred to as S2, are, on the other
hand, phylogenetically more recent and are based on
conscious, controlled, sequential processes (also called type
2 processes) and on logic-based rule following. (¹Although some
proposals exist in the literature reporting sparse cases where a
partial interpretation of the operations of the units is possible
[Zhou et al., 2015], and alternative proposals are available to
reduce the opacity of neural networks [Lieto et al., 2017a], the
general problem still remains unsolved.) As a
consequence, if compared to system 1, system 2 processes are
slower and cognitively more demanding. In the dual process
perspective, then, decision making consists in a two-step
procedure based on the interaction between heuristic,
perceptionguided (and biased) thinking (type 1 processes), with forms
of deliberative thinking based on the canons of normative
rationality (and on type 2 processes). In recent years, the
cognitive modeling and the AI community have posed a
growing attention on the dual process theories as a framework for
modeling artificial cognition. Efforts have been made, for
example, in the areas of knowledge representation and
reasoning [Frixione and Lieto, 2014], cognitive systems
dealing with arithmetical calculations [Strannega˚rd et al., 2013],
cognitive models of emotions [Larue et al., 2012], question
answering for common-sense linguistic descriptions
        <xref ref-type="bibr" rid="ref10 ref12 ref14 ref15 ref16 ref24">([Lieto
et al., 2015], [Lieto et al., 2017b])</xref>
        , computational creativity
[Augello et al., 2016b] as well as in the design of general
purpose cognitive architectures, such as CLARION, whose
principles are explicitly inspired by such a theoretical framework
[Sun, 2006].
      </p>
<p>In this position paper, we propose to adopt a dual
process approach to deal with the problem of computational
explanation (or, at least, with one facet of this problem). In
doing so, we propose to endow a social robot with a
computational explanation module based on two different components:
an S1 and an S2 one. The S1 module is responsible for the
fast categorization and the perception-based recognition
of gestures in a social context (and is based on a deep neural
network architecture), while the S2 component is responsible
for providing a high-level model that can be exploited to
extract an explanation about the ’reasons’, e.g. the high-level
features, that characterize the categorized output provided by
S1². (²Currently the S2 component is activated on demand; on the
difficult question of ’when’ the S2 process is activated, we refer
to [Lieto et al., 2018].) Such a descriptive model, based on explicit
representations, can be used to formulate plausible forms of
explanation about the features/properties that are usually considered
to lead to the particular classification of gestures provided by
the opaque S1 component. In particular, S2 exploits an
ontology that describes the features characterizing the
following actions: “Bowing”, “Clapping”, “Handshaking”,
“Punching”, “Slapping” and “Frontkicking”. The first three of them
are grouped into a “meta-class” of normal actions, while the
remaining three are grouped as aggressive ones. Once the S1
module gives its output, the S2 module acquires the S1
outcome and tries to explain, if required by the user, what
characterizes the perceived action according to the knowledge coded
into the ontology.</p>
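<p>The S1/S2 division of labor just described can be sketched, purely for illustration, as follows; the function names, the toy feature table and the stand-in classifier are our own assumptions and not part of the implemented system.</p>

```python
# Illustrative sketch of the dual-process pipeline: a fast S1 classifier
# paired with an on-demand S2 explainer. All names and the toy feature
# table are hypothetical; the real S1 is a trained deep network.

# S2 knowledge: high-level features per action class (simplified ontology).
ONTOLOGY = {
    "Punching":     {"meta": "Aggressive", "velocity": "High",   "hands": "Closed"},
    "Slapping":     {"meta": "Aggressive", "velocity": "High",   "hands": "Open"},
    "Frontkicking": {"meta": "Aggressive", "velocity": "High",   "hands": None},
    "Bowing":       {"meta": "Normal",     "velocity": "Low",    "hands": None},
    "Clapping":     {"meta": "Normal",     "velocity": "Medium", "hands": "Open"},
    "Handshaking":  {"meta": "Normal",     "velocity": "Low",    "hands": "Open"},
}

def s1_classify(gesture_sequence):
    """Stand-in for the opaque S1 deep network: returns an action label."""
    # A real system would run the trained LSTM on the joint sequence here.
    return "Punching"

def s2_explain(label):
    """S2: build a human-readable explanation from the ontology."""
    feats = ONTOLOGY[label]
    tail = f" with {feats['hands']} hands." if feats["hands"] else "."
    return (f"'{label}' was categorized as {feats['meta']} because it is "
            f"executed at {feats['velocity']} velocity" + tail)

def interact(gesture_sequence, user_asks_why=False):
    label = s1_classify(gesture_sequence)                        # fast, always-on S1
    explanation = s2_explain(label) if user_asks_why else None   # S2 only on demand
    return label, explanation
```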
<p>In the near future, the authors will implement the proposed
approach on a Pepper robot, also improving the interaction
between S1 and S2. In the next sections we provide some
additional details of the proposed framework.</p>
    </sec>
    <sec id="sec-2">
      <title>The Framework</title>
      <p>Figure 1 shows a simplified schema of the proposed
framework. An Artificial Intelligence system (AI Module) that
drives robot social activities is responsible for the perception,
processing, and action of the social robot (S1). The
perception capabilities of the robot have to detect relevant features
of human social behavior.</p>
      <p>
        RGB cameras, microphones, lasers, sonars, depth cameras,
and other sensors allow the robot to capture different aspects
of the human action and the context. Data arising from each
sensor require complex computation and a different level of
abstraction to produce an appropriate robot reaction. The
observed robot behavior, in part designed by programmers,
could be difficult for the final user to understand because it
depends on how the robot interprets its perceptions.
As reported in Figure 1, many socially relevant entities should
be processed to determine a realistic robot social behavior:
speech, facial expressions, sounds, but also environmental
features that influence the social context
        <xref ref-type="bibr" rid="ref7 ref8">(see for example
[Infantino et al., 2008], [Infantino et al., 2007])</xref>
        .
      </p>
      <p>The robot typically interacts with the human by a verbal
output and postures (Animated Say). The social interaction
is subject to both internal and external evaluation. The
internal assessment considers the robot's aim, which consists of
detecting a desired human state. The external evaluation is
directly given by the human user involved in the interaction
or by an observer. Furthermore, if required, the robot should
enable the computational explanation process that justifies its
verbal outputs and actions.</p>
<p>In this proposal we focus on a single perceptual
input, given by an RGBD camera. This device is used to capture
human social signs occurring during the interaction between
the robot and a human being.</p>
      <p>The infrared laser of the RGBD camera and its receiver
detect a set of 3D points, from which the human skeleton is
extracted. The analysis of the temporal evolution of the skeleton
joints allows the system to detect relevant actions (i.e. a
pair constituted by an initial posture and a final one). A
deep neural network (DNN) approach then classifies the
social sign. Such a machine learning methodology requires
a training using a given dataset of postures. Naturally, the
learning phase determines the interpretation of the detected
social signs, and the subsequent computational processes to
decide robot reactions.</p>
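<p>A minimal sketch of this posture-pair extraction (the motion threshold and the flat joint-vector representation are illustrative assumptions, not the system's actual procedure) could look like:</p>

```python
# Minimal sketch (hypothetical thresholds and layout): an action is taken
# to be the pair (initial posture, final posture) delimiting the frames
# where inter-frame joint motion exceeds a threshold.

def frame_motion(a, b):
    """Sum of absolute per-joint displacements between two skeleton frames."""
    return sum(abs(x - y) for x, y in zip(a, b))

def extract_action(frames, threshold=0.5):
    """Return (initial_posture, final_posture) around the moving segment,
    or None if no motion above the threshold is observed."""
    moving = [i for i in range(1, len(frames))
              if frame_motion(frames[i - 1], frames[i]) > threshold]
    if not moving:
        return None
    start, end = moving[0] - 1, moving[-1]
    return frames[start], frames[end]
```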
      <p>
The reasoning, execution and evaluation module, together
with the perceptive capabilities, is organized following the dual
process theory paradigm
        <xref ref-type="bibr" rid="ref1 ref2">(see for example [Augello et al.,
2016b])</xref>
that, in the present setting, has been extended and
used to enable a computational explanation subsystem.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Social interaction scenario</title>
      <p>During a social interaction, the involved agents usually follow
“social practices” [Reckwitz, 2002], i.e. routinized behaviors
depending on the social context, the mutual expectations, and
the pursued aims. A “social intelligent” robot must be able
to properly manage these practices, understanding the social
situation and consequently planning and adapting its behavior
[Dignum et al., 2014; Augello et al., 2016a]. A key role in
the understanding of the situation is played by the
interpretation of all the possible social signs expressed, in both verbal
and non-verbal ways, by the agents involved in the
interaction. The interpretation of these signals, and in particular the
non-verbal ones, allows an agent to recognize and understand
the intentions, emotions and attitudes of people.</p>
      <p>Let us focus on a practice of reception in a public office,
considering the task of welcoming visitors in the waiting room
and directing them to proper office rooms.</p>
<p>In this scenario, the robot must be able to discriminate the
inappropriate behaviors of the visitors. For example, someone
can become unsettled if they must wait too long in the
waiting room or if there is some problem with the appointment.
The robot learns how to detect inappropriate, and in
particular aggressive, behaviors by examining the postures and the
gestures of people during a training phase. Then, during the
interaction, considering its expectations and its experience, it
must be able to quickly recognize the exhibited social signs.
Finally, if required, it must be able to provide an explanatory
account of some sort of this process of interpretation.
Classification and explanation are accomplished
through S1 and S2 processes and components, respectively, as briefly
described in the following sections.</p>
      <sec id="sec-3-1">
<title>Social Signs Interpretation Process</title>
<p>The robot is capable of classifying the social signs, represented
through 3D human postures, by using a Deep Neural Network
approach. An RGBD device captures the human skeleton as
a spatial localisation of the relevant joints.</p>
<p>In recent years, deep learning techniques have given
a strong impulse to machine learning algorithms. New
achievements have been shown in the generalization of
input patterns and in the discrimination of different
classes: these networks can learn to discriminate large
labeled datasets.</p>
        <p>The great advantage of the deep networks is that the first
layers of the network, if suitably trained, can automatically
extract features allowing a robust representation of the input
pattern. In this way, instead of using hand crafted features,
the raw data can be processed creating a good classifier with
features learned from data [LeCun et al., 2015]. Beyond the
possibility of extracting spatial features for the recognition
of patterns in bidimensional inputs, deep networks can also
be used for the processing and classification of sequences of
data. In these cases, the network, through recurrent
connections, can maintain a memory of the past history and compute
the input accordingly. An example of this kind of network
is the Long Short-Term Memory (LSTM) network, which is used in
the proposed system.</p>
        <p>LSTMs have been designed by Hochreiter and
Schmidhuber [Hochreiter and Schmidhuber, 1997] with the aim of
avoiding the long-term dependency problem, at the price of
a more complex cell structure. The key feature of LSTMs is
the “cell state” that is propagated from one cell to another. State
modifications are regulated by three structures called gates,
each composed of a sigmoid neural net layer and a pointwise
multiplication operation. Let C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t be the
cell state at time t. The first gate, called the “forget gate layer”,
considers both the input x_t and the output from the
previous step h_{t−1}, and returns values between 0 and 1,
f_t = σ(W_f · [h_{t−1}; x_t] + b_f), describing how much of each
component of the old cell state C_{t−1} should be kept: if the
output is 0, the component is completely discarded; if the output
is 1, it is left unaltered. New information to be stored in the state
is processed afterwards. A second sigmoid layer, called
the “input gate layer”, decides which values will be updated,
i_t = σ(W_i · [h_{t−1}; x_t] + b_i).
Next, a tanh layer creates a vector of new candidate values,
C̃_t = tanh(W_C · [h_{t−1}; x_t] + b_C), that could be added to the
state. To perform a state update, C_{t−1} is first multiplied
pointwise by the output of the forget gate f_t,
and the result is added to the pointwise multiplication of the
input gate output i_t and C̃_t.
Finally, the output h_t = o_t ⊙ tanh(C_t) can be generated, where
o_t = σ(W_o · [h_{t−1}; x_t] + b_o): first, a sigmoid is applied,
taking into account both h_{t−1} and x_t; its output is then
multiplied by a squashed version of C_t, so that only the
selected parts are output. A detailed theoretical explanation of the
advantages of using a network made of multiple layers is given
in [Pascanu et al., 2013]. In
our scenario we have chosen to gradually stack LSTM
layers and measure the trend of the F1-score to determine what
the correct number of layers can be. Each LSTM layer is
separated from the next one by a ReLU function. In
addition, given a sequence length, we attempted to determine how
many neurons are needed for the representation to be of good
quality. To speed up the information acquisition task, the network
has been trained on a dataset formed by a set of
actions divided into two macro classes, dealing with aggressive
or non-aggressive behavior [Theodoridis and Hu, 2007]. The
dataset has been created by monitoring, using proper
sensors placed on several body parts, the free movements of ten
people in front of a camera or a standing bag. The
actions of the dataset, with twenty different labels, are divided
between actions for normal behavior and actions for “not
friendly” behavior.</p>
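<p>The gate equations above can be made concrete with a minimal NumPy implementation of a single LSTM step; the sizes and random weights below are placeholders for illustration, whereas the actual system relies on a trained network.</p>

```python
# A single LSTM cell step implementing the gate equations above with
# NumPy, for illustration only (weights are random; a real system would
# use a trained library implementation).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step. W maps a gate name to its weight matrix over [h; x]."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}; x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate values
    C_t = f_t * C_prev + i_t * C_tilde       # cell-state update
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(C_t)                 # hidden output
    return h_t, C_t

# Example with hidden size 4 and input size 3 (so [h; x] has size 7):
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((4, 7)) for k in "fiCo"}
b = {k: np.zeros(4) for k in "fiCo"}
h, C = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, b)
```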
      </sec>
      <sec id="sec-3-2">
        <title>The social sign dataset</title>
        <p>For the experiments we have used a subset of the Vicon
Physical Action dataset first used in [Theodoridis and Hu, 2007]
and made available through [Lichman, 2013].
Ten subjects (7 men and 3 women) were recorded while
performing 20 actions, comprising 10 normal and 10
unfriendly activities. For our setup, a subset of the actions
more relevant to the considered context has been selected,
composed of three normal actions (Bowing, Clapping and
Handshaking) and three aggressive actions (Punching,
Slapping and Frontkicking). The training has been performed on
nine subjects, while testing is done on the tenth, last subject.
We have tested the following variations: the number of
neurons in the LSTM layers has been set to 64,
128 or 256; one, two or three stacked LSTM levels have been
considered; and the sliding window length has been set to values
from 2 to 20. The training has been performed for 10 epochs;
after 10 epochs the accuracy had already approached 1, so
we chose to stop.</p>
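<p>As an illustration of the sliding-window setup just described, the following minimal sketch (with invented variable names) splits a recorded sequence of per-frame feature vectors into fixed-length training windows:</p>

```python
# Sketch of the sliding-window preparation used to feed posture sequences
# to the stacked LSTMs; window lengths from 2 to 20 were explored in the
# experiments described above.

def sliding_windows(sequence, window, step=1):
    """Split a list of per-frame feature vectors into fixed-length windows."""
    return [sequence[i:i + window]
            for i in range(0, len(sequence) - window + 1, step)]

# A 10-frame recording with window length 4 yields 7 training samples:
frames = [[float(t)] for t in range(10)]
samples = sliding_windows(frames, window=4)
```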
      </sec>
      <sec id="sec-3-3">
        <title>A Partial Explanatory Account via Ontology</title>
<p>In the current, preliminary, version of our system, the S2
component is based on a general ontological model
representing the main perceptual differences between different
classes of gestures (e.g. aggressive vs. non-aggressive ones).
The representational differences between the two
ontological states are exploited to provide a high-level description
of the reasons leading to categorize a particular perceived
gesture as aggressive or not. The ontological domain
features considered to distinguish between these two classes
are, for example, the velocity of the gesture execution
(aggressive behavior usually proceeds in a fast way), the distance
of the final gesture position from the body, etc.³ (³In other
words: we try to provide an explanatory account of the output of
the opaque S1 component by using an a priori ontological model
of a given situation.) The current version of the ontology is
available, also in a navigable format, at http://www.di.unito.it/˜lieto/
ExpActOnto.html. In addition, the S2 component also
takes into account the analysis of the training of the S1
neural component (given a specific dataset). In particular, since
the Neural Network used in S1 is capable of categorizing a
gesture into a set of 20 kinds of actions, while the current
ontology can describe only 6 of them, we compute
the similarity/dissimilarity of the outcome given by S1 with
respect to the six actions modeled in S2. This allows us to
attribute the unknown perceived gesture to a known one,
following a typical Case-Based Reasoning approach.
The S2 component also allows modeling the differences
between gestures of the same meta-level classes (i.e. the
aggressive and the normal ones). These sub-models represent
in more detail the differences between similar gestures and can
be used to describe why a particular sign, e.g. firstly
categorized as ’aggressive’, has been additionally recognized,
for example, as ’Punching’. Figure 3 shows an example of
this kind of explanation. Namely, it shows that a ’Punching’
action is characterized by being executed at a certain
velocity (X), categorized as ’High Velocity’, and at a certain
distance (Y) from the body, categorized as ’Close Distance’
according to the ontology. In addition to these traits, common
to all ’Aggressive Actions’, the ’Punching’ action is also
characterized by being executed with ”Close Hands”. Figure 4,
finally, provides an additional model-based explanation of
why the previous ’Punching’ cannot be classified, for example,
as ’Slapping’ (both ’Slapping’ and ’Punching’ are ’Aggressive
Actions’). Also in this case, the fact that the detected body
part executing the gesture is a ’Close Hand’ and not an ’Open
Hand’ (as in the case of ’Slapping’) represents a crucial element
for explaining the categorization decision.</p>
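<p>The similarity-based attribution step can be illustrated with a small sketch; the feature vectors and the cosine measure below are our own assumptions, since the actual similarity computation used between the S1 outcome and the six modeled actions is not specified here.</p>

```python
# Hypothetical sketch of the Case-Based-Reasoning attribution step: an
# S1 label outside the six modeled actions is mapped to the most similar
# modeled action by comparing simple feature vectors (the values are
# invented for illustration).
import math

# Feature vectors: (velocity, distance_from_body, hand_openness)
PROTOTYPES = {
    "Bowing":       (0.2, 0.1, 0.0),
    "Clapping":     (0.5, 0.4, 1.0),
    "Handshaking":  (0.3, 0.6, 1.0),
    "Punching":     (0.9, 0.7, 0.0),
    "Slapping":     (0.8, 0.6, 1.0),
    "Frontkicking": (0.9, 0.8, 0.0),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def attribute(unknown_vector):
    """Map an unmodeled S1 outcome onto the closest of the six actions."""
    return max(PROTOTYPES, key=lambda k: cosine(unknown_vector, PROTOTYPES[k]))
```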
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
<p>In this paper we have sketched a preliminary account of a dual
process based framework able to provide a partial explanation
of the reasons driving a robotic system to its decisions in
a task of gesture recognition in a social scenario. In the near
future we plan to evaluate in detail the feasibility of the
proposed framework with a Pepper robot. In addition, as mid
term goal, we plan to extend the level of detail of the
possible explanation provided by such framework by considering
more complex scenarios and a multimodal interaction
involving both visual and linguistic elements. Finally, we plan to
provide a tighter integration of the two software components
that, currently, operate in a relatively independent way.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Augello et al., 2016a]
          <string-name>
            <given-names>Agnese</given-names>
            <surname>Augello</surname>
          </string-name>
          , Manuel Gentile, and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Dignum</surname>
          </string-name>
          .
          <article-title>Social agents for learning in virtual environments</article-title>
          .
          <source>In Games and Learning Alliance</source>
          , pages
          <fpage>133</fpage>
          -
          <lpage>143</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Augello et al., 2016b]
          <string-name>
            <given-names>Agnese</given-names>
            <surname>Augello</surname>
          </string-name>
          , Ignazio Infantino, Antonio Lieto, Giovanni Pilato, Riccardo Rizzo, and
          <string-name>
            <given-names>Filippo</given-names>
            <surname>Vella</surname>
          </string-name>
          .
          <article-title>Artwork creation by a cognitive architecture integrating computational creativity and dual process approaches</article-title>
          .
          <source>Biologically Inspired Cognitive Architectures</source>
          ,
          <volume>15</volume>
          :
          <fpage>74</fpage>
          -
          <lpage>86</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Dignum et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Virginia</given-names>
            <surname>Dignum</surname>
          </string-name>
          , Catholijn Jonker, Rui Prada, and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Dignum</surname>
          </string-name>
          .
          <article-title>Situational deliberation; getting to social intelligence</article-title>
          .
          <source>Computational Social Science and Social Computer Science: Two Sides of the Same Coin</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Evans and Frankish</source>
          , 2009]
          <article-title>Jonathan St BT Evans and Keith Ed Frankish. In two minds: Dual processes and beyond</article-title>
          . Oxford University Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Frixione and Lieto</source>
          , 2014]
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Frixione</surname>
          </string-name>
          and Antonio Lieto.
          <article-title>Towards an Extended Model of Conceptual Representations in Formal Ontologies: A Typicality-Based Proposal</article-title>
          .
          <source>Journal of Universal Computer Science</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>257</fpage>
          -
          <lpage>276</lpage>
          ,
          <year>March 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Hochreiter and Schmidhuber</source>
          , 1997]
<article-title>Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory</article-title>
          .
          <source>Neural computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Infantino et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>Ignazio</given-names>
            <surname>Infantino</surname>
          </string-name>
          , Riccardo Rizzo, and
          <string-name>
            <given-names>Salvatore</given-names>
            <surname>Gaglio</surname>
          </string-name>
          .
          <article-title>A framework for sign language sentence recognition by commonsense context</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          , Part C (
          <article-title>Applications</article-title>
          and Reviews),
          <volume>37</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1034</fpage>
          -
          <lpage>1039</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Infantino et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Ignazio</given-names>
            <surname>Infantino</surname>
          </string-name>
          , Carmelo Lodato, Salvatore Lopes, and
          <string-name>
            <given-names>Filippo</given-names>
            <surname>Vella</surname>
          </string-name>
          .
          <article-title>Human-humanoid interaction by an intentional system</article-title>
          .
          <source>In Humanoid Robots</source>
          ,
          <year>2008</year>
          .
          <source>Humanoids</source>
          <year>2008</year>
          . 8th IEEE-RAS International Conference on, pages
          <fpage>573</fpage>
          -
          <lpage>578</lpage>
          . IEEE,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>[Kahneman</source>
          , 2011]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kahneman</surname>
          </string-name>
          .
          <article-title>Thinking, fast and slow</article-title>
          .
          <source>Macmillan</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>[Langley</surname>
          </string-name>
          et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Pat</given-names>
            <surname>Langley</surname>
          </string-name>
          , Ben Meadows, Mohan Sridharan, and
          <string-name>
            <given-names>Dongkyu</given-names>
            <surname>Choi</surname>
          </string-name>
          .
          <article-title>Explainable agency for intelligent autonomous systems</article-title>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Larue et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Othalia</given-names>
            <surname>Larue</surname>
          </string-name>
          , Pierre Poirier, and
          <string-name>
            <given-names>Roger</given-names>
            <surname>Nkambou</surname>
          </string-name>
          .
          <article-title>A cognitive architecture based on cognitive/neurological dual-system theories</article-title>
          .
          <source>In International Conference on Brain Informatics</source>
          , pages
          <fpage>288</fpage>
          -
          <lpage>299</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [LeCun et al.,
          <year>2015</year>
          ] Yann LeCun, Yoshua Bengio, and
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Deep learning</article-title>
          .
          <source>Nature</source>
          ,
          <volume>521</volume>
          (
          <issue>7553</issue>
          ):
          <fpage>436</fpage>
          -
          <lpage>444</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>[Lichman</source>
          , 2013]
          <string-name>
            <surname>M. Lichman.</surname>
          </string-name>
          <article-title>UCI machine learning repository</article-title>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Lieto et al.,
          <year>2015</year>
          ] Antonio Lieto, Andrea Minieri, Alberto Piana, and
          <string-name>
            <given-names>Daniele P.</given-names>
            <surname>Radicioni</surname>
          </string-name>
          .
          <article-title>A knowledge-based system for prototypical reasoning</article-title>
          .
          <source>Connection Science</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Lieto et al.,
          <source>2017a] Antonio Lieto</source>
          , Antonio Chella, and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Frixione</surname>
          </string-name>
          .
          <article-title>Conceptual spaces for cognitive architectures: A lingua franca for different levels of representation</article-title>
          .
          <source>Biologically Inspired Cognitive Architectures</source>
          ,
          <volume>19</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Lieto et al., 2017b] Antonio Lieto, Daniele P. Radicioni, and
          <string-name>
            <given-names>Valentina</given-names>
            <surname>Rho</surname>
          </string-name>
          .
          <article-title>Dual PECCS: a cognitive system for conceptual representation and categorization</article-title>
          .
          <source>Journal of Experimental &amp; Theoretical Artificial Intelligence</source>
          ,
          <volume>29</volume>
          (
          <issue>2</issue>
          ):
          <fpage>433</fpage>
          -
          <lpage>452</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Lieto et al.,
          <year>2018</year>
          ] Antonio Lieto,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Lebiere</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Oltramari</surname>
          </string-name>
          .
          <article-title>The knowledge level in cognitive architectures: Current limitations and possible developments</article-title>
          .
          <source>Cognitive Systems Research</source>
          ,
          <volume>48</volume>
          :
          <fpage>39</fpage>
          -
          <lpage>55</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Pascanu et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Razvan</given-names>
            <surname>Pascanu</surname>
          </string-name>
          , Guido Montúfar, and Yoshua Bengio.
          <article-title>On the number of response regions of deep feed forward networks with piece-wise linear activations</article-title>
          .
          <source>CoRR, abs/1312.6098</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Reckwitz,
          <year>2002</year>
          ]
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Reckwitz</surname>
          </string-name>
          .
          <article-title>Toward a theory of social practices: A development in culturalist theorizing</article-title>
          .
          <source>European journal of social theory</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>243</fpage>
          -
          <lpage>263</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Stanovich and West, 2000] Keith E. Stanovich and Richard F. West.
          <article-title>Advancing the rationality debate</article-title>
          .
          <source>Behavioral and brain sciences</source>
          ,
          <volume>23</volume>
          (
          <issue>05</issue>
          ):
          <fpage>701</fpage>
          -
          <lpage>717</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Strannegård et al.,
          <year>2013</year>
          ]
          Claes Strannegård, Rickard von Haugwitz, Johan Wessberg, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Balkenius</surname>
          </string-name>
          .
          <article-title>A cognitive architecture based on dual process theory</article-title>
          .
          In
          <source>International Conference on Artificial General Intelligence</source>
          , pages
          <fpage>140</fpage>
          -
          <lpage>149</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Sun,
          <year>2006</year>
          ]
          <string-name>
            <given-names>Ron</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <article-title>The CLARION cognitive architecture: Extending cognitive modeling to social simulation</article-title>
          . In
          <source>Cognition and multi-agent interaction</source>
          , pages
          <fpage>79</fpage>
          -
          <lpage>99</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Theodoridis and Hu,
          <year>2007</year>
          ]
          <string-name>
            <given-names>Theodoros</given-names>
            <surname>Theodoridis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Huosheng</given-names>
            <surname>Hu</surname>
          </string-name>
          .
          <article-title>Action classification of 3D human models using dynamic ANNs for mobile robot surveillance</article-title>
          .
          In
          <source>Robotics and Biomimetics, 2007. ROBIO 2007. IEEE International Conference on</source>
          , pages
          <fpage>371</fpage>
          -
          <lpage>376</lpage>
          . IEEE,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Zhou et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Bolei</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba.
          <article-title>Object detectors emerge in deep scene CNNs</article-title>
          .
          <source>arXiv preprint arXiv:1412.6856</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>