<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Grounding Ontologies in the External World</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio CHELLA</string-name>
          <email>antonio.chella@unipa.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Palermo and ICAR-CNR</institution>
          ,
          <addr-line>Palermo</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper discusses a case study of grounding an ontology in the external world by a cognitive architecture for robot vision developed at the RoboticsLab of the University of Palermo. The architecture aims at representing symbolic knowledge extracted from visual data related to static and dynamic scenarios. The central assumption is the principled integration of a robot vision system with a symbolic system underlying the knowledge representation of the scene. Such an integration is based on a conceptual level of representation intermediate between the sub-symbolic processing of visual data and the declarative style employed in the ontological representation.</p>
      </abstract>
      <kwd-group>
        <kwd>Conceptual Spaces</kwd>
        <kwd>Symbol Grounding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The symbol grounding problem, as stated by Stevan Harnad [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], roughly concerns how
to interpret a formal symbol system in terms of the entities in the external world. For
example, an instance of the problem is the interpretation of the symbol “Hammer#1” in
a formal symbol system by the corresponding hammer in the real world. The problem is
especially crucial for autonomous agents, because an autonomous agent must find the
meaning of its symbols within its own inner structures. Harnad discusses neural
networks as candidate mechanisms for solving the problem.
      </p>
      <p>
        The paper claims that an intermediate representation of a geometric kind is a better
candidate for the symbol grounding problem. It is well known that neural networks
present several problems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. They are opaque, i.e., it is difficult to understand the
behavior of a neural network simply by analyzing its weights and the activation levels of
its units. Moreover, a neural network needs a massive training set of labeled examples.
After a neural network is trained, it is quite difficult to add new examples without
restarting the training phase from scratch. The compositionality of concepts in neural
networks is another well-known problem with no straightforward solution.
      </p>
      <p>The theory of conceptual spaces provides instead a robust geometric framework for
the grounding of ontologies of symbols in a cognitive agent that overcomes many of the
limitations of neural network representations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Conceptual Spaces</title>
      <p>
A conceptual space (CS) is a metric space in which entities are characterized by a
number of quality dimensions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Examples of such dimensions are color, pitch, volume, spatial
coordinates, and so on. Some dimensions are closely related to the sensorial inputs of the
system; others may be characterized in more abstract terms. The dimensions of a
conceptual space represent qualities of the external environment independently from any
linguistic formalism or description. In this sense, a conceptual space comes before any
symbolic characterization of cognitive entities.
      </p>
      <p>An important aspect of the theory of conceptual spaces is the definition of a metric
function in CS. In brief, the distance between two points of a CS computed according to
such a metric function corresponds to a measure of the similarity between the entities
corresponding to the points.</p>
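      <p>As a minimal sketch, such a metric can be rendered in Python. The quality dimensions, coordinates, and weights below are purely illustrative assumptions, not those of the architecture discussed later.</p>
      <preformat>
```python
import math

def distance(p, q, weights=None):
    """Weighted Euclidean distance between two points of a conceptual space.

    A smaller distance means a greater similarity between the entities
    corresponding to the points; the weights model the salience the agent
    attaches to each quality dimension.
    """
    if weights is None:
        weights = [1.0] * len(p)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))

# Hypothetical points over (elongation, curvature) quality dimensions.
cylinder_like = (0.9, 1.0)
box_like = (0.4, 0.0)

# Identical entities are maximally similar: their distance is zero,
# while a cylinder-like and a box-like shape lie farther apart.
print(distance(cylinder_like, cylinder_like))
print(distance(cylinder_like, box_like))
```
      </preformat>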
      <p>
        Another pillar of CS theory is the role of convex sets of points in the
conceptualization. According to psychological literature (see, e.g., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), the so-called
natural categories represent the most informative level of categorization in taxonomies
of real-world entities. They are the most differentiated from one another and constitute
the preferred level for reference. Also, they are the first to be learned by children, and
categorization at their level is usually faster. The theory of conceptual spaces assumes
the so-called Criterion P, according to which natural categories correspond to convex
sets in some suitable CS. As a consequence, betweenness is significant for natural
categories, in that for every pair of points belonging to a convex set (and therefore
sharing some features), all the points between them also belong to the set itself, and share
in their turn the same features.
      </p>
      <p>
        Conceptual spaces, as discussed in detail in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], are more transparent than neural
networks; they can be built even by a small set of examples; they are more suitable for
incremental learning; the problem of compositionality may be taken into account more
quickly and naturally. See [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for an up-to-date discussion of the relationships between
conceptual spaces and structures in the brain.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. A Cognitive Architecture</title>
      <p>Based on these ideas, a cognitive architecture for robot vision has been developed at the
RoboticsLab of the University of Palermo.</p>
      <sec id="sec-3-1">
        <title>3.1. The three areas</title>
        <p>
          The design is subdivided into three main areas: the subconceptual, the conceptual and
the linguistic areas. The subconceptual area is related to the processing of data coming
from the sensors. Here, information is not yet organized in terms of conceptual structures
and categories. In the linguistic area, by contrast, representation and processing are based on
a logic-oriented formalism based on description logic [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In this area, ontologies may be
suitably represented.
        </p>
        <p>The conceptual area is based on the theory of conceptual spaces previously outlined.
It is an intermediate level of representation between the sub-conceptual and the linguistic
areas. Here, data is organized in conceptual structures that are independent of symbolic
description. The symbolic ontology of the linguistic area is then interpreted on
aggregations of these structures. The conceptual space acts as a workspace in which
low-level and high-level processes access and exchange information from bottom to top
and from top to bottom.</p>
      <p>The three areas of the architecture are parallel computational components working
together on different tasks. There is no privileged direction in the flow of
information among them: some computations are strictly bottom-up, with data flowing
from the subconceptual up to the linguistic through the conceptual area; other
calculations combine top-down with bottom-up processing.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. The case of static scenes</title>
        <p>
          In the case of grounding ontologies related to static scenes [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], we take into account a
suitable conceptual space where each point corresponds to a geometric entity. Then,
“natural” concepts such as boxes, cylinders, and spheres correspond to convex sets of points
in the considered conceptual space. A symbol like “Box#1” thus corresponds to an item
in the CS belonging to the convex set of boxes.
        </p>
        <p>Composite objects cannot be described by single points in this CS. To represent
these objects, we naturally assume that they correspond to sets of points in CS. For
example, a chair can be easily described as the set of its constituents, i.e., its legs, its seat
and so on. Analogously, a hammer may be considered as composed of two geometric
entities: its handle and its head. So, a generic composite object is described as the set of
points corresponding to its components.</p>
        <p>The concept of hammer is thus described in CS as a set of pairs, each of which is
made up of the two elements of a real hammer, i.e., its handle and its head. Let us suppose
for simplicity that the hammer handle is typically a cylinder, while the hammer head is
usually a box. Then, the handle of the hammer will be grounded in the CS on the subset
of the set of points corresponding to the concept of the cylinder, while the head of the
hammer will be grounded on the suitable subset of points corresponding to the concept
of the box.</p>
        <p>Thus, the symbol “Hammer#1”, corresponding to a specific instance of a hammer,
will correspond to a specific pair of points in the conceptual space: one point of the pair
will belong to the proper subset of cylinders while the other point will belong to the
subset of boxes. In turn, these points are linked to the corresponding entities in the
external world thanks to the subconceptual area that processes the data coming from the
sensors of the system.</p>
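        <p>A minimal sketch of this grounding follows; the shape dimensions and the region bounds are purely hypothetical assumptions, whereas in the architecture the regions are built from perceptual data.</p>
        <preformat>
```python
def in_region(region, point):
    """Membership in a convex region given by per-dimension (lo, hi) bounds."""
    return all(x >= lo and hi >= x for (lo, hi), x in zip(region, point))

# Hypothetical convex regions over (elongation, curvature) dimensions.
CYLINDER = ((0.6, 1.0), (0.7, 1.0))   # elongated, curved surface
BOX = ((0.0, 0.5), (0.0, 0.3))        # compact, flat faces

def ground_hammer(symbol, handle_point, head_point):
    """Ground an instance symbol on a pair of CS points.

    The grounding is accepted only if the handle point falls in the
    cylinder region and the head point falls in the box region.
    """
    if in_region(CYLINDER, handle_point) and in_region(BOX, head_point):
        return {symbol: (handle_point, head_point)}
    return None

# "Hammer#1" is grounded on one cylinder-like and one box-like point.
grounding = ground_hammer("Hammer#1", (0.8, 0.9), (0.3, 0.1))
```
        </preformat>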
      </sec>
      <sec id="sec-3-3">
        <title>3.3. The focus of attention</title>
        <p>To identify in the CS the set of components of a composite object such as the hammer that is
described at the symbolic level, we define a focus of attention mechanism acting as a
light spot that sequentially scans the conceptual space.</p>
        <p>In the beginning, the focus of attention explores a zone in the conceptual space where
a point is expected that matches one of the components of the composite object, for
example, the point corresponding to the hammer handle. If this expectation is satisfied,
then the focus of attention searches for a second component of the composite object (e.g.,
a second point corresponding to the hammer head, with suitable shape and appropriate
spatial arrangement). This process is iterated until all such expectations are satisfied, and
therefore there is enough evidence to assert that a composite object such as the hammer is
present in the scene.</p>
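        <p>The scanning loop can be sketched as follows. The part model, the classifier, and all names here are hypothetical simplifications introduced for illustration; the actual mechanism operates on full conceptual-space representations.</p>
        <preformat>
```python
# Hypothetical part model: each part of the composite object comes with
# an expected shape category.
HAMMER_PARTS = {"handle": "cylinder", "head": "box"}

def focus_of_attention(scene_points, part_model, classify):
    """Sequentially scan CS points, checking one expected part at a time.

    Returns the matched parts if every expectation is satisfied, or None
    as soon as an expectation fails.
    """
    matched = {}
    remaining = list(scene_points)
    for part, expected_shape in part_model.items():
        hit = next((p for p in remaining if classify(p) == expected_shape), None)
        if hit is None:
            return None          # an expectation failed: no composite object
        matched[part] = hit
        remaining.remove(hit)    # each point grounds at most one part
    return matched

# A trivial classifier over (elongation, curvature) points, for illustration.
def classify(p):
    return "cylinder" if p[0] >= 0.6 and p[1] >= 0.7 else "box"

scene = [(0.3, 0.1), (0.8, 0.9)]   # a box-like and a cylinder-like point
hammer = focus_of_attention(scene, HAMMER_PARTS, classify)
```
        </preformat>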
        <p>The focus of attention is controlled by two different modalities, namely the linguistic
modality and the associative modality. According to the linguistic modality, the focus
of attention is driven by the symbolic knowledge explicitly stored in the ontology in the
linguistic area.</p>
        <p>For example, let us suppose that the system has stored in its ontology the description of
the hammer as composed of a head and a handle. When the system recognizes a point
in CS as a possible part of a hammer (e.g., as its handle), it generates the hypothesis that
a hammer is present in the scene, and therefore it searches the CS for the missing parts
(in this case, the hammer's head).</p>
        <p>When different types of objects have similar parts (e.g., similar handles), various
competing hypotheses are generated; the most plausible of them wins over the others and
is accepted by the system.</p>
        <p>According to the associative modality, the focus of attention is driven by an
associative mechanism based on learned expectations. Let us suppose that the system has
experienced several scenes where a hammer is present along with a nail. As a
consequence, the system learns to associate hammers and nails: when a hammer is
present in the scene, it expects to find a nail in the surroundings.</p>
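        <p>A minimal sketch of such learned expectations, based on simple co-occurrence counts over previously observed scenes; this is an illustrative assumption, not the associative network used in the architecture.</p>
        <preformat>
```python
from collections import Counter
from itertools import combinations

# Co-occurrence counts over pairs of object labels seen in the same scene.
pair_counts = Counter()

def observe(scene_objects):
    """Record every unordered pair of distinct objects in a scene."""
    for a, b in combinations(sorted(set(scene_objects)), 2):
        pair_counts[(a, b)] += 1

def expected_companions(obj, threshold=2):
    """Objects seen together with obj at least `threshold` times."""
    out = []
    for (a, b), n in pair_counts.items():
        if n >= threshold:
            if a == obj:
                out.append(b)
            elif b == obj:
                out.append(a)
    return out

for scene in (["hammer", "nail"],
              ["hammer", "nail", "screwdriver"],
              ["hammer", "nail"]):
    observe(scene)

# After repeated exposure, seeing a hammer raises the expectation of a nail
# in the surroundings, but not of a screwdriver (seen together only once).
```
        </preformat>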
        <p>
          A natural way to implement the focus of attention as it has been described before is
to employ an associative memory. A mechanism based on a suitable associative neural
network that implements both the linguistic and the associative modalities of the focus
of attention is discussed in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          The ideas previously summarized for the analysis of static scenes have been generalized to
ground ontologies related to dynamic scene analysis [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], robot actions [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], robot
self-recognition [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], robot self-consciousness [11], and, more recently, to model ontologies related
to music perception [
          <xref ref-type="bibr" rid="ref11">12</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>Conceptual spaces offer a robust theoretical framework for the development of a
conceptual semantics for symbolic ontologies that can account for the grounding of
symbols in the data coming from robot vision. In this sense, conceptual spaces could
contribute significantly to a better integration of robot vision and ontology-based AI
techniques in the design of autonomous agents.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harnad</surname>
          </string-name>
          ,
          <article-title>The symbol grounding problem</article-title>
          ,
          <source>Physica D: Nonlinear Phenomena</source>
          ,
          <volume>42</volume>
          (
          <year>1990</year>
          ),
          <fpage>335</fpage>
          -
          <lpage>346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lieto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Frixione</surname>
          </string-name>
          ,
          <article-title>Conceptual spaces for cognitive architectures: a lingua franca for different levels of representation</article-title>
          ,
          <source>Biologically Inspired Cognitive Architectures</source>
          ,
          <volume>19</volume>
          (
          <year>2017</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gärdenfors</surname>
          </string-name>
          ,
          <source>Conceptual spaces: The geometry of thought</source>
          . MIT Press, Cambridge, MA,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rosch</surname>
          </string-name>
          ,
          <article-title>Cognitive representations of semantic categories</article-title>
          ,
          <source>Journal of Experimental Psychology: General</source>
          ,
          <volume>104</volume>
          (
          <year>1975</year>
          ),
          <fpage>192</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Balkenius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gärdenfors</surname>
          </string-name>
          ,
          <article-title>Spaces in the Brain: From Neurons to Meanings</article-title>
          ,
          <source>Frontiers in Psychology</source>
          ,
          <volume>7</volume>
          (
          <year>2016</year>
          ),
          <elocation-id>1820</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.J.</given-names>
            <surname>Brachman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Resnick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Borgida</surname>
          </string-name>
          ,
          <article-title>Living with CLASSIC: when and how to use a KL-ONE-like language</article-title>
          , in: J.F. Sowa (ed.),
          <source>Principles of semantic networks: explorations in the representation of knowledge</source>
          ,
          <fpage>401</fpage>
          -
          <lpage>456</lpage>
          , Morgan Kaufmann, San Mateo, CA,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Frixione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaglio</surname>
          </string-name>
          ,
          <article-title>A cognitive architecture for artificial vision</article-title>
          ,
          <source>Artificial Intelligence</source>
          ,
          <volume>89</volume>
          (
          <year>1997</year>
          ),
          <fpage>73</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Frixione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaglio</surname>
          </string-name>
          ,
          <article-title>Understanding dynamic scenes</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>123</volume>
          (
          <year>2000</year>
          ),
          <fpage>89</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pirrone</surname>
          </string-name>
          ,
          <article-title>Conceptual representations of actions for autonomous robots</article-title>
          ,
          <source>Robotics and Autonomous Systems</source>
          ,
          <volume>34</volume>
          (
          <year>2001</year>
          ),
          <fpage>251</fpage>
          -
          <lpage>263</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Frixione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaglio</surname>
          </string-name>
          ,
          <article-title>Anchoring symbols to conceptual spaces: the case of dynamic scenarios</article-title>
          ,
          <source>Robotics and Autonomous Systems</source>
          ,
          <volume>43</volume>
          (
          <year>2003</year>
          ),
          <fpage>175</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chella</surname>
          </string-name>
          ,
          <article-title>A cognitive architecture for music perception exploiting conceptual spaces</article-title>
          , in F. Zenker, P. Gärdenfors (eds.),
          <source>Applications of conceptual spaces: The case for geometric knowledge representation</source>
          ,
          <fpage>187</fpage>
          -
          <lpage>203</lpage>
          , Springer, Heidelberg,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>