<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology Based Semantic Image Interpretation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ivan Donadello</string-name>
          <email>donadello@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DISI, University of Trento</institution>
          ,
          <addr-line>Via Sommarive 9, I-38123, Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Via Sommarive 18, I-38123, Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Semantic image interpretation (SII) leverages Semantic Web ontologies for generating a mathematical structure that describes the content of images. Current SII algorithms consider the ontologies only in a late phase of the SII process, to enrich these structures. In this research proposal we study a well-founded framework that combines logical knowledge with low-level image features in the early phase of SII. The image content is represented as a partial model of an ontology. Each element of the partial model is grounded to a set of segments of the image. Moreover, we propose an approximate algorithm that searches for the most plausible partial model. The comparison of our method with a knowledge-blind baseline shows that the use of ontologies significantly improves the results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Semantic image interpretation (SII) is the task of generating a semantically rich
structure that describes the content of an image [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This structure is both human and
machine understandable and can be encoded using the Semantic Web (SW) language
RDF. RDF brings two advantages: it enables the enrichment of the semantic content of
images with SW resources, and an RDF-based description of images
enables content-based image retrieval via query languages such as SPARQL.
      </p>
      <p>
        The main challenge in SII is bridging the so-called semantic gap [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which is the
complex correlation between low-level image features and high-level semantic
concepts. High-level knowledge plays a key role in bridging the semantic gap [
        <xref ref-type="bibr" rid="ref17 ref18">17,18</xref>
        ]. This
knowledge can be found in the ontologies provided by the SW.
      </p>
      <p>
        Most of the current approaches to SII exploit ontologies at a later stage, when some
hypotheses (a geometric description of the objects and their spatial relations) about the
image content have already been formulated by a bottom-up approach (see for instance
[
        <xref ref-type="bibr" rid="ref1 ref11 ref12 ref13 ref15 ref17 ref3 ref6">13,15,17,11,12,3,6,1</xref>
        ]). In these cases background knowledge is exploited to check the
consistency of the output and/or to infer new facts. These works either do not consider the
uncertainty coming from the low-level image analysis or require a manually crafted set of
DL rules defining what is abducible.
      </p>
      <p>
        In this research proposal we study a general framework for SII that allows the
integration of ontologies with low-level image features. The framework takes as input
the ontology and exploits it throughout the process of image interpretation. The output is a
description of the content of an image in terms of a (most plausible) partial logical model
of the ontology [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Instead of lifting low-level features up into a logical form using
concrete domains (as in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) we proceed in the opposite direction, by compiling the
background knowledge down into low-level features. This allows more flexible
inference in processing numeric information and lets us use simpler, and more efficient, logical
reasoners for the semantic part. The partial model is generated by using optimisation
methods (e.g. clustering) that integrate numeric and logical information. Our
contribution is a formal framework for SII that integrates low-level features and logical axioms.
Moreover, we developed an early prototype and evaluated it, with promising results,
on the task of detecting complex objects starting from the presence of their parts [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. (I thank my advisor Luciano Serafini for his precious help, suggestions and patience.)
      </p>
    </sec>
    <sec id="sec-2">
      <title>Theoretical framework</title>
      <p>
        The proposed framework takes as input a labelled picture, that is, a picture partitioned
into segments (regions of pixels) by a semantic segmentation algorithm [
        <xref ref-type="bibr" rid="ref4 ref7">4,7</xref>
        ]. Each
segment has a set of weighted labels that represent the level of confidence of the
semantic segmentation. Labels are taken from the signature Σ, the alphabet of
the ontology. A labelled picture is a pair P = ⟨S, L⟩ where S = {s₁, …, sₙ} is a set of
segments of the picture P, and L is a function that associates to each segment s ∈ S a
set L(s) of weighted labels ⟨l, w⟩, with l ∈ Σ and w ∈ (0, 1].
      </p>
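      <p>The pair P = ⟨S, L⟩ maps directly onto a small data structure. Below is a minimal Python sketch (our own illustration, not the authors' implementation); the Segment class, its bounding-box representation and the example labels are assumptions made only for this example:</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """A region of pixels, reduced here to an id and a bounding box."""
    sid: str
    bbox: tuple  # (x_min, y_min, x_max, y_max)

# P = (S, L): a set of segments S and a labelling function L that attaches
# to each segment a set of weighted labels (l, w) with w in (0, 1].
segments = [Segment("s1", (10, 10, 40, 60)), Segment("s2", (42, 12, 70, 58))]
L = {
    "s1": {("Leg", 0.9), ("Arm", 0.3)},
    "s2": {("Leg", 0.8)},
}

def labels(segment):
    """L(s): the weighted labels attached to segment s."""
    return L[segment.sid]
```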
      <p>
        In this research proposal we study a method for discovering new objects (e.g.,
composite objects) and relations between objects by exploiting low-level image features
and a Description Logic (DL) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] ontology. The ontology has the classical signature
Σ = C ⊎ R ⊎ I of symbols for concepts, relations and individuals respectively.
We adopt the standard definitions for syntax and semantics of DL³. An ontology O on
Σ is a set of DL axioms. An interpretation of a DL signature Σ is a pair I = ⟨ΔI, ·I⟩,
where ΔI is a non-empty set and ·I is a function that interprets the symbols of Σ in
ΔI. I is a model of an ontology O if it satisfies all the axioms in O. The axioms of
the ontology are constraints on the states of the world. A picture, however, provides
only a partial view of the state of the world; indeed, it could show a person with only
one (visible) leg. Therefore, the content of a picture is not isomorphic to a model, as a
model could contain objects not appearing in the picture (the invisible leg). The content
of a picture should instead be represented as a partial model⁴.</p>
      <p>Definition 1 (Partial model). Let I and I′ be two interpretations of the signatures
Σ and Σ′ respectively, with Σ ⊆ Σ′. I′ is an extension of I, or equivalently I′ extends
I, if ΔI ⊆ ΔI′, aI = aI′, CI = CI′ ∩ ΔI and RI = RI′ ∩ (ΔI × ΔI), for all a ∈ I,
C ∈ C and R ∈ R. Ip is a partial model for an ontology O, in symbols Ip ⊨p O, if
there is a model I of O that extends Ip.</p>
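      <p>On finite interpretations the extension conditions of Definition 1 are directly checkable. A minimal sketch, assuming interpretations are represented as plain Python dicts (a representation we choose only for illustration):</p>

```python
def extends(I, I2):
    """Check that interpretation I2 extends I (Definition 1).

    An interpretation is a dict with keys:
      'domain'    : the set Delta of individuals;
      'consts'    : individual name mapped to an element of Delta;
      'concepts'  : concept name mapped to its extension (a set);
      'relations' : relation name mapped to a set of pairs.
    """
    D, D2 = I["domain"], I2["domain"]
    if not D.issubset(D2):
        return False
    # every individual name denotes the same element in both interpretations
    if any(I2["consts"].get(a) != d for a, d in I["consts"].items()):
        return False
    # each concept extension, restricted to Delta, must coincide
    if any(I2["concepts"].get(C, set()) & D != ext
           for C, ext in I["concepts"].items()):
        return False
    # each relation extension, restricted to Delta x Delta, must coincide
    return all({(d, e) for d, e in I2["relations"].get(R, set())
                if d in D and e in D} == ext
               for R, ext in I["relations"].items())
```

      <p>Under this check, Ip ⊨p O holds exactly when some model of O extends Ip.</p>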
      <p>
        In this framework the use of DL ontologies is twofold: first, they are a terminological
source for labelled pictures; second, the DL inference services are exploited to check
whether an interpretation is a partial model and thus to infer new facts. The semantic
interpretation of a picture is a partial model plus an alignment, called grounding, of every
element of Ip with the segments of the picture.
³ In this paper we use the SHIQ DL.
⁴ This intuition was introduced in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]; our formalization, however, is slightly different.
      </p>
      <sec id="sec-2-1">
        <title>Definition 2 (Semantically interpreted picture)</title>
        <p>Given an ontology O with signature Σ
and a labelled picture P = ⟨S, L⟩, a semantically interpreted picture is a triple
S = ⟨P, Ip, G⟩O where:
– Ip = ⟨ΔIp, ·Ip⟩ is a partial model of O;
– G ⊆ ΔIp × S is a left-total and surjective relation called the grounding relation:
if ⟨d, s⟩ ∈ G then there exists an l ∈ L(s) such that:
1. if l ∈ C then d ∈ lIp;
2. if l ∈ I then d = lIp;
3. if l ∈ R then ⟨d, d′⟩ ∈ lIp or ⟨d′, d⟩ ∈ lIp for some d′ ∈ ΔIp.</p>
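        <p>The structural conditions on G (left-totality and surjectivity) can be sketched in a few lines; here the relation is assumed, for illustration only, to be a set of ⟨d, s⟩ pairs over plain identifiers:</p>

```python
def is_grounding(G, domain, segments):
    """Structural side of Definition 2: G must be left-total (every element
    of the domain is grounded) and surjective (every segment is used)."""
    left_total = {d for d, _ in G} == set(domain)
    surjective = {s for _, s in G} == set(segments)
    return left_total and surjective

def grounding_of(G, d):
    """G(d): the set of segments grounding the element d."""
    return {s for e, s in G if e == d}
```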
        <sec id="sec-2-1-2">
          <p>The grounding of every d ∈ ΔIp, denoted by G(d), is the set {s ∈ S | ⟨d, s⟩ ∈ G}.</p>
          <p>There are many possible explanations of the picture content, thus there are many
partial models describing a picture via a grounding relation. We define a cost function 𝒞
that assigns a cost to a partial model based on its adherence to the image content:
the higher the adherence, the lower the cost. The most plausible partial model Ip* is the
partial model that minimizes 𝒞, in symbols:</p>
          <p>Ip* = argmin_{Ip ⊨p O, G ⊆ ΔIp × S} 𝒞(⟨P, Ip, G⟩O)   (1)</p>
          <p>The definition of 𝒞 has to take into account the low-level features of the segments and the
high-level semantic features of the partial model derivable from the ontology. Intuitively, the
cost function measures the semantic gap between the two types of features.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Definition 3 (Semantic image interpretation problem)</title>
        <p>Given an ontology O, a
labelled picture P and a cost function 𝒞, the semantic image interpretation problem is the
construction of a semantically interpreted picture S = ⟨P, Ip, G⟩O that minimizes 𝒞.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <p>In this proposal we restrict ourselves to the recognition of complex objects from their parts. For
example, given a labelled picture where only some parts of a man (the legs, one arm and
the head) and of a horse (the legs, the muzzle and the tail) are labelled, we want to infer
the presence of some logical individuals with their classes (man and horse respectively).
These individuals are linked with their parts through the partOf relation. This can be
seen as a clustering problem and we specify the cost function in terms of clustering
optimisation. The parts (simple objects) are the input of the clustering problem, whereas
a single cluster contains the parts of a composite object. The parts to cluster
are the individuals d ∈ ΔIp with the following features:
– a set of low-level image features extracted from G(d), the grounding of d;
– a set of semantic features corresponding to the most specific concepts, extracted from
the set {C ∈ C | d ∈ CIp}, assigned to d by Ip.</p>
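      <p>As an illustration of the two feature kinds, the sketch below computes a centroid from bounding boxes and filters a concept set down to its most specific members. The bounding-box representation and the `subsumptions` pair-set are assumptions made for this example, not the paper's data model:</p>

```python
def centroid(bboxes):
    """Numeric feature: centroid of a part's grounded segments, here
    approximated by averaging the bounding-box centres."""
    xs = [(b[0] + b[2]) / 2 for b in bboxes]
    ys = [(b[1] + b[3]) / 2 for b in bboxes]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def most_specific(concepts, subsumptions):
    """Semantic feature: drop every concept that strictly subsumes another
    held concept; subsumptions is a set of (sub, super) pairs from O."""
    return {C for C in concepts
            if not any((D, C) in subsumptions and D != C for D in concepts)}
```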
      <p>We use the centroid of G(d) as a numeric feature, but the approach can be generalised
to other features. Clustering algorithms are based on some distance between the input
elements defined in terms of their features. Let δG(d, d′) be the Euclidean distance between the
centroids of G(d) and G(d′), δOs(d, d′) a semantic distance between simple objects and
δOc(d, d′) a semantic distance between a simple object and its corresponding composite
object. We define the cost function as the quality measure of the clustering:</p>
      <p>𝒞(⟨P, Ip, G⟩O) = ( Σ_{d,d′ ∈ (∃hasPart.⊤)Ip} 1/δG(d, d′) ) +
Σ_{⟨d′,d⟩,⟨d″,d⟩ ∈ partOfIp} (δG(d′, d″) + δOs(d′, d″)) +
Σ_{⟨d′,d⟩ ∈ partOfIp} (δG(d′, d) + δOc(d′, d)).</p>
      <p>
        Following [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the first component of the above equation measures the centroid distance
between the composite objects (inter-cluster distance). The second component estimates
the distance between the elements of each single cluster (intra-cluster distance).
      </p>
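      <p>A hedged sketch of this quality measure (our reading of the equation above, not the authors' code): the inter-cluster term charges nearby composites via reciprocal centroid distances, while the intra-cluster term sums geometric plus semantic distances among siblings and between each part and its whole. The dict-based inputs and the distance callbacks are illustrative assumptions:</p>

```python
import math

def dist(p, q):
    """Euclidean distance between two centroids."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def cost(composites, parts_of, pos, d_os, d_oc):
    """Clustering quality measure.

    composites : list of composite-object ids
    parts_of   : dict mapping a composite to the list of its part ids
    pos        : dict mapping each object id to its centroid (x, y)
    d_os, d_oc : semantic distances (sibling pairs; part and whole)
    """
    # inter-cluster term: far-apart composites contribute little cost
    inter = sum(1.0 / dist(pos[d], pos[e])
                for i, d in enumerate(composites)
                for e in composites[i + 1:])
    # intra-cluster term: distances inside each cluster
    intra = 0.0
    for whole, parts in parts_of.items():
        for i, p in enumerate(parts):
            for q in parts[i + 1:]:                       # sibling pairs
                intra += dist(pos[p], pos[q]) + d_os(p, q)
            intra += dist(pos[p], pos[whole]) + d_oc(p, whole)  # part-whole
    return inter + intra
```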
      <p>
        Minimising the above equation analytically is rather complex, thus we developed
an iterative algorithm that at each loop groups the parts of a composite object,
approximating the cost function. If the grouping is not a partial model, the algorithm
enters the next loop and selects another clustering. In the first step our algorithm
generates an initial partial model Ip from P = ⟨S, L⟩, where ΔIp contains an element
ds for every segment s ∈ S and any concept C in the labelled picture is interpreted as
CIp = {ds | C ∈ L(s)}. The grounding G is the set of pairs ⟨ds, s⟩. Then, the algorithm
enters a loop where a non-parametric clustering procedure [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] clusters the input
elements d ∈ ΔIp by using their numeric and semantic features according to δG and
δOs. Each cluster cl corresponds to a composite object dcl which is introduced in ΔIp
and is connected via the hasPart relation to the elements of cl. We predict the type
of this new individual via abductive reasoning: the type is the ontology concept that
shares the maximum number of parts with the elements of the cluster. For example, if
we cluster some elements of type Tail, Muzzle and Arm, an abduced ontology concept
will be Horse. These new facts are introduced in Ip and the algorithm checks whether Ip is a
partial model of O by using a DL reasoner (Pellet [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]). If so, the algorithm returns Ip;
otherwise it extends the input elements with a set of consistency features that encode
information about the inconsistency of Ip. These features tend to separate (resp. join)
the segments that have been joined (resp. separated) in the previous clustering. The
cluster of our example is inconsistent because a horse does not have arms. Then the
algorithm returns to the beginning of the loop.
      </p>
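      <p>The abduction step of the loop can be sketched as a simple overlap count. Here `part_model`, mapping each composite concept to its expected part concepts, and the `consistent` callback standing in for the DL reasoner are both illustrative assumptions, not the paper's implementation:</p>

```python
def abduce_type(cluster_labels, part_model):
    """Abduce the composite concept sharing the most parts with a cluster.
    part_model maps each composite concept to its expected part concepts."""
    return max(part_model,
               key=lambda C: len(part_model[C] & set(cluster_labels)))

def interpret(clusters, part_model, consistent):
    """One round of the loop: type each cluster by abduction, then keep the
    result only if the consistency check (a stand-in for the DL reasoner)
    accepts it; otherwise signal that another iteration is needed."""
    typing = {cid: abduce_type(lbls, part_model)
              for cid, lbls in clusters.items()}
    return typing if consistent(typing) else None
```

      <p>On the paper's example, a cluster {Tail, Muzzle, Arm} abduces Horse, which the consistency check would then reject.</p>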
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>
        To evaluate our approach we created, by using LabelMe [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], a dataset of 204 labelled
pictures. For each picture we manually annotated simple objects, composite objects and
their part-whole relations⁵. We also created a simple ontology⁶ with a basic
formalisation of meronymy in the domains of houses, trees, people, and street vehicles. We built
a ground truth by associating every labelled picture P with its partial model,
encoded in an ABox AP. The partial model returned by our algorithm is encoded in the
ABox A′P; in order to compare A′P with AP we define the following two measures.
      </p>
      <p>Grouping (GRP): this measure expresses how good our algorithm is at grouping
parts of the same composite object. We define precision, recall and F1 measure on the
set of siblings (the parts of the same composite object): sibl(A) = {⟨d, d′⟩ | ∃d″ :
partOf(d, d″), partOf(d′, d″) ∈ A}. Thus:
precGRP(P) = |sibl(A′P) ∩ sibl(AP)| / |sibl(A′P)|
recGRP(P) = |sibl(A′P) ∩ sibl(AP)| / |sibl(AP)|</p>
      <p>Complex-object prediction (COP): this measure expresses how good our
algorithm is at predicting the type of the composite object. We define precision, recall and
F1 measure on the types of the composite object each part is assigned to: ptype(A) =
{⟨d, C⟩ | ∃d′ : {partOf(d, d′), C(d′)} ⊆ A}. Thus:
precCOP(P) = |ptype(A′P) ∩ ptype(AP)| / |ptype(A′P)|
recCOP(P) = |ptype(A′P) ∩ ptype(AP)| / |ptype(AP)|
To measure how the semantics improves the recognition of composite objects from their
parts we implemented a baseline that clusters without semantic features, see Table 1.
We can see that the explicit use of semantic knowledge via semantic distance, abductive
and deductive reasoning improves on the baseline that relies only on numeric features.</p>
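      <p>Both measures reduce to set-based precision and recall over tuples extracted from the ABoxes. A minimal sketch, with ABoxes represented as dicts of partOf facts (an encoding we choose for illustration):</p>

```python
def sibl(abox):
    """sibl(A): unordered pairs of parts that share a whole (partOf facts)."""
    by_whole = {}
    for part, whole in abox["partOf"]:
        by_whole.setdefault(whole, []).append(part)
    return {tuple(sorted((d, e))) for parts in by_whole.values()
            for d in parts for e in parts if d != e}

def prec_rec(predicted, gold):
    """Set-based precision and recall, used for both GRP and COP."""
    tp = len(predicted & gold)
    return tp / len(predicted), tp / len(gold)
```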
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>We proposed a well-founded and general framework for SII that integrates the symbolic
information of an ontology with the low-level numeric features of a picture. An image
is interpreted as a (most plausible) partial model of an ontology, which allows querying
its semantic content. We applied the framework to the specific task of recognizing
composite objects from their parts. The evaluation shows good results, and the injection
of semantic knowledge improves the performance with respect to a semantically-blind
baseline. As future work, we want to extend our evaluation by using more low-level
features, by studying other relations and by using a semantic segmentation algorithm as
the source of labelled pictures.
⁵ An example of labelled picture is available at http://bit.ly/1DXZxic
⁶ The ontology is available at http://bit.ly/1AruGh0</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Atif</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hudelot</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bloch</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Explanatory reasoning for image understanding using formal concept analysis and description logics</article-title>
          .
          <source>Systems, Man, and Cybernetics: Systems, IEEE Transactions on 44(5)</source>
          ,
          <fpage>552</fpage>
          -
          <lpage>570</lpage>
          (May
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.F</given-names>
          </string-name>
          . (eds.):
          <article-title>The Description Logic Handbook: Theory, Implementation, and Applications</article-title>
          . Cambridge University Press, New York, NY, USA (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bannour</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hudelot</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Towards ontologies for image interpretation and annotation</article-title>
          . In: Martinez,
          <string-name>
            <surname>J.M.</surname>
          </string-name>
          (ed.) 9th
          <source>International Workshop on Content-Based Multimedia Indexing</source>
          ,
          <string-name>
            <surname>CBMI</surname>
          </string-name>
          <year>2011</year>
          , Madrid, Spain, June 13-15,
          <year>2011</year>
          . pp.
          <fpage>211</fpage>
          -
          <lpage>216</lpage>
          . IEEE (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Carreira</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caseiro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batista</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sminchisescu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Semantic segmentation with secondorder pooling</article-title>
          . In: Fitzgibbon,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Lazebnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Perona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Sato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Schmid</surname>
          </string-name>
          , C. (eds.) Computer Vision - ECCV
          <year>2012</year>
          . LNCS, Springer Berlin Heidelberg (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Donadello</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serafini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Mixing low-level and semantic features for image interpretation</article-title>
          . In: Agapito,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Bronstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.M.</given-names>
            ,
            <surname>Rother</surname>
          </string-name>
          , C. (eds.) Computer Vision - ECCV
          <source>2014 Workshops. LNCS</source>
          , Springer International Publishing (
          <year>2014</year>
          ), best paper award.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Espinosa</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Möller, R.:
          <article-title>Logical formalization of multimedia interpretation</article-title>
          . In: Paliouras,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Spyropoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Tsatsaronis</surname>
          </string-name>
          ,
          <string-name>
            <surname>G</surname>
          </string-name>
          . (eds.)
          <source>Knowledge-Driven Multimedia Information Extraction and Ontology Evolution, Lecture Notes in Computer Science</source>
          , vol.
          <volume>6050</volume>
          , pp.
          <fpage>110</fpage>
          -
          <lpage>133</lpage>
          . Springer Berlin Heidelberg (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gould</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , Y.:
          <article-title>Superpixel graph label transfer with learned distance metric</article-title>
          . In: Fleet,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Pajdla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Schiele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Tuytelaars</surname>
          </string-name>
          , T. (eds.) Computer Vision - ECCV
          <source>2014. Lecture Notes in Computer Science</source>
          , Springer International Publishing (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hudelot</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maillot</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thonnat</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Symbol grounding for semantic image interpretation: From image data to semantics</article-title>
          .
          <source>In: Proc. of the 10th IEEE Intl. Conf. on Computer Vision Workshops. ICCVW '05</source>
          , IEEE Computer Society (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jung</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>D.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drake</surname>
            ,
            <given-names>B.L.</given-names>
          </string-name>
          :
          <article-title>A decision criterion for the optimal number of clusters in hierarchical clustering</article-title>
          .
          <source>Journal of Global Optimization</source>
          <volume>25</volume>
          (
          <issue>1</issue>
          ),
          <fpage>91</fpage>
          -
          <lpage>111</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kohonen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The self-organizing map</article-title>
          .
          <source>Proc. of the IEEE</source>
          <volume>78</volume>
          (
          <issue>9</issue>
          ),
          <fpage>1464</fpage>
          -
          <lpage>1480</lpage>
          (
          <year>Sep 1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Möller, R.:
          <article-title>On scene interpretation with description logics</article-title>
          .
          <source>Image and Vision Computing</source>
          <volume>26</volume>
          (
          <issue>1</issue>
          ),
          <fpage>82</fpage>
          -
          <lpage>101</lpage>
          (
          <year>2008</year>
          ), cognitive Vision-Special Issue
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Peraldi</surname>
            ,
            <given-names>I.S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Möller</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Formalizing multimedia interpretation based on abduction over description logic aboxes</article-title>
          .
          <source>In: Proc. of the 22nd Intl. Workshop on Description Logics (DL</source>
          <year>2009</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>477</volume>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mackworth</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          :
          <article-title>A logical framework for depiction and image interpretation</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>41</volume>
          (
          <issue>2</issue>
          ),
          <fpage>125</fpage>
          -
          <lpage>155</lpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torralba</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>K.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freeman</surname>
          </string-name>
          , W.T.:
          <article-title>Labelme: A database and webbased tool for image annotation</article-title>
          .
          <source>Int. J. Comput. Vision</source>
          <volume>77</volume>
          (
          <issue>1-3</issue>
          ),
          <fpage>157</fpage>
          -
          <lpage>173</lpage>
          (May
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Schroder</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>On the logics of image interpretation: model-construction in a formal knowledge-representation framework</article-title>
          .
          <source>In: Image Processing</source>
          ,
          <year>1996</year>
          . Proceedings.,
          <source>Int. Conf. on. vol. 1</source>
          , pp.
          <fpage>785</fpage>
          -
          <lpage>788</lpage>
          (
          <year>Sep 1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Pellet: A practical OWL-DL reasoner</article-title>
          .
          <source>Web Semant</source>
          .
          <volume>5</volume>
          (
          <issue>2</issue>
          ),
          <fpage>51</fpage>
          -
          <lpage>53</lpage>
          (
          <year>Jun 2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Town</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Ontological inference for image and video analysis</article-title>
          .
          <source>Mach. Vision Appl</source>
          .
          <volume>17</volume>
          (
          <issue>2</issue>
          ),
          <fpage>94</fpage>
          -
          <lpage>115</lpage>
          (
          <year>Apr 2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Yuille</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliva</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Frontiers in computer vision: NSF white paper</article-title>
          (
          <year>November 2010</year>
          ), http://www.frontiersincomputervision.com/WhitePaperInvite.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>