<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enabling Interoperability between Multimedia Resources: An Ontology Matching Perspective</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicolas James</string-name>
          <email>nicolas.james@ecp.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantin Todorov</string-name>
          <email>konstantin.todorov@ecp.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Celine Hudelot</string-name>
          <email>celine.hudelot@ecp.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MAS Laboratory, Ecole Centrale Paris</institution>
          ,
          <addr-line>F-92 295 Châtenay-Malabry</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The semantic annotation of images can benefit from representations of useful concepts and the links between them as ontologies. Recently, several multimedia ontologies have been proposed in the literature as suitable knowledge models to bridge the well-known semantic gap between low level features of image content and its high level conceptual meaning. Nevertheless, these multimedia ontologies are often dedicated to (or initially built for) particular needs or a particular application. We argue that ontology matching, defined as the process of relating different heterogeneous models, is a suitable approach to solve interoperability issues in semantic image annotation and retrieval. We propose a generic instance-based ontology matching approach, applied to an important semantic image retrieval issue: the bridging of the semantic gap by matching a multimedia ontology against a common-sense knowledge resource.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The fast growth of shared digital image and video collections, together with
the intensive use of visual information for decision making in many domains
(medicine, geosciences, etc.), requires new effective methods for search and
retrieval in these collections. In order to enable and improve the communication
and the interface between humans and computers, it is necessary to understand
the semantic content of images and to build linguistic descriptions of their
content in an automatic way. Following decades of research on Content-Based Image
Retrieval (CBIR), automatic image annotation is nowadays an active research
topic which aims at bridging the semantic and the perceptual levels of
abstraction, known as the semantic gap problem [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In most of the image annotation
approaches, the computed linguistic description is often only related to
perceptual manifestations of semantics. Nevertheless, as explained in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the image
semantics cannot be considered as being included explicitly in the image itself.
It rather depends on prior knowledge and on the context of use of the visual
information. As a consequence, explicit semantics, represented by ontologies, has
been used intensively in the field of image retrieval recently.
      </p>
      <p>
        With the growth of the application of ontology-based solutions in the
multimedia domain, many interoperability issues have arisen: (a) at the semantic
level, between different representations of the same domain knowledge; (b) at
the visual level, between different multimedia ontologies; (c) between the visual
level and the semantic level, i.e. the semantic gap problem. Ontology matching,
which we define as the process of relating heterogeneous knowledge models, is
widely used for semantic web applications but rarely in the context of image
sharing and retrieval; it can be used to solve these kinds of interoperability issues.
This paper proposes a generic approach to address the question of filling the
semantic gap by matching an ontology at the semantic level (WordNet<sup>1</sup>
associated with the image database LabelMe [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]) with an ontology at the visual level
(LSCOM [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]).
      </p>
      <p>The next section is a short review of existing multimedia ontologies and related
approaches. Section 3 describes the ontology matching framework which forms
the methodological background of our approach, presented in turn in Section
4. Results of our preliminary experiments are discussed in Section 5; Section 6
concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In the past few years, concept-based multimedia retrieval has been a very active
research field with a major effort in the automatic detection of semantic concepts
from low level features with machine learning approaches. Despite these efforts,
the semantic gap problem is still an issue for the semantic understanding of
multimedia documents. Recently, many knowledge models have been proposed
to improve multimedia retrieval and interpretation by the explicit modeling of
the different relationships between semantic concepts. Indeed, many generic large
scale multimedia ontologies or multimedia concept lexicons together with image
collections have been proposed to improve multimedia search and retrieval by
providing an effective representation and interpretation of multimedia concepts
[
        <xref ref-type="bibr" rid="ref1 ref12 ref13">13,12,1</xref>
        ]. We propose to classify these ontologies in four major groups: (1)
semantic web multimedia ontologies often based on MPEG-7, reviewed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], (2)
visual concept hierarchies (or networks) inferred from inter-concept visual
similarity contexts (among which VCNet based on Flickr Distance [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and the
Topic Network of Fan [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), (3) specific multimedia lexicons often composed of
a hierarchy of semantic concepts with associated visual concept detectors used
to describe and to detect automatically the semantic concepts of multimedia
documents (LSCOM [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], multimedia thesauri [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]) and (4) generic ontologies
based on existing semantic concept hierarchies such as WordNet populated with
annotated images or multimedia documents (ImageNet [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], LabelMe [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). These
ontologies have proved to be very useful mainly in the context of semantic
concept detection and automatic multimedia annotation but many problems still
remain unsolved among which enabling the interoperability between visual
concepts and high level concepts. Although there exist attempts to solve these
problems by manual concept mappings [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], little effort has been directed towards
performing them in an automatic manner. Moreover, these ontologies are often
dedicated to (or built for) particular needs or a particular application and are
complementary knowledge sources. While studies have been done to analyze the
different inter-concept similarities in different multimedia ontologies [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], to the
best of our knowledge, there are no studies which propose a cross analysis and
a joint use of these different and complementary ontologies.
      </p>
      <p><sup>1</sup> http://wordnet.princeton.edu/</p>
      <p>
        This paper proposes to situate these problems in an O[ntology] M[atching]
framework. The OM approach presented in the next section is much in line with the
tradition of extensional matching. This comprises a set of techniques which base
the similarity of concepts on characteristics of the instances that these concepts
contain [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>An Ontology Matching Approach</title>
      <p>An ontology is based on a set of concepts and relations defined on these concepts,
which altogether describe the knowledge in a given domain of interest. Because
different communities, independently from one another, tend
to conceptualize the same domain of interest differently, a growing number of
heterogeneous ontologies describing similar or overlapping parts of the world
are created. An OM procedure aims at reducing this heterogeneity by linking
the corresponding elements of two ontologies in an automatic or semi-automatic
manner.</p>
      <p>Formally, a populated ontology will be defined by <inline-formula><tex-math>O = \{C, \mathit{is\_a}, R, I, g\}</tex-math></inline-formula>,
where C is a set whose elements are called concepts, is_a is a partial order on C,
R is a set of other (binary) relations holding between the concepts from the set
C, I is a set whose elements are called instances and <inline-formula><tex-math>g : C \to 2^{I}</tex-math></inline-formula> is an injection
from the set of concepts to the set of subsets of I.</p>
      <p>We note that the sets C and I are necessarily non-empty, in contrast to
R. Thus, the definition above describes an ontology which, although not limited
to subsumptional relations, necessarily contains a hierarchical backbone, defined
by the partial order. The set I may contain text documents, images or other
(real world data) entities. By assumption, every instance can be represented as
an n-dimensional real-valued vector, defined by n input variables of some kind
which are the same for all instances in I.</p>
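      <p>As an illustration only, the definition above can be sketched as a small data structure. The class name, field names and toy values below are assumptions made for this sketch and do not come from the paper; the validation step simply checks the stated constraints (non-empty C and I, instances sharing the same n variables, g mapping concepts to subsets of I).</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class PopulatedOntology:
    """Populated ontology O = {C, is_a, R, I, g}; all names are illustrative."""
    concepts: set        # C
    is_a: set            # pairs (child, parent) generating the partial order
    instances: dict      # I: instance id -> real-valued feature vector
    g: dict              # concept -> set of instance ids
    relations: dict = field(default_factory=dict)  # R, allowed to be empty

    def validate(self):
        # C and I must be non-empty, R may be empty.
        if not self.concepts or not self.instances:
            raise ValueError("C and I must be non-empty")
        # Every instance is described by the same n input variables.
        if len({len(v) for v in self.instances.values()}) != 1:
            raise ValueError("all instances must share the same n variables")
        # g maps concepts to subsets of I.
        for c, members in self.g.items():
            if c not in self.concepts or not members <= set(self.instances):
                raise ValueError(f"g is not a map from C to subsets of I at {c!r}")
        return True

    def ancestors(self, c):
        """Concepts above c in the is_a hierarchy (transitive closure)."""
        seen, stack = set(), [c]
        while stack:
            node = stack.pop()
            for child, parent in self.is_a:
                if child == node and parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Toy ontology with invented concepts and 2-dimensional instances.
toy = PopulatedOntology(
    concepts={"entity", "building", "house"},
    is_a={("building", "entity"), ("house", "building")},
    instances={"img1": [0.1, 0.9], "img2": [0.8, 0.2], "img3": [0.7, 0.3]},
    g={"entity": {"img1", "img2", "img3"},
       "building": {"img2", "img3"},
       "house": {"img2"}},
)
```
]]></preformat>
      <p>In this toy example R is left empty, which mirrors the situation the definition explicitly permits.</p>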
      <p>
        In the context of semantic image annotation, WordNet together with the
LabelMe database [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and LSCOM [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] together with the TRECVID 2005
database are two examples of such populated ontologies. Concepts are the nodes
of the WordNet hierarchy in ImageNet or the LSCOM categories, while instances
are the images in the associated databases, which are labeled by these concepts.
It is important to note that the set R is empty for the LSCOM ontology. In
the case of WordNet, R contains several useful relations like is_a_member_of,
is_a_part_of, opposes, etc.
      </p>
      <p>Often, the outcome of an OM procedure is a set of cross-ontology concept
alignments, derived from a measure of concept similarity. The measures used
in the current study are based on variable selection and we will describe them
in more detail.</p>
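      <p>One simple way to turn a concept similarity measure into a set of alignments is to keep, for each concept of the first ontology, its most similar concept in the second one. The sketch below assumes this best-match rule and a similarity threshold; neither is prescribed by the paper, and the concept names and scores are invented for illustration and are not experimental results.</p>
      <preformat><![CDATA[
```python
def align(similarity, threshold=0.0):
    """Map each concept of O1 to its most similar concept of O2.

    similarity: dict mapping (concept_of_O1, concept_of_O2) -> score.
    Concepts whose best score does not exceed `threshold` stay unaligned.
    """
    best = {}
    for (a, b), score in similarity.items():
        if score > threshold and (a not in best or score > best[a][1]):
            best[a] = (b, score)
    return {a: b for a, (b, _) in best.items()}


# Hypothetical similarity scores between two tiny ontologies.
toy_scores = {
    ("O1:a1", "O2:b1"): 0.6,
    ("O1:a1", "O2:b2"): -0.1,
    ("O1:a2", "O2:b1"): -0.2,
    ("O1:a2", "O2:b2"): -0.3,
}
alignments = align(toy_scores)
```
]]></preformat>
      <p>Here the concept O1:a2 remains unaligned because none of its scores is positive.</p>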
      <p>
        Variable selection techniques (reviewed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) serve to rank the input
variables of a given problem (e.g. classification) by their importance for the output
(the class affiliation of an instance), according to certain evaluation criteria. A
real valued score which accounts for this importance is attached to every variable.
In our case, this can be of help for uncovering latent input-output dependencies.
Assuming that instances are represented as real-valued vectors, the computed
scores would indicate which of the vector dimensions are most important for the
separation of the instances (within a single ontology) into those that belong to
a given concept and those that do not and thus best characterize this concept.
      </p>
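      <p>We define a binary classification training set <inline-formula><tex-math>S_c^O</tex-math></inline-formula> for each concept c from an
ontology O by taking I, the entire set of instances assigned to O, and labeling
all instances from the set g(c) as positive and all the rest (<inline-formula><tex-math>I \setminus g(c)</tex-math></inline-formula>) as negative.
With the help of a variable selection procedure performed on <inline-formula><tex-math>S_c^O</tex-math></inline-formula>, we obtain a
representation of the concept c as a list</p>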
      <p>
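        <disp-formula id="eq-1"><label>(1)</label><tex-math>L(c) = (s_1^c, s_2^c, \ldots, s_n^c),</tex-math></disp-formula>
where <inline-formula><tex-math>s_i^c</tex-math></inline-formula> is the score associated to the ith variable. To compute a score per
variable and per concept, we apply the S[upport] V[ector] M[achine]-based
variable selection technique introduced in [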
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. A series of SVMs is learned on the
training set <inline-formula><tex-math>S_c^O</tex-math></inline-formula> by successively removing one variable at a time. The ability of
each variable to discriminate c from the other concepts in O is evaluated by
measuring the sensitivity of the VC-dimension, an important SVM parameter,
with respect to the variable in question.
      </p>
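      <p>The construction of the per-concept training set and of the score list L(c) can be sketched as follows. Note that the scoring criterion used here is a deliberately simple stand-in (difference of class means divided by the summed class standard deviations) and is not the SVM-based VC-dimension sensitivity criterion used in the paper; the function names and the toy data are likewise assumptions of this sketch.</p>
      <preformat><![CDATA[
```python
from statistics import mean, pstdev


def training_set(instances, g, concept):
    """Binary set for one concept: instances in g(c) get +1, the rest of I get -1."""
    positives = g[concept]
    return [(vec, 1 if iid in positives else -1)
            for iid, vec in sorted(instances.items())]


def variable_scores(samples):
    """Return L(c): one importance score per input variable.

    Stand-in criterion (NOT the SVM-based one of the paper): absolute
    difference of the class means over the sum of class standard deviations.
    Both classes are assumed to be non-empty.
    """
    n = len(samples[0][0])
    scores = []
    for i in range(n):
        pos = [vec[i] for vec, y in samples if y == 1]
        neg = [vec[i] for vec, y in samples if y == -1]
        spread = pstdev(pos) + pstdev(neg)
        scores.append(abs(mean(pos) - mean(neg)) / (spread + 1e-12))
    return scores


# Toy data: variable 0 separates the concept, variable 1 does not.
toy_instances = {"i1": [1.0, 5.0], "i2": [1.2, 1.0],
                 "i3": [3.0, 5.1], "i4": [3.2, 0.9]}
toy_g = {"c": {"i1", "i2"}}
L_c = variable_scores(training_set(toy_instances, toy_g, "c"))
```
]]></preformat>
      <p>Whatever criterion is plugged in, the output has the same shape as the list in (1): one real-valued score per variable, higher for variables that better separate the instances of c from the others.</p>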
      <p>
        By following the described procedure, given two source ontologies O1 and
O2, a representation such as the one in (1) is made available for every concept of each
of these ontologies. The similarity of two concepts, A ∈ O1 and B ∈ O2, is then
assessed in terms of their corresponding representations L(A) and L(B). Several
choices of a similarity measure based on these representations are proposed and
compared in [
        <xref ref-type="bibr" rid="ref14">14</xref>
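        ]. In the experimental work contained in this paper, we have
used Pearson's, Spearman's and Kendall's measures of correlation, calculated on
the variable scores or ranks (integers corresponding to the scores) and given by
        <disp-formula id="eq-2"><label>(2)</label><tex-math>\mathrm{sim}_{\mathrm{Pearson}} = \frac{\sum_{i=1}^{n} (s_i^A - s^A_{\mathrm{mean}})(s_i^B - s^B_{\mathrm{mean}})}{\sqrt{\sum_{i=1}^{n} (s_i^A - s^A_{\mathrm{mean}})^2}\,\sqrt{\sum_{i=1}^{n} (s_i^B - s^B_{\mathrm{mean}})^2}},</tex-math></disp-formula>
        <disp-formula id="eq-3"><label>(3)</label><tex-math>\mathrm{sim}_{\mathrm{Spearman}} = 1 - \frac{6 \sum_{i} d_i^2}{n(n^2 - 1)}, \qquad \mathrm{sim}_{\mathrm{Kendall}} = \frac{n_c - n_d}{\frac{1}{2} n(n - 1)}.</tex-math></disp-formula>
In the formulae above, <inline-formula><tex-math>s^A_{\mathrm{mean}}</tex-math></inline-formula> and <inline-formula><tex-math>s^B_{\mathrm{mean}}</tex-math></inline-formula> are the means of the scores over all input
variables, <inline-formula><tex-math>d_i</tex-math></inline-formula> is the difference of the ranks calculated for the ith variable w.r.t.
the two concepts, and <inline-formula><tex-math>n_c</tex-math></inline-formula> and <inline-formula><tex-math>n_d</tex-math></inline-formula> are the numbers of concordant and discordant
pairs among the lists of scores L(A) and L(B).</p>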
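      <p>For concreteness, the three correlation measures can be computed on two score lists L(A) and L(B) as in the following sketch. It assumes score lists without ties (the rank-difference form of Spearman's coefficient is exact only in that case); the function names and the toy score lists are illustrative and are not taken from the experiments.</p>
      <preformat><![CDATA[
```python
from math import sqrt


def ranks(scores):
    """Rank 1 is given to the highest score; ties are assumed absent."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    r = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def sim_pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sqrt(sum((x - ma) ** 2 for x in a))
           * sqrt(sum((y - mb) ** 2 for y in b)))
    return num / den


def sim_spearman(a, b):
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def sim_kendall(a, b):
    n = len(a)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            nc += s > 0   # concordant pair
            nd += s < 0   # discordant pair
    return (nc - nd) / (0.5 * n * (n - 1))


# Toy score lists over n = 4 variables.
L_A = [0.9, 0.1, 0.5, 0.3]
L_B = [0.8, 0.2, 0.6, 0.4]   # same variable ordering as L_A
L_C = [0.1, 0.9, 0.5, 0.7]   # reversed variable ordering
```
]]></preformat>
      <p>Two concepts that rank the variables identically obtain a rank correlation of 1, while a fully reversed ranking yields -1; Pearson's measure additionally takes the magnitudes of the scores into account.</p>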
    </sec>
    <sec id="sec-4">
      <title>Filling the Semantic Gap with Mapped Concepts</title>
      <p>
        As noted in the introduction, many challenging issues in the field of image
retrieval stem from the semantic gap problem. Two examples are the construction
of robust high level concept detectors and the creation of user oriented
annotations with high level semantics. In this section, we propose an attempt to
fill the semantic gap by matching two complementary resources: a visual and a
semantic thesaurus. Contrary to [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], our approach is automatic, generic
(ontology independent) and makes use of the visual knowledge shared by the source
ontologies.
      </p>
      <p>
        On the one hand, we chose LSCOM [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], an ontology dedicated to multimedia
annotation. It was initially built in the framework of TRECVID<sup>2</sup> with the
criteria of concept usefulness, concept observability and feasibility of
automatic concept detection. LSCOM is populated by the development set of TRECVID 2005
videos (news broadcasts). On the other hand, we used WordNet [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] populated
with the LabelMe dataset [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Several interoperability issues can be addressed for
these two ontologies, among which semantic interoperability and semantic gap
interoperability. Aligning these resources serves two purposes. First, it allows for the semantic enrichment of
concepts belonging to a multimedia ontology with high level linguistic concepts
from a general, common-sense knowledge base. Second, it allows for the evaluation of the
quality of the baseline concept detectors by studying the link between concepts
whose semantics is related to their perceptual manifestations and concepts whose
semantics is related to common sense.
      </p>
      <p>
        In our setting, the instances that extensionally define a concept are images
whose annotations contain the name associated with this concept. An image is
represented as a vector of descriptors. We use a codebook built on a bag-of-features
model and histograms of codewords, which is currently the best-performing approach in
the state of the art [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Thus, the variables which describe the instances are
the bins of these histograms. The generic variable selection approach described
in Section 3 is applied directly to our data. As a result, we obtain a concept
representation such as the one introduced in eq. (1) for every concept of our two source
ontologies. As stated above (Section 3), there exist several plausible choices of
a measure of similarity for two concepts represented in this manner. In our
experiments, we have tested the three measures of correlation given in (2) and (3).
Regardless of the particular choice, the similarity is always based on visual
criteria, since the underlying concept representations are obtained by using visual
characteristics of the instances (in the particular case of LSCOM and WordNet,
these are the sets of images of either TRECVID or LabelMe).
      </p>
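      <p>The histogram-of-codewords representation mentioned above can be sketched as follows: each local descriptor of an image is assigned to its nearest codeword, and the image is described by the resulting occurrence histogram, whose bins are the input variables. The 2-dimensional descriptors, the three-word codebook and the L1 normalisation are assumptions made to keep the sketch small; real SIFT descriptors are 128-dimensional and the codebook would be learned by clustering.</p>
      <preformat><![CDATA[
```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    def dist2(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(range(len(codebook)), key=lambda k: dist2(descriptor, codebook[k]))


def bof_histogram(descriptors, codebook):
    """L1-normalised histogram of codeword occurrences for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy codebook (3 codewords) and the local descriptors of one image.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
image_descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.2, 0.1]]
hist = bof_histogram(image_descriptors, codebook)
```
]]></preformat>
      <p>Each image then becomes an n-dimensional real-valued vector with n equal to the codebook size, which is the instance representation assumed in Section 3.</p>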
      <p>Aligning LSCOM to WordNet allows us to infer knowledge about the LSCOM
concepts (dedicated to multimedia document annotation) with regard to the
concepts of WordNet. The alignment could be used to build a linguistic
description of the concepts of LSCOM, or, in other words, to answer the question
"What is an LSCOM concept in WordNet?" in an automatic manner. This
improves the retrieval process in several ways: (1) through query expansion and
reformulation, i.e. retrieving documents annotated with concepts from an
ontology O1 using a query composed of concepts of an ontology O2, (2) through
a better description of the documents in the indexing process. However, note
that this relation is not symmetric: alignments in the other direction are prone
to fail to be of any help, since WordNet concepts are rather atomic (such as
"car") as compared to the more complex LSCOM concepts (e.g. "Natural
Disaster Scene").</p>
      <p><sup>2</sup> http://www-nlpir.nist.gov/projects/tv2005/</p>
    </sec>
    <sec id="sec-5">
      <title>Experimental Results</title>
      <p>We use a part of the LSCOM ontology, LSCOM Annotation v1.0<sup>3</sup>, which is a
subset of 449 concepts from the initial LSCOM ontology and is used for
annotating 61,517 images from the TRECVID 2005 development set. Since this
set contains images from broadcast news videos, the chosen LSCOM subpart is
particularly adapted to annotating this kind of content and thus contains abstract and
specific concepts (e.g. 196 Science Technology, 330 Interview On Location).
In contrast, our sub-ontology defined from WordNet populated with
LabelMe (3676 concepts) is very general, given the nature of LabelMe, which
is composed of photographs from daily life.</p>
      <p>To provide a preliminary evaluation of the suggested approach,
we chose three concepts from the LSCOM ontology and five concepts from the
WordNet ontology. The selection was based on several
criteria: (1) the number of associated instances, (2) the absence of semantic
ambiguity for the selected concept in our dataset, (3) for WordNet only: a high
confidence (arbitrarily decided) in the discrimination of the concept using only
perceptual information.</p>
      <p>To construct image features, we use a bag-of-features model with a visual
codebook, built classically using the well-known SIFT descriptor and a K-Means
algorithm. The quantization of the extracted SIFT features was investigated in
two ways: (1) over all the instances associated with the selected concepts (LSCOM
and WordNet), (2) only over the LabelMe images, with quantization per
concept. The two experiments gave very similar results; the results of the
experiment based on the first codebook are summarized in Table 1.</p>
      <p>The values in the first three matrices are correlations indicating high
similarity for positive values (low for non-positive). As we can see, the concept
WordNet:TV is weakly correlated with the chosen LSCOM concepts, and the
concept WordNet:House is highly correlated with LSCOM:Natural Disasters and
LSCOM:Single Family Homes but not with LSCOM:US Flags. This is
coherent with the TRECVID 2005 data, considering that the images annotated with
LSCOM:US Flags are mostly images from speeches of politicians during
presidential elections. An example of an LSCOM image annotation that could be
extended to WordNet with the help of the concept mapping is given in Fig. 1.</p>
      <p><sup>3</sup> http://www.ee.columbia.edu/ln/dvmm/lscom/</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>The paper proposes an ontology matching technique to solve interoperability
issues in the area of semantic image annotation and retrieval. In particular,
we have addressed the problem of bridging the semantic gap with the help of
a generic instance-based ontology matching approach which aims at
automatically producing concept-based annotations enriched with a lexical description
of the concepts. In preliminary experiments, we have tested a concept similarity
measure on two small sets of concepts taken from the LSCOM ontology and
WordNet/LabelMe. Our results are in good agreement with the nature of the
instances associated with the selected LSCOM concepts. However, the efficiency of
the approach has to be tested on larger sets of concepts (currently in progress).
A large-scale application would also allow us to benefit from all the semantic
relations in WordNet, such as hypernymy, meronymy and antonymy. In the future, we
plan to investigate the qualities of our automatic approach in terms of retrieval
efficiency as compared to approaches that rely solely on manual mappings.</p>
      <p>Acknowledgments. This work is funded by the French National Research Agency
(ANR) through the COSINUS program (project COLLAVIZ ANR-08-COSI-003) and
by the region Île-de-France through the SEBASTIAN2 project (Cap Digital cluster).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Dasiopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tzouvaras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.G.</given-names>
            <surname>Strintzis</surname>
          </string-name>
          .
          <article-title>Enquiring MPEG-7 based multimedia ontologies</article-title>
          .
          <source>Multimedia Tools and Applications</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>40</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .
          <article-title>ImageNet: a large-scale hierarchical image database</article-title>
          .
          In
          <source>CVPR</source>
          , pages
          <fpage>710</fpage>
          –
          <lpage>719</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>Integrating visual and semantic contexts for topic network generation and word sense disambiguation</article-title>
          .
          <source>ACM CIVR'09</source>
          , pages
          <fpage>1</fpage>–<lpage>8</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>I.</given-names>
            <surname>Guyon</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Elisseeff</surname>
          </string-name>
          .
          <article-title>An introduction to variable and feature selection</article-title>
          .
          <source>JMLR</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1157</fpage>
          –
          <lpage>1182</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.</given-names>
            <surname>Hudelot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Maillot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Thonnat</surname>
          </string-name>
          .
          <article-title>Symbol grounding for semantic image interpretation: from image data to semantics</article-title>
          .
          <source>In SKCV-Workshop</source>
          , ICCV,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A.</given-names>
            <surname>Isaac</surname>
          </string-name>
          , L. van der Meij, S. Schlobach, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>An empirical study of instance-based ontology matching</article-title>
          .
          <source>The Semantic Web</source>
          , pages
          <fpage>253</fpage>
          –
          <lpage>266</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Y.G.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.W.</given-names>
            <surname>Ngo</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Hauptmann</surname>
          </string-name>
          .
          <article-title>Representations of keypoint-based semantic concept detection: A comprehensive study</article-title>
          .
          <source>IEEE Trans. on Multimedia</source>
          , in press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Koskela</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Smeaton</surname>
          </string-name>
          .
          <article-title>An empirical study of inter-concept similarities in multimedia ontologies</article-title>
          .
          <source>In CIVR'07</source>
          , pages
          <fpage>464</fpage>
          –
          <lpage>471</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>G.A.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>WordNet: a lexical database for English</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>38</volume>
          (
          <issue>11</issue>
          ):
          <fpage>39</fpage>
          –
          <lpage>41</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>B.C.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.P.</given-names>
            <surname>Murphy</surname>
          </string-name>
          , and W.T. Freeman.
          <article-title>LabelMe: a database and web-based tool for image annotation</article-title>
          .
          <source>IJCV</source>
          ,
          <volume>77</volume>
          (
          <issue>1</issue>
          ):
          <fpage>157</fpage>
          –
          <lpage>173</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.W.M.</given-names>
            <surname>Smeulders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Worring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Santini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Jain</surname>
          </string-name>
          .
          <article-title>Content-based image retrieval at the end of the early years</article-title>
          .
          <source>IEEE Trans. Patt. An. Mach. Intell.</source>
          , pages
          <fpage>1349</fpage>
          –
          <lpage>1380</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J.R.</given-names>
            <surname>Smith</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.F.</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <article-title>Large-scale concept ontology for multimedia</article-title>
          .
          <source>IEEE Multimedia</source>
          ,
          <volume>13</volume>
          (
          <issue>3</issue>
          ):
          <fpage>86</fpage>
          –
          <lpage>91</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>C.G.M.</given-names>
            <surname>Snoek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huurnink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>De Rijke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Worring</surname>
          </string-name>
          .
          <article-title>Adding semantics to detectors for video retrieval</article-title>
          .
          <source>IEEE Trans. on Mult.</source>
          ,
          <volume>9</volume>
          (
          <issue>5</issue>
          ):
          <fpage>975</fpage>
          –
          <lpage>986</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>K.</given-names>
            <surname>Todorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Geibel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.-U.</given-names>
            <surname>Kühnberger</surname>
          </string-name>
          .
          <article-title>Extensional ontology matching with variable selection for support vector machines</article-title>
          .
          In
          <source>CISIS</source>
          , pages
          <fpage>962</fpage>
          –
          <lpage>968</lpage>
          . IEEE Computer Society Press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Lei</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xian-Sheng</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nenghai</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wei-Ying</given-names>
            <surname>Ma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Shipeng</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Flickr distance</article-title>
          .
          In
          <source>MM'08</source>
          , pages
          <fpage>31</fpage>
          –
          <lpage>40</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>