<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>DAEDALUS at ImageCLEF Wikipedia Retrieval 2010: Expanding with Semantic Information from Context</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sara Lana-Serrano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Villena-Román</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Carlos González-Cristóbal</string-name>
          <email>josecarlos.gonzalez@upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DAEDALUS - Data, Decisions and Language, S.A.</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Carlos III de Madrid</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Politécnica de Madrid</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <abstract>
        <p>This paper describes the participation of DAEDALUS in the ImageCLEF 2010 Wikipedia Retrieval task. The main focus of our experiments is to evaluate the impact on image retrieval of semantic information extracted only from the textual metadata of the image itself, as compared to expansion with contextual information gathered from the document in which the image is referenced. For the semantic annotation, the DBpedia ontology and the YAGO classification schema are used. As expected, the results show that, in general, the textual information attached to a given image cannot fully represent certain features of the image. Furthermore, the use of semantic information in multimedia information extraction poses two hard challenges that remain unsolved: how to automatically extract the high-level features associated with a multimedia resource and, once the resource has been semantically tagged, which features must be used in the retrieval process to best model the actual and complete meaning of the user query.</p>
      </abstract>
      <kwd-group>
        <kwd>Image retrieval</kwd>
        <kwd>domain-specific vocabulary</kwd>
        <kwd>ontology</kwd>
        <kwd>semantic expansion</kwd>
        <kwd>information retrieval</kwd>
        <kwd>indexing</kwd>
        <kwd>topic expansion</kwd>
        <kwd>context</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        As in previous campaigns, the basic goal of the ImageCLEF 2010 Wikipedia Retrieval task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] was, given a textual query and/or sample images describing a user’s
multimedia information need, to find as many relevant images as possible in the
Wikipedia image collection. Each image in the collection comes with a
user-provided annotation, consisting of unstructured and noisy text in
English, French, and German, and with links to the article(s) that contain the image.
      </p>
      <p>This paper describes the participation of the DAEDALUS team in the ImageCLEF
2010 Wikipedia Retrieval task. The team is led by and named after
DAEDALUS, a small private company in the field of Information and
Telecommunication Technologies and a leading provider of language-based solutions
in Spain, and also includes research groups from two universities, Universidad Politécnica de Madrid
and Universidad Carlos III de Madrid. We have taken part in CLEF since 2003 in
many different tracks and tasks, as part of the MIRACLE team until last year.</p>
      <p>
        This year, the main objective of our experiments is to evaluate and compare the
results achieved by techniques based on the computational
similarity between the metadata associated with the images and the query itself, as
opposed to techniques based on a semantic description of the image built from
the contextual information provided by the Wikipedia article in which the image is
referenced. For this purpose, the DBpedia ontology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and the YAGO [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] classification
schema have been used as the knowledge base to annotate the semantic content.
      </p>
    </sec>
    <sec id="sec-2">
      <title>System Description</title>
      <p>Based on our experience in previous campaigns in CLEF and other forums, we
designed a flexible system able to execute a large number of runs that
exhaustively cover many combinations of different techniques. The system is
composed of a set of small components that are easily combined in different
configurations and executed sequentially to build the final result set. Specifically, it
is composed of four modules:</p>
      <list list-type="bullet">
        <list-item><p>Linguistic processing module, which extracts, parses and prepares the input text for subsequent modules.</p></list-item>
        <list-item><p>Semantic module, which expands documents and/or topics with semantic information retrieved from the knowledge base.</p></list-item>
        <list-item><p>Textual (text-based) retrieval module, which indexes image annotations in order to search and find the list of images that are most relevant to the text of the topic.</p></list-item>
        <list-item><p>Result combination module, which uses the OR operator to combine, if necessary, two different result lists.</p></list-item>
      </list>
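      <p>The OR combination performed by the last module can be illustrated with a minimal Python sketch. It is not the system's code: the function name combine_or, the representation of a result list as a mapping from image identifier to score, and the choice of keeping the best score for images found in both lists are assumptions made only for this example.</p>
      <preformat preformat-type="code">
```python
def combine_or(first, second):
    """Union ("OR") of two result lists given as {image_id: score} dicts."""
    merged = dict(first)
    for image_id, score in second.items():
        # Assumption: an image found by both lists keeps its best score.
        merged[image_id] = max(score, merged.get(image_id, score))
    # Rank by descending score; ties are broken by image id for determinism.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    text_run = {"img1": 0.9, "img2": 0.4}
    semantic_run = {"img2": 0.7, "img3": 0.1}
    for image_id, score in combine_or(text_run, semantic_run):
        print(image_id, score)
```
      </preformat>
      <p>Every image present in either list appears exactly once in the combined ranking, which is the behavior expected from an OR operator; how the scores of shared images should be merged is a separate design decision.</p>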
      <p>A common baseline algorithm was used in all experiments to process the
collection, following these steps:</p>
      <list list-type="order">
        <list-item><p>Text extraction: Ad hoc scripts are run on the files that contain image annotations, on the Wikipedia articles and on the topics. This process generates the different collections and topic sets that define the specific features of each experiment.</p></list-item>
        <list-item><p>Tokenization: This process extracts the basic textual components of the annotations. Some basic entities are also detected, such as numbers, initials, abbreviations, and years. So far, compounds, proper nouns, acronyms and other types of entity are not specifically considered. The output of this process consists of single words, multi-word terms, years as numbers, and the tagged entities produced by the semantic module.</p></list-item>
        <list-item><p>Conversion to lowercase: All document terms are normalized by changing all letters to lowercase.</p></list-item>
        <list-item><p>Filtering: All words recognized as stopwords are filtered out. Stopwords in the target languages were initially obtained from the University of Neuchatel’s resources page [<xref ref-type="bibr" rid="ref4">4</xref>] and afterwards extended with resources developed in-house.</p></list-item>
        <list-item><p>Stemming: This process is applied to each of the words to be indexed or used for retrieval. Standard Porter stemmers [<xref ref-type="bibr" rid="ref5">5</xref>] for each considered language have been used.</p></list-item>
        <list-item><p>Indexing and retrieval: Lucene [<xref ref-type="bibr" rid="ref6">6</xref>] was used as the information retrieval engine for the whole textual indexing and retrieval task.</p></list-item>
      </list>
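      <p>The tokenization, lowercasing, filtering and stemming steps of the baseline can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used in the experiments: the stopword set is a tiny placeholder for the University of Neuchatel lists, and toy_stem is a hypothetical crude suffix stripper that merely stands in for a real Porter stemmer.</p>
      <preformat preformat-type="code">
```python
import re

# Tiny illustrative stopword list (assumption); the real lists are much larger.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "with"}


def tokenize(text):
    """Split text into word and number tokens (years are kept as numbers)."""
    return re.findall(r"\w+", text)


def toy_stem(token):
    """Crude suffix stripping; a stand-in for a real Porter stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, lowercase, drop stopwords and stem, in that order."""
    tokens = [token.lower() for token in tokenize(text)]
    tokens = [token for token in tokens if token not in STOPWORDS]
    return [toy_stem(token) for token in tokens]


if __name__ == "__main__":
    print(preprocess("Yellow buses in the Streets of Madrid, 2010"))
```
      </preformat>
      <p>For the annotation “Yellow buses in the Streets of Madrid, 2010”, this sketch produces the terms yellow, bus, street, madrid and 2010, which would then be handed to the indexing engine.</p>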
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <p>The main idea behind our experiments is to evaluate and compare the results achieved
by techniques based on the computational similarity
between the metadata associated with the images and the query itself, as opposed to
techniques based on a semantic description of the image built from the contextual
information provided by the Wikipedia article in which the image is referenced.</p>
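      <p>The following fields have been considered as contextual information:</p>
      <list list-type="bullet">
        <list-item><p>the metadata associated with the image itself (C),</p></list-item>
        <list-item><p>the title of the article (T), and</p></list-item>
        <list-item><p>the first paragraph of the article (S).</p></list-item>
      </list>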
      <p>
        The core knowledge base for the semantic expansion is a subset of the DBpedia
ontology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], adapted and formatted for our purposes and organized according to the
YAGO classification schema. YAGO [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a huge semantic knowledge base, part of
the YAGO-NAGA project at the Max-Planck Institute for Informatics in Saarbrücken
(Germany). It currently holds more than 2 million entities (persons, organizations,
cities, etc.) with over 20 million facts about these entities. Unlike many other
automatically assembled knowledge bases, YAGO has a manually
confirmed accuracy of 95%.
      </p>
      <p>Our resulting knowledge base contains 1,651,225 entities, 226,087 classes
hierarchically related by means of 225,781 subClassOf relations, and 4,121,043 typeOf
relations between entities and classes.</p>
      <p>Afterwards, an entity identification process is run against the knowledge base
with a parser specifically developed for this task. Finally, the
semantic information generated as output is built by adding up the information
about the entity itself, the information about its class(es) and all the ancestors of its
class(es).</p>
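      <p>The way the semantic information is assembled (entity, its classes, and all ancestors of those classes) can be illustrated with a minimal Python sketch. The miniature knowledge base below, including every entity and class name in it, is hypothetical and chosen only for illustration; the function names ancestors and semantic_terms are likewise assumptions and do not describe the parser actually used.</p>
      <preformat preformat-type="code">
```python
# Hypothetical miniature knowledge base; all names are invented for illustration.
TYPE_OF = {"Madrid": ["City"]}
SUBCLASS_OF = {"City": ["PopulatedPlace"], "PopulatedPlace": ["Place"]}


def ancestors(cls):
    """All classes reachable from cls through subClassOf, nearest first."""
    seen, queue = [], list(SUBCLASS_OF.get(cls, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(SUBCLASS_OF.get(parent, []))
    return seen


def semantic_terms(entity):
    """Entity, then its classes, then every ancestor of those classes."""
    terms = [entity]
    for cls in TYPE_OF.get(entity, []):
        for term in [cls] + ancestors(cls):
            if term not in terms:
                terms.append(term)
    return terms


if __name__ == "__main__":
    print(semantic_terms("Madrid"))
```
      </preformat>
      <p>In this toy example the entity Madrid is expanded with City, PopulatedPlace and Place. The terms contributed by the upper levels of the hierarchy are the most general ones, which is relevant when judging how much noise an expansion of this kind can introduce.</p>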
      <p>Finally, we submitted six experiments for evaluation, described in Table 1.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Run</th><th>Semantic expansion</th><th>Context expansion</th></tr>
          </thead>
          <tbody>
            <tr><td>DAEDALUS_Bas</td><td>NO</td><td>NO</td></tr>
            <tr><td>DAEDALUS_NER_Bas</td><td>YES</td><td>NO</td></tr>
            <tr><td>DAEDALUS_W_CT</td><td>NO</td><td>C + T</td></tr>
            <tr><td>DAEDALUS_NER_W_CT</td><td>YES</td><td>C + T</td></tr>
            <tr><td>DAEDALUS_W_CTS</td><td>NO</td><td>C + T + S</td></tr>
            <tr><td>DAEDALUS_NER_W_CTS</td><td>YES</td><td>C + T + S</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The results achieved after the evaluation of these experiments are shown in
Table 2. The highest figures are highlighted in bold.</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>Run</th><th>MAP</th><th>NDCG</th><th></th><th>Relevant Retrieved</th></tr>
          </thead>
          <tbody>
            <tr><td>DAEDALUS_Bas</td><td>0.1492</td><td>0.2377</td><td>0.3556</td><td>6088</td></tr>
            <tr><td>DAEDALUS_NER_Bas</td><td>0.1249</td><td>0.2115</td><td>0.3255</td><td>5628</td></tr>
            <tr><td>DAEDALUS_W_CT</td><td><bold>0.1820</bold></td><td><bold>0.2662</bold></td><td>0.4055</td><td>6453</td></tr>
            <tr><td>DAEDALUS_NER_W_CT</td><td>0.1610</td><td>0.2514</td><td>0.3801</td><td>6038</td></tr>
            <tr><td>DAEDALUS_W_CTS</td><td>0.1737</td><td>0.2478</td><td><bold>0.4315</bold></td><td><bold>6871</bold></td></tr>
            <tr><td>DAEDALUS_NER_W_CTS</td><td>0.1593</td><td>0.2342</td><td>0.4036</td><td>6105</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>A first preliminary evaluation of these figures shows that, globally, contextual
expansion greatly improves the retrieval results (MAP rises from 0.1492 for the baseline
to 0.1820 with C + T), whereas semantic expansion
tends to make them worse (MAP drops from 0.1820 to 0.1610 with C + T). However, this conclusion does not fully hold, because
in the retrieval process the semantic terms were boosted with
respect to the contextual terms by assigning them a higher relevance factor.
This was initially done to better analyze the impact of the semantic
expansion, but it turned out not to be a good idea. As a consequence,
the final results for the semantic experiments do not exactly reflect the behavior of
an actual retrieval system.</p>
      <p>A deeper analysis shows that, regardless of the
experiment, precision is quite low when the queries include any reference
to primitive features of the image (“white house with garden”, “red fruits”, “yellow
buses”, “close up of antenna”) or to high-level semantic features such as actions
(“people playing guitar”, “people laughing”) or perceptions.</p>
      <p>Moreover, regardless of the precision level of the results,
the incorporation of semantic information for a given topic generally
produces similar effects (improvements or reductions) independently of the type of
contextual information that has been applied. According to the impact that the use of
semantic information has had on the retrieval process, the following groups of
queries have been identified:</p>
      <list list-type="bullet">
        <list-item><p>Queries that could not be annotated with semantic information (“lightning in the sky”).</p></list-item>
        <list-item><p>Queries in which the use of semantic information produces slight improvements in the results (“horseman”, “civil airplane”).</p></list-item>
        <list-item><p>Queries in which the use of semantic information significantly improves the results (see Table 3). In those queries, the weighting of the semantic terms with respect to the contextual ones has succeeded in modeling the specific semantic features that best represent the full meaning of the original query.</p></list-item>
        <list-item><p>Queries in which the use of semantic information significantly reduces the precision of the results (see Table 4). The semantic representation of the query has extracted one or more features that are too general (mainly associated with the first levels of the ontology in the knowledge base), thus contributing search terms that are not very precise, which in turn produce a very high volume of documents retrieved as relevant. This fact, combined with the boosting of semantic terms described before, has caused a significant decrease in the precision of the results.</p></list-item>
      </list>
      <p>Per-topic MAP and R-prec. figures for Topic 8 (tennis player on court), Topic 15 (cyclist), Topic 16 (spider with cobweb), Topic 55 (building site) and Topic 70 (close up of trees):</p>
      <list list-type="simple">
        <list-item><p>W_CT: 164.25%, 590.53%, 390.76%, 307.05%, 329.05%, 511.69%, 400.54%, 352.29%, 110.53%, 429.28%</p></list-item>
        <list-item><p>W_CTS: 37.88%, 12.81%, 50.42%, 12.51%, 140.14%, 100.00%, 402.13%, 100.00%, 213.17%</p></list-item>
      </list>
      <p>After the detailed analysis of the results achieved for each topic, we can point
out that text-based information retrieval techniques applied to image retrieval only
provide good results when the queries refer exactly to the
semantic or contextual content of the image (images including something or located
somewhere), but are of little use for the extraction of primitive features
(such as color, brightness, texture, shapes, corner points or their spatial distribution) or
high-level semantic features about the meaning and purpose of the objects or scenes
depicted (sentiments, emotions, actions, perceptions).</p>
      <p>For the first case, the incorporation of semantic information, based on the
contextual information of the article in which the image is referenced, usually improves
the results for those queries in which the semantic information contributes
specific terms that narrow the search. For instance, the semantic information
corresponding to the “tennis player on court” topic may help to select images
associated with the “tennis player” class; however, the semantic information
corresponding to the “cities at night” topic broadens the search to all images that
show any of the subclasses extending from “city”, which turns out to be very noisy.</p>
      <p>Consequently, it seems that our future efforts should focus, on the one hand, on studying how
to better apply content-based image retrieval techniques that help us extract the
semantics of the image itself and, on the other hand, on answering the
following open issues: 1) Should the semantic information be taken into account for
all queries during the retrieval process? 2) In any case, should it be processed
differently depending on the query type? 3) Is it a good idea to assign the
same weight during the retrieval process to all the semantic information associated with a
given entity, or is it better to make this value dependent on the information class
and/or the query type?</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This work has been partially supported by the Spanish Center for Industry
Technological Development (CDTI, Ministry of Industry, Tourism and Trade),
through the CONTENIDOS A LA CARTA Project, INGENIO 2010 Programme,
AVANZA I+D 2008. Other partners in the Project are Agencia EFE, Germinus XXI,
11870.com and Universidad Politécnica de Madrid.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Popescu</surname>, <given-names>A.</given-names>
          </string-name>;
          <string-name>
            <surname>Tsikrika</surname>, <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kludas</surname>, <given-names>J.</given-names>
          </string-name>
          <article-title>Overview of the Wikipedia Retrieval task at ImageCLEF 2010</article-title>.
          <source>Working Notes of CLEF 2010</source>. Padova, Italy,
          <year>2010</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>The DBpedia Knowledge Base</article-title>
          . http://wiki.dbpedia.org/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <article-title>YAGO: A Core of Semantic Knowledge</article-title>
          . http://www.mpi-inf.mpg.de/yagonaga/yago/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. University of Neuchatel. IR Multilingual Resources at UniNE. http://members.unine.ch/jacques.savoy/clef/index.html</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Porter</surname>, <given-names>M.</given-names>
          </string-name>
          <article-title>Snowball stemmers and resources page</article-title>
          . http://www.snowball.tartarus.org
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <article-title>Apache Lucene project</article-title>
          . http://lucene.apache.org
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>