<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LABERINTO at ImageCLEF 2011 Medical Image Retrieval Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jacinto Mata</string-name>
          <email>jacinto.mata@dti.uhu.es</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mariano Crespo</string-name>
          <email>mariano.crespo@dti.uhu.es</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel Maña</string-name>
          <email>manuel.mana@dti.uhu.es</email>
        </contrib>
        <aff>La Rábida (Huelva)</aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <abstract>
        <p>This paper describes the experiments carried out by the LABERINTO research group for the ImageCLEF 2011 medical task and the results obtained. We focus on image retrieval based on the textual information associated with each image. Our initial hypothesis is that query expansion can improve the effectiveness of image retrieval systems. We built three different types of indexes and used several information elements of the MeSH ontology to expand the queries. The experiments show that the expansion strategies based on the MeSH ontology obtain good results for this task.</p>
      </abstract>
      <kwd-group>
        <kwd>Text-based image retrieval</kwd>
        <kwd>medical domain</kwd>
        <kwd>query expansion</kwd>
        <kwd>ontologies</kwd>
        <kwd>MeSH</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper describes the contribution of the LABERINTO research group, in its first
participation, to the Medical Image Retrieval task of ImageCLEF 2011 [1].</p>
      <p>This ImageCLEF 2011 task uses a subset of PubMed Central
(http://www.ncbi.nlm.nih.gov/pmc/). This year, the organizers proposed three
subtasks: Modality Classification, Ad-hoc Image-based Retrieval and Case-based
Retrieval. We are particularly interested in Ad-hoc Image-based Retrieval, the
classic medical retrieval task, similar to those organized from 2005 to 2010.
Participants were given a set of 30 textual queries, each with two or three sample
images. The queries were classified as textual, mixed or semantic, according to the
type of method expected to yield the best results.</p>
      <p>In this work, we have used the MeSH ontology [2]
(http://www.nlm.nih.gov/mesh/meshhome.html) for query expansion in order to improve
our medical image retrieval system. Query expansion adds new terms to the user's
query in order to improve retrieval effectiveness. Systems based on query expansion
have recently improved their results significantly by making use of external
resources such as ontologies and lexical hierarchies.</p>
      <p>MeSH (Medical Subject Headings) is an initiative of the U.S. National Library of
Medicine. It is a controlled vocabulary used for indexing MEDLINE articles. It
consists of sets of terms called descriptors, arranged in a hierarchical structure
that enables searching at different levels of specificity. There are currently 26,142
MeSH descriptors, or Main Headings, and over 177,000 alternative expressions,
synonyms and related terms, called entry terms.</p>
      <p>The rest of the paper is organised as follows. Section 2 describes the expansion
strategies used in the experiments. Section 3 presents and discusses the results.
Finally, Section 4 outlines our conclusions and future work.</p>
    </sec>
    <sec id="sec-2">
      <sec id="sec-2-1">
        <title>Query Expansion using MeSH</title>
        <p>The MeSH ontology offers many possibilities for expanding query terms, and several
works have studied the effect of using MeSH for query expansion. In [3], the authors
investigate a query expansion strategy based on an advanced PubMed search feature
called Automatic Term Mapping (ATM). For this task, we used several expansion
strategies based on entry terms, similar to those used in [4], and another strategy
based on the tree structure in which MeSH organises its descriptors [5].</p>
        <p>A descriptor or entry term often consists of more than one word. For example, if
each word of the query "Mitral Valve" were looked up independently, neither "Mitral"
nor "Valve" would match a descriptor or entry term. However, the two words together
form the descriptor "Mitral Valve", which is a biomedical concept.</p>
        <p>For this reason, each query was pre-processed by splitting it into all of its word
n-grams, so as to find every sequence of query words that is a MeSH descriptor or
entry term. An example of this processing is shown below.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <p>Query: Breast cancer mammogram
N-grams:
(1) Breast
(2) Breast cancer
(3) Breast cancer mammogram
(4) cancer
(5) cancer mammogram
(6) mammogram
Of these, n-grams 2 and 4 are entry terms, and n-gram 1 is a descriptor.</p>
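      <p>The n-gram decomposition above can be sketched in Python as follows. This is a minimal illustration, not the group's code: the functions query_ngrams and classify_ngrams are hypothetical, matching is assumed to be case-insensitive, and the two lookup sets are a toy vocabulary holding only the descriptors and entry terms mentioned in this section, whereas a real system would load the full MeSH lists.</p>
      <code language="python">
```python
from typing import List, Set, Tuple


def query_ngrams(query: str) -> List[str]:
    """All contiguous word n-grams of the query, ordered by start position
    and then by length (the order used in the example above)."""
    words = query.split()
    return [
        " ".join(words[start:end])
        for start in range(len(words))
        for end in range(start + 1, len(words) + 1)
    ]


# Hypothetical toy vocabularies (lower-cased). A real system would load the
# complete MeSH descriptor and entry-term lists here.
DESCRIPTORS: Set[str] = {"breast", "mitral valve"}
ENTRY_TERMS: Set[str] = {"breast cancer", "cancer"}


def classify_ngrams(query: str) -> List[Tuple[str, str]]:
    """Label every n-gram that matches MeSH as a descriptor or entry term."""
    labels = []
    for gram in query_ngrams(query):
        key = gram.lower()
        if key in DESCRIPTORS:
            labels.append((gram, "descriptor"))
        elif key in ENTRY_TERMS:
            labels.append((gram, "entry term"))
    return labels
```
      </code>
      <p>For "Breast cancer mammogram", query_ngrams yields the six n-grams listed above in the same order, and classify_ngrams labels "Breast" as a descriptor and "Breast cancer" and "cancer" as entry terms. For "Mitral Valve", only the full bigram matches, which is exactly the case that motivates using n-grams instead of single words.</p>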
      <p>The following sections describe the strategies used to expand the queries.</p>
      <p>Techniques based on the MeSH Tree Structure. This strategy relies on the tree
structure in which MeSH organises its descriptors. If a descriptor is a parent node,
it is expanded with its child descriptors; if it has no children, it is not expanded.
Figure 1 shows a short excerpt of the MeSH tree, in which the Brain descriptor has
seven children, while the Central Nervous System descriptor has three.</p>
      <p>Techniques based on Entry Terms. The first of these strategies consists of exploring
the MeSH tree to check whether each query n-gram is a descriptor. If it is, the query
is expanded with all the entry terms of that descriptor. If it is not, we check
whether it is an entry term; if so, its descriptor and all the entry terms of that
descriptor are added to the expansion.</p>
      <p>The second strategy differs only slightly from the first: when an n-gram in the
query is a descriptor, the query is expanded only with the entry terms of the
descriptor's preferred concept, instead of all the entry terms of that descriptor.</p>
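      <p>The three expansion rules described above (the tree-structure expansion and the two entry-term variants, later submitted as the CT, ET and ETPC runs) can be sketched as follows. The sketch rests on several assumptions: the Descriptor record is a hypothetical simplification of a MeSH record with a single tree number and with its entry terms split into those of the preferred concept and those of the other concepts; "children" are taken to be the descriptors exactly one level below in the tree; and the toy vocabulary (names, tree numbers and entry terms) is invented for illustration and is not copied from MeSH.</p>
      <code language="python">
```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Descriptor:
    """Hypothetical, simplified MeSH descriptor (one tree number only)."""
    name: str
    tree_number: str
    preferred_terms: List[str] = field(default_factory=list)  # preferred concept
    other_terms: List[str] = field(default_factory=list)  # other concepts

    def entry_terms(self) -> List[str]:
        return self.preferred_terms + self.other_terms


# Toy vocabulary: every name, tree number and entry term here is invented
# for illustration and is NOT taken from MeSH.
VOCAB: Dict[str, Descriptor] = {
    d.name.lower(): d
    for d in [
        Descriptor("Central Nervous System", "T1", ["CNS"], ["Neuraxis"]),
        Descriptor("Brain", "T1.1", ["Encephalon"]),
        Descriptor("Spinal Cord", "T1.2"),
    ]
}
# Reverse index from each (lower-cased) entry term to its descriptor.
ENTRY_INDEX: Dict[str, Descriptor] = {
    t.lower(): d for d in VOCAB.values() for t in d.entry_terms()
}


def children(desc: Descriptor) -> List[Descriptor]:
    """Descriptors whose tree number lies exactly one level below desc."""
    prefix = desc.tree_number + "."
    return [
        d
        for d in VOCAB.values()
        if d.tree_number.startswith(prefix)
        and "." not in d.tree_number[len(prefix):]
    ]


def expand_concept_tree(ngram: str) -> List[str]:
    """CT: a descriptor is expanded with its children; leaves add nothing."""
    desc = VOCAB.get(ngram.lower())
    return [c.name for c in children(desc)] if desc else []


def expand_entry_terms(ngram: str, preferred_only: bool = False) -> List[str]:
    """ET (preferred_only=False) or ETPC (preferred_only=True) expansion."""
    key = ngram.lower()
    if key in VOCAB:
        desc = VOCAB[key]
        terms = desc.preferred_terms if preferred_only else desc.entry_terms()
        return list(terms)
    if key in ENTRY_INDEX:
        # Entry term: add its descriptor and the descriptor's entry terms.
        # Assumption: the matched term itself is skipped (already in the
        # query), and both variants behave alike, since the preferred-concept
        # restriction is only described for descriptors.
        desc = ENTRY_INDEX[key]
        return [desc.name] + [t for t in desc.entry_terms() if t.lower() != key]
    return []
```
      </code>
      <p>With this toy data, expand_concept_tree("Central Nervous System") returns ["Brain", "Spinal Cord"], expand_entry_terms("Central Nervous System", preferred_only=True) keeps only "CNS", and the entry term "CNS" expands to its descriptor plus "Neuraxis". Deriving children from tree-number prefixes avoids storing explicit parent links, but note that real MeSH descriptors can occupy several tree positions.</p>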
      <p>When these expansion strategies were evaluated, we found that they introduced too
much noise into the queries, and the results were not as good as expected. To
address this, the expanded queries were filtered to remove redundant entry terms.
Figure 2 shows an example of the filtering process.</p>
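      <p>The paper specifies the filtering only through the example in Figure 2, so the Python sketch below uses an assumed definition of redundancy rather than the authors' exact rule: an expansion term is dropped if all of its words already occur in the query, or if it repeats an earlier kept term up to case, punctuation and word order (as with inverted forms such as "Cancer, Breast"). The functions word_set and filter_redundant are hypothetical.</p>
      <code language="python">
```python
import re
from typing import FrozenSet, List, Set


def word_set(term: str) -> FrozenSet[str]:
    """Lower-cased words of a term, ignoring punctuation and word order."""
    return frozenset(re.findall(r"[a-z0-9]+", term.lower()))


def filter_redundant(query: str, expansion: List[str]) -> List[str]:
    """Keep only expansion terms that contribute new words (assumed rule)."""
    query_words = word_set(query)
    seen: Set[FrozenSet[str]] = set()
    kept: List[str] = []
    for term in expansion:
        words = word_set(term)
        # Redundant: empty, fully covered by the query, or a reordering of
        # a term that was already kept.
        if not words or words.issubset(query_words) or words in seen:
            continue
        seen.add(words)
        kept.append(term)
    return kept
```
      </code>
      <p>For example, for the query "Breast cancer mammogram" and the illustrative expansion ["Breast Cancer", "Neoplasms, Breast", "Breast Neoplasms"], only "Neoplasms, Breast" survives: the first term adds no new words, and the third is a reordering of the second.</p>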
      <sec id="sec-3-1">
        <title>Experiments and Results</title>
        <p>This section details the experiments conducted to evaluate the different expansion
strategies. To this end, three different indexes were created:
• Captions (C): this index contains the text of the caption of each image.
• Image Reference (IR): this index contains, for each image, the passages of the
article that reference it. To build it, the text of the articles was split into
sentences using the OpenNLP software (http://incubator.apache.org/opennlp/), and only
the sentences that refer to an image were indexed.
• Full Text (FT): this index contains the full text of each article.</p>
        <p>For this edition, three different runs were submitted for each index:
• Baseline (B): original queries.
• Concept Tree (CT): queries expanded with the techniques based on the MeSH Tree
Structure.
• Entry Terms Preferred Concept (ETPC): queries expanded with the techniques based
on Entry Terms, using only the entry terms of the preferred concept.</p>
        <p>In addition, one more run based on Entry Terms (ET) was submitted.</p>
      </sec>
    </sec>
    <sec id="sec-4">
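      <p>The sentence selection behind the Image Reference (IR) index can be sketched as follows. Two parts are assumptions of this sketch: a naive regular-expression splitter stands in for the OpenNLP sentence detector actually used, and an image is assumed to be referenced as "Figure N" or "Fig. N", so other forms (for example "Figs. 1 and 2") are missed. The function names are hypothetical.</p>
      <code language="python">
```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive stand-in for OpenNLP: break after ., ! or ? when the next
    non-space character is a capital letter."""
    text = " ".join(text.split())  # normalise whitespace and line breaks
    marked = re.sub(r"([.!?])\s+(?=[A-Z])", "\\1\n", text)
    return [s for s in marked.split("\n") if s]


def image_reference_text(article: str, figure_number: int) -> List[str]:
    """Sentences of the article that mention the given figure number."""
    mention = re.compile(
        rf"\b(?:Fig\.|Figure)\s*{figure_number}(?![0-9])", re.IGNORECASE
    )
    return [s for s in split_sentences(article) if mention.search(s)]
```
      </code>
      <p>Requiring a capital letter after the full stop keeps "Fig. 2" from being split, and the negative lookahead after the figure number keeps sentences about "Figure 12" out of the text indexed for "Figure 1".</p>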
      <p>Text indexing and retrieval were performed with the Lucene search engine
(http://lucene.apache.org/) using its default settings. Table 1 shows the results
obtained with each run.</p>
      <p>Comparing the individual runs, we can draw the following conclusions:</p>
      <p>The best results were obtained with the index of image captions. In addition, the
expansion based on the MeSH Tree Structure was the most effective strategy for all
the indexes. Our best run was laberinto_CTC (Concept Tree with Captions), which
reached a MAP of 0.2172, the highest value among all the runs of the textual
retrieval type. As for the strategy based on Entry Terms, it retrieved more relevant
images than the other strategies. We therefore consider it an effective strategy as
well, and we will work to improve its MAP while keeping the number of relevant
images retrieved high.</p>
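      <p>The runs above are compared mainly by mean average precision (MAP). For reference, the usual definition can be sketched as follows, assuming the standard TREC-style computation (the evaluation tool used by the organizers is not described in this paper): for each query, the precision at the rank of every relevant image retrieved is summed and divided by the total number of relevant images, and MAP is the mean of these values over all judged queries. The function names and the run and qrels dictionaries are hypothetical.</p>
      <code language="python">
```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each relevant hit, over all relevant images."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranking, start=1):
        if image_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant images that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """MAP over every judged query; a query missing from the run scores 0."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(run.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    )
    return total / len(qrels)
```
      </code>
      <p>Because each relevant image contributes the precision at its own rank, a run that retrieves more relevant images can still obtain a lower MAP if those images are ranked low, which is consistent with the behaviour observed for the Entry Terms strategy.</p>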
      <sec id="sec-4-1">
        <title>Conclusions and Future Work</title>
        <p>In this paper we have presented several query expansion strategies that use MeSH,
one of the most widely used ontologies in the medical domain, with the aim of
improving the effectiveness of a text-based image retrieval system. Different
elements of the MeSH ontology were used for the expansion.</p>
        <p>Our experiments showed that the expansion strategies based on the hierarchical
structure in which MeSH organises its descriptors obtain good results for this task.
This work also confirmed how difficult it is to find an appropriate query expansion
strategy. We believe that other information elements of MeSH, or combinations of
them, could be used to expand the queries and might substantially improve an image
retrieval system.</p>
        <p>In future work, we will continue researching other query expansion strategies and
the use of other ontologies, such as UMLS (http://www.nlm.nih.gov/research/umls/) [6].
We also plan to build indexes using only the medical concepts extracted from the
image captions. Finally, we want to experiment with expanding both the queries and
the indexed text.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <sec id="sec-5-1">
        <title>Acknowledgments</title>
        <p>This work was partially funded by the Spanish Ministry of Science and Innovation,
the Spanish Government Plan E and the European Union through ERDF
(TIN2009-14057-C03-03).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Kalpathy-Cramer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bedrick</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eggel</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia Seco de Herrera</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Tsikrika</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>The CLEF 2011 medical image retrieval and classification tasks</article-title>
          .
          <source>CLEF 2011 working notes</source>
          , Amsterdam, The Netherlands.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Nelson</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schopen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savage</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schulman</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Arluk</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>The MeSH translation maintenance system: structure, interface, design and implementation</article-title>
          . In: M. Fieschi et al. (Eds.),
          <source>Proceedings of the 11th World Congress on Medical Informatics</source>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wilbur</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Evaluation of query expansion using MeSH in PubMed</article-title>
          .
          <source>Information Retrieval</source>
          , Vol.
          <volume>12</volume>
          , No.
          <issue>1</issue>
          , pp.
          <fpage>69</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Díaz</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martín</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ureña</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Query expansion with a medical ontology to improve a multimodal information retrieval</article-title>
          .
          <source>Computers in Biology and Medicine</source>
          ,
          <volume>4</volume>
          ,
          <fpage>396</fpage>
          -
          <lpage>403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Mata</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crespo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Maña</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2011</year>
          .
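          <article-title>Estudio del uso de ontologías para la expansión de consultas en recuperación de imágenes en el dominio biomédico</article-title>
          [Study of the use of ontologies for query expansion in image retrieval in the biomedical domain].
          <source>Procesamiento del Lenguaje Natural</source>
          , No.
          <issue>47</issue>
          .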
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>The Unified Medical Language System (UMLS): integrating biomedical terminology</article-title>
          .
          <source>Nucleic Acids Research</source>
          ,
          <volume>32</volume>
          ,
          <fpage>267</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>