<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Query</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Word Indexing Versus Conceptual Indexing in Medical Image Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Karim Gasmi</string-name>
          <email>karimgasmi@yahoo.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mouna Torjmen-Khemakhem</string-name>
          <email>torjmen.mouna@redcad.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maher Ben Jemaa</string-name>
          <email>maher.benjemaa@enis.rnu.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Research unit on Development and Control of Distributed Applications (ReDCAD), Department of Computer Science and Applied Mathematics, National School of Engineers of Sfax, University of Sfax</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <volume>7</volume>
      <issue>0</issue>
      <abstract>
        <p>This paper presents our participation in the medical image retrieval task of ImageCLEF 2012. Our aim is to study the effectiveness of conceptual indexing compared to word indexing in medical image retrieval. To this end, we used, on the one hand, the Terrier tool for textual indexing and retrieval and, on the other hand, the MetaMap tool for conceptual indexing together with a vector model for conceptual retrieval. The run using the BM25 model is considered as the baseline. For textual indexing, we compared different weighting formulas. For conceptual indexing, we used the results of the BM25 model to extract concepts and re-ranked these results using the vector model. Results show that textual indexing is more useful than conceptual indexing. However, conceptual indexing improves the results of some queries, which encourages us to continue studying conceptual indexing and retrieval.</p>
      </abstract>
      <kwd-group>
        <kwd>medical image retrieval</kwd>
        <kwd>information retrieval model</kwd>
        <kwd>reranking</kwd>
        <kwd>conceptual indexing</kwd>
        <kwd>MetaMap</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Classical Information Retrieval (IR) models retrieve documents that share
words, at least in part, with the query. But the same meaning can be expressed by
different words, and the same word can express different meanings in different
contexts. Assuming that matching words implies matching meaning is therefore the
main pitfall of traditional approaches to IR. Overcoming this limitation is the
subject of several recent research projects, in particular the IR approach known
as "concept-based" retrieval.
The choice of the information retrieval model is crucial, since it directly
affects the results of any information retrieval system. We therefore decided to
evaluate different information retrieval models using the Terrier IR platform
(http://terrier.org/docs/v3.5/) and tried to improve these models by using
conceptual indexing.</p>
    </sec>
    <sec id="sec-2">
      <title>1 http://terrier.org/docs/v3.5/</title>
      <p>Our model uses two types of indexing, words and/or concepts, to re-rank the
results obtained by the BM25 model. The goals of this research are:
1. To study the influence of each retrieval model on the performance of the
information retrieval system;
2. To study the influence of using two retrieval models in series, the second
re-ranking the results of the first, on the performance of the system;
3. To study the effect of using concepts for indexing on the performance of the
system.</p>
      <p>The two indexing methods are summarized in Figure 1.
Our paper is organized as follows: Section 2 describes the retrieval models
used in the different runs, Section 3 describes the conceptual indexing of
medical images, and Section 4 describes the runs and the results obtained,
before the conclusion.</p>
      <p>2 Word Indexing of medical images</p>
      <p>The manual textual indexing of images is usually performed by a specialized
librarian called an iconographer, whose role is to categorize and index images by
associating them with categories and groups of words, often taken from a
thesaurus, so that the images can be found quickly. Unfortunately, the choice of
indexing terms is a problem for picture researchers, because a user is unlikely
to choose the same keywords as those chosen by the iconographer. The indexing of
an image is therefore subjective, since several indexings are possible.</p>
      <p>Despite its subjectivity, manual indexing is an effective way to associate
meaning with images. However, for a large volume of images this work quickly
becomes tedious or even impossible, which is not the case for automatic indexing.
Automatic textual indexing associates words with an image using a computer
system, without human intervention. For images on the web, textual indexing can
be based on the words of the page title or on the most frequent or most relevant
words of the page.</p>
      <p>Every information retrieval system needs a weighting model. The term
weighting process must provide a compact and informative representation of the
content of the documents with respect to the query terms. It should give each
term an indicator of importance that discriminates the terms from one another.
Although several approaches and techniques have been developed around this
importance factor (the term weight), almost all of them rely on two quantities:
TF (term frequency): the more frequently a term occurs in a document, the more
important it is in that document.</p>
      <p>IDF (inverse document frequency): the rarer a term is in the collection, the
more important it is in the documents that contain it.</p>
      <p>2.1 Model BM25 [5]</p>
      <p>BM25 is a ranking function used by search engines to rank matching documents
according to their relevance to a given query. It is a probabilistic model,
given by equation (1):
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>\mathrm{BM25} = \sum_{t \in q \cap d} \left( \frac{tf}{tf + k_1 \cdot nb} \cdot \log\left( \frac{N - df_t + 0.5}{df_t + 0.5} \right) \cdot qtf \right)</tex-math>
        </disp-formula>
with:
- tf: frequency of occurrences of the term t in the document,
- N: total number of documents in the collection,
- <inline-formula><tex-math>df_t</tex-math></inline-formula>: number of documents containing the term t,
- qtf: frequency of occurrences of the term t in the query,
- <inline-formula><tex-math>k_1</tex-math></inline-formula>: parameter controlling the influence of the term frequency, set to 1.2 by default,
- nb: normalization factor, calculated as in equation (2):
        <disp-formula id="eq2">
          <label>(2)</label>
          <tex-math>nb = (1 - b) + b \cdot \frac{tl}{tl_{avg}}</tex-math>
        </disp-formula>
with:
- tl: number of terms in the document (document length),
- <inline-formula><tex-math>tl_{avg}</tex-math></inline-formula>: average number of terms in a document of the collection,
- b: length normalization parameter.</p>
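      <p>As an illustration, the BM25 weighting described above can be sketched in Python as follows. This is a minimal sketch and not the Terrier implementation: the function name, the toy collection and the value b = 0.75 (suggested only by the name of Run1) are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """Score one document for one query with the BM25 form given in the text."""
    tf_counts = Counter(doc_terms)
    qtf_counts = Counter(query_terms)
    # nb = (1 - b) + b * tl / tl_avg (document length normalisation)
    nb = (1.0 - b) + b * len(doc_terms) / avg_len
    score = 0.0
    for term, qtf in qtf_counts.items():
        tf = tf_counts.get(term, 0)
        if tf == 0:
            continue  # the sum only runs over terms shared by query and document
        df = doc_freq[term]
        idf = math.log((n_docs - df + 0.5) / (df + 0.5))
        score += tf / (tf + k1 * nb) * idf * qtf
    return score


# Hypothetical toy collection of tokenised captions (illustration only).
docs = {
    "d1": ["chest", "xray", "pneumonia"],
    "d2": ["brain", "mri", "tumor"],
    "d3": ["chest", "ct", "nodule"],
    "d4": ["knee", "xray", "fracture"],
    "d5": ["liver", "ultrasound", "cyst"],
}
doc_freq = Counter(term for terms in docs.values() for term in set(terms))
avg_len = sum(len(terms) for terms in docs.values()) / len(docs)
query = ["chest", "pneumonia"]

scores = {
    doc_id: bm25_score(query, terms, doc_freq, len(docs), avg_len)
    for doc_id, terms in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```
]]></preformat>
      <p>In this toy example the document containing both query terms is ranked first, and the rarer term contributes more to the score than the more common one. Note that with this form of the logarithm a term occurring in more than half of the documents receives a negative weight.</p>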
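      <p>2.2 Model TF_IDF (Term Frequency Inverse Document Frequency)</p>
      <p>This model determines the relative frequency of a word in a specific
document compared to the inverse proportion of that word over the entire
document corpus (see http://terrier.org/docs/v3.5/). It is given by equation (3):
        <disp-formula id="eq3">
          <label>(3)</label>
          <tex-math>\mathrm{TF\_IDF} = \mathrm{Robertson\_tf} \cdot idf \cdot Kf</tex-math>
        </disp-formula>
with:
- <inline-formula><tex-math>idf = \log\left( \frac{nd}{df} + 1 \right)</tex-math></inline-formula>,
- <inline-formula><tex-math>\mathrm{Robertson\_tf} = k_1 \cdot \frac{tf}{tf + k_1 \left( 1 - b + b \cdot \frac{dl}{avg\_dl} \right)}</tex-math></inline-formula>,
- tf: the frequency of the term in the document,
- dl: the length of the document,
- avg_dl: the average document length in the collection,
- df: the document frequency of the term,
- Kf: the frequency of the term in the query,
- nd: the number of documents in the collection.</p>
      <p>2.3 Model In_expB2</p>
      <p>In_expB2 uses the inverse expected document frequency model for randomness,
the ratio of two Bernoulli processes for the first normalisation, and
Normalisation 2 for term frequency normalisation. It is given by equation (4):
        <disp-formula id="eq4">
          <label>(4)</label>
          <tex-math>\omega(t, d) = \frac{F + 1}{n_t \cdot (tfn + 1)} \left( tfn \cdot \log_2 \frac{N + 1}{n_e + 0.5} \right)</tex-math>
        </disp-formula>
with F, N, <inline-formula><tex-math>n_t</tex-math></inline-formula>, <inline-formula><tex-math>n_e</tex-math></inline-formula> and tfn as defined for the BB2 model in subsection 2.4.</p>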
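      <p>The TF_IDF and In_expB2 weights described above can be sketched in Python as follows. This is only an illustrative sketch: the function names, the default values of b and c, the use of the natural logarithm for idf and the example statistics are assumptions, and the definitions of the expected document frequency and of the normalised term frequency are those listed with the BB2 model.</p>
      <preformat><![CDATA[
```python
import math


def normalised_tf(tf, doc_len, avg_len, c=1.0):
    """Normalisation 2: tfn = tf * log2(1 + c * avg_l / l)."""
    return tf * math.log2(1.0 + c * avg_len / doc_len)


def tf_idf(tf, doc_len, avg_len, df, n_docs, kf=1.0, k1=1.2, b=0.75):
    """TF_IDF = Robertson_tf * idf * Kf (log base is an assumption here)."""
    robertson_tf = k1 * tf / (tf + k1 * (1.0 - b + b * doc_len / avg_len))
    idf = math.log(n_docs / df + 1.0)
    return robertson_tf * idf * kf


def in_expb2(tf, doc_len, avg_len, n_t, coll_freq, n_docs, c=1.0):
    """In_expB2 weight of a term, with coll_freq playing the role of F."""
    tfn = normalised_tf(tf, doc_len, avg_len, c)
    # expected document frequency: n_e = N * (1 - (1 - n_t / N) ** F)
    n_e = n_docs * (1.0 - (1.0 - n_t / n_docs) ** coll_freq)
    first_norm = (coll_freq + 1.0) / (n_t * (tfn + 1.0))
    return first_norm * tfn * math.log2((n_docs + 1.0) / (n_e + 0.5))


# Hypothetical statistics for a rare and a common term in a collection of
# 306,528 documents (the collection size reported in the evaluation).
N_DOCS = 306528
rare_tfidf = tf_idf(tf=3, doc_len=100, avg_len=120, df=10, n_docs=N_DOCS)
common_tfidf = tf_idf(tf=3, doc_len=100, avg_len=120, df=100000, n_docs=N_DOCS)
rare_inexp = in_expb2(3, 100, 120, n_t=10, coll_freq=15, n_docs=N_DOCS)
common_inexp = in_expb2(3, 100, 120, n_t=100000, coll_freq=200000, n_docs=N_DOCS)
print(rare_tfidf, common_tfidf, rare_inexp, common_inexp)
```
]]></preformat>
      <p>With both formulas the rare term receives a much larger weight than the common one for the same within-document frequency, which is the behaviour expected from an IDF-like component.</p>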
      <p>2.4 Model BB2 [1][4]</p>
      <p>BB2 uses the Bose-Einstein model for randomness, the ratio of two Bernoulli
processes for the first normalisation, and Normalisation 2 for term frequency
normalisation. It is given by equation (5):
        <disp-formula id="eq5">
          <label>(5)</label>
          <tex-math>\omega(t, d) = \frac{F + 1}{n_t \cdot (tfn + 1)} \left( -\log_2(N - 1) - \log_2(e) + f(N + F - 1,\; N + F - tfn - 2) - f(F,\; F - tfn) \right)</tex-math>
        </disp-formula>
where:
- ω(t, d) is the within-document term weight of the term t in the document d,
- tf is the within-document frequency of the term t in the document d,
- F is the term frequency of the term t in the whole collection,
- N is the number of documents in the collection,
- <inline-formula><tex-math>n_t</tex-math></inline-formula> is the document frequency of the term t,
- λ is given by F/N,
      </p>
    </sec>
    <sec id="sec-3">
      <title>2 http://terrier.org/docs/v3.5/</title>
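      <p>- <inline-formula><tex-math>n_e = N \cdot \left( 1 - \left( 1 - \frac{n_t}{N} \right)^{F} \right)</tex-math></inline-formula>,
- <inline-formula><tex-math>f(n, m) = (m + 0.5) \cdot \log_2\left( \frac{n}{m} \right) + (n - m) \cdot \log_2 n</tex-math></inline-formula>,
- <inline-formula><tex-math>tfn = tf \cdot \log_2\left( 1 + c \cdot \frac{avg\_l}{l} \right)</tex-math></inline-formula>,
where c is a parameter, and l and avg_l are the length of the document d and the
average document length in the collection, respectively.</p>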
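      <p>The BB2 weight and its helper function f(n, m) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Terrier code: the function names, the default c = 1 and the example statistics are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def f(n, m):
    """f(n, m) = (m + 0.5) * log2(n / m) + (n - m) * log2(n)."""
    return (m + 0.5) * math.log2(n / m) + (n - m) * math.log2(n)


def bb2(tf, doc_len, avg_len, n_t, coll_freq, n_docs, c=1.0):
    """BB2 weight of a term; coll_freq plays the role of F in the text.

    The formula is only defined when F - tfn > 0, otherwise the
    logarithm inside f(F, F - tfn) has no real value.
    """
    tfn = tf * math.log2(1.0 + c * avg_len / doc_len)  # Normalisation 2
    first_norm = (coll_freq + 1.0) / (n_t * (tfn + 1.0))
    return first_norm * (
        -math.log2(n_docs - 1.0)
        - math.log2(math.e)
        + f(n_docs + coll_freq - 1.0, n_docs + coll_freq - tfn - 2.0)
        - f(coll_freq, coll_freq - tfn)
    )


# Hypothetical statistics for a rare and a common term (illustration only).
N_DOCS = 306528
rare_weight = bb2(tf=3, doc_len=100, avg_len=120, n_t=10, coll_freq=15, n_docs=N_DOCS)
common_weight = bb2(tf=3, doc_len=100, avg_len=120, n_t=100000, coll_freq=200000, n_docs=N_DOCS)
print(rare_weight, common_weight)
```
]]></preformat>
      <p>As with the other models, the rare term obtains a larger weight than the common term for the same within-document frequency.</p>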
      <p>3 Conceptual Indexing of medical images</p>
      <p>To extract concepts from the "caption + title" of each image, we chose to use
MetaMap (http://metamap.nlm.nih.gov/), which performs the following steps [3]:
1. Parse the text into noun phrases;
2. Generate the variants of each noun phrase, where a variant consists of one
or more words of the noun phrase together with all of their spelling variants,
abbreviations, acronyms, synonyms, inflectional and derivational variants, and
meaningful combinations of these;
3. Retrieve the candidate set of all Metathesaurus strings containing one of
the variants found in step 2;
4. For each candidate found in step 3, compute the mapping from the noun
phrase and calculate the strength of the mapping using an evaluation function;
5. Combine candidates involved with disjoint parts of the noun phrase,
recompute the match strength based on the combined candidates, and select those
having the highest score to form a set of best Metathesaurus mappings for
the original noun phrase.</p>
      <p>The evaluation function used to calculate the strength of the mapping is
based on four components: centrality, variation, coverage, and cohesiveness. A
normalized value between 0 (the weakest match) and 1 (the strongest match) is
computed for each of these components.</p>
      <sec id="sec-3-1">
        <title>4 Evaluation</title>
        <p>The collection used in the medical retrieval task is 2.5 GB in size and
consists of 306,528 documents. The number of queries is 22 [2].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3 http://metamap.nlm.nih.gov/</title>
      <p>For the word indexing runs, we used the Terrier IR platform
(http://terrier.org/docs/v3.5/), an open source search engine written in Java
and developed at the School of Computing, University of Glasgow.</p>
      <p>For the conceptual indexing runs, MetaMap was used: a document
(respectively a query) is represented as a set of weighted concepts extracted
using the UMLS (http://www.nlm.nih.gov/research/umls/).</p>
      <p>For all runs, the title, caption and abstract were used to represent images. Our
runs are described as follows:</p>
      <p>4.1 Evaluation of the use of textual indexing</p>
      <p>- Run1-Terrier CapTitAbs BM25b0.75 (BM25): this is our official run,
which uses BM25 as the information retrieval model and Terrier as the tool for
textual indexing.
- Run2-Terrier CapTitAbs BB2 (BB2): this run uses the BB2 model, which
is described in subsection 2.4, to compare its results with those
obtained by BM25.
- Run3-Terrier CapTitAbs In_expB2 (In_expB2): this run uses the In_expB2
model, which is described in subsection 2.3, to compare its results with those
obtained by BM25.
- Run4-Terrier CapTitAbs TF_IDF (TF_IDF): this run uses TF_IDF both for
term weighting and as the information retrieval model, to compare its results
with those obtained by BM25.
We used Mean Average Precision (MAP) as the evaluation measure. The results in
Table 1 show the differences between the four information retrieval models.
According to these results, the In_expB2 model has a higher MAP than the other
models, and the TF_IDF model has a lower MAP than the other models.</p>
    </sec>
    <sec id="sec-5">
      <title>4 http://terrier.org/docs/v3.5/</title>
    </sec>
    <sec id="sec-6">
      <title>5 http://www.nlm.nih.gov/research/umls/</title>
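      <p>Mean Average Precision, the measure used to compare the runs, and precision at rank k (P@k) can be computed as in the following Python sketch. The function names and the toy rankings and relevance judgments are hypothetical and do not reproduce any result of the paper.</p>
      <preformat><![CDATA[
```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at the rank of each relevant document."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # relevant documents that are never retrieved contribute zero
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of the average precision over all judged queries."""
    return sum(average_precision(run.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)


def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the first k results that are relevant."""
    relevant = set(relevant_ids)
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant) / k


# Hypothetical run and relevance judgments for two queries.
run = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y", "z"]}
qrels = {"q1": ["a", "c"], "q2": ["z"]}

map_value = mean_average_precision(run, qrels)
p_at_2 = precision_at_k(run["q1"], qrels["q1"], k=2)
print(map_value, p_at_2)
```
]]></preformat>
      <p>In this example the average precision is (1/1 + 2/3)/2 for the first query and 1/3 for the second, so the MAP is 7/12.</p>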
      <p>- Run5-Terrier CapTitAbs BM25-DFR_BM25: for this run, we used
the DFR_BM25 model to re-rank the results obtained by Run1.
- Run6-Terrier CapTitAbs BM25-In_expB2: we used the same principle
as in Run5, but with the In_expB2 model to re-rank.
- Run7-Terrier CapTitAbs BM25-TF_IDF: for this run, we used the
TF_IDF model to re-rank the results obtained by Run1.
To improve the results obtained by the BM25 model, we re-rank them with
another model. Table 2 shows the results obtained. The MAP values confirm that
using another model to re-rank the baseline results improves on the results
obtained by BM25 alone.</p>
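      <p>The re-ranking principle used in these runs, in which a second model re-orders the list returned by the baseline, can be sketched in Python as follows. The function name, the cut-off top_k and the scores are assumptions made for the example, since the text does not state how many baseline results are re-ranked.</p>
      <preformat><![CDATA[
```python
def rerank(baseline_ranking, second_score, top_k=1000):
    """Re-order the first top_k baseline results by a second model's score.

    baseline_ranking: document ids ordered by the baseline model (e.g. BM25).
    second_score: callable returning the second model's score for a document id.
    Documents beyond top_k keep their baseline order.
    """
    head = baseline_ranking[:top_k]
    tail = baseline_ranking[top_k:]
    # sorted() is stable, so documents with equal scores keep their baseline order
    reranked_head = sorted(head, key=second_score, reverse=True)
    return reranked_head + tail


# Hypothetical baseline list and second-model scores.
baseline = ["d3", "d1", "d2", "d4"]
second_model_scores = {"d1": 2.0, "d2": 0.5, "d3": 1.0, "d4": 9.0}

final_ranking = rerank(baseline, second_model_scores.get, top_k=3)
print(final_ranking)  # ['d1', 'd3', 'd2', 'd4']
```
]]></preformat>
      <p>Only documents already retrieved by the baseline can appear in the final list, so the second model can change the order of the results but cannot add new documents.</p>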
      <p>However, the BM25 model gives the best results for P@5. Depending on the needs
of the information retrieval system, one can therefore choose between the
different models.</p>
      <p>As shown in Table 3, the BM25 model obtains results that are clearly better
than those achieved by the use of concepts. However, analyzing the results
query by query, we found that conceptual indexing improves the results for
some queries.
The results obtained with concepts might also be improved if concepts were used
from the beginning, without textual indexing as a first step: since these are
two different types of indexing, the result list produced by BM25 may negatively
affect the results achieved with concepts.</p>
      <p>The results obtained with concepts could also be improved by using
another model in place of the vector model.</p>
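      <p>The conceptual re-ranking step, in which the BM25 results are re-ordered by a vector model over the extracted concepts, can be sketched in Python as follows. This is a sketch under assumptions: concept weights are taken to be simple occurrence counts, similarity is the cosine of the two concept vectors, and the concept identifiers are placeholders rather than real UMLS identifiers.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def concept_rerank(query_concepts, doc_concepts, baseline_ranking):
    """Re-order the baseline results by concept-vector similarity to the query."""
    query_vec = Counter(query_concepts)

    def similarity(doc_id):
        return cosine(query_vec, Counter(doc_concepts.get(doc_id, [])))

    # stable sort: documents with equal similarity keep their baseline order
    return sorted(baseline_ranking, key=similarity, reverse=True)


# Placeholder concept identifiers standing in for concepts extracted from
# the query and from the documents retrieved by the baseline.
query_concepts = ["C1", "C2"]
doc_concepts = {
    "d1": ["C3"],
    "d2": ["C1", "C2", "C2"],
    "d3": ["C1"],
}
bm25_ranking = ["d1", "d2", "d3"]

concept_ranking = concept_rerank(query_concepts, doc_concepts, bm25_ranking)
print(concept_ranking)  # ['d2', 'd3', 'd1']
```
]]></preformat>
      <p>A document that shares no concept with the query receives a similarity of zero and falls to the bottom of the list, whatever its rank in the baseline.</p>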
      <sec id="sec-6-1">
        <title>5 Conclusion and Future Work</title>
        <p>In this paper, we have compared the use of word indexing and conceptual
indexing.</p>
        <p>Results show that word indexing performs better than conceptual indexing.
However, we note that conceptual indexing significantly improves some queries.
This finding encourages us to pursue our work on conceptual indexing and on
conceptual retrieval. In future work, we plan to continue studying conceptual
indexing and to propose a conceptual retrieval model for medical image retrieval.
We also plan to propose a mixed approach that combines the visual appearance
of an image with its conceptual description.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Ben</given-names>
            <surname>He</surname>
          </string-name>
          and
          <string-name>
            <given-names>Iadh</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>A query-based pre-retrieval model selection approach to information retrieval</article-title>
          .
          <source>In RIAO</source>
          , pages
          <fpage>706</fpage>
          -
          <lpage>719</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
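          <string-name>
            <given-names>Henning</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alba</given-names>
            <surname>Garcia Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jayashree</given-names>
            <surname>Kalpathy-Cramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dina</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sameer</given-names>
            <surname>Antani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ivan</given-names>
            <surname>Eggel</surname>
          </string-name>
          .
          <article-title>Overview of the ImageCLEF 2012 medical image retrieval and classification tasks</article-title>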
          .
          <source>CLEF 2012 working notes</source>
          , Rome, Italy,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Quanzhi</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yi-fang Brook</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Identifying important concepts from medical documents</article-title>
          . pages
          <fpage>668</fpage>
          -
          <lpage>679</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Sobhana</surname>
            <given-names>N.V.</given-names>
          </string-name>
          <article-title>Enhancing retrieval of geological text using named entity disambiguation</article-title>
          .
          <source>International Journal of Emerging Technology and Advanced Engineering</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>2250</fpage>
          -
          <lpage>2459</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Steve</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Micheline</given-names>
            <surname>Hancock-Beaulieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aarron</given-names>
            <surname>Gull</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Marianna</given-names>
            <surname>Lau</surname>
          </string-name>
          .
          <article-title>Okapi at TREC</article-title>
          .
          <source>In Text REtrieval Conference</source>
          , pages
          <fpage>21</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>