<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-facet Document Representation and Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Karam Abdulahhad</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean-Pierre Chevallet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Catherine Berrut</string-name>
          <email>catherine.berrut@imag.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UJF-Grenoble 1</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents our participation in two tasks of the ImageCLEF2011 medical retrieval track: ad-hoc image-based retrieval and case-based retrieval. We participated with a simple IR model based on three hypotheses: 1) the amount of overlap between a document and a query, 2) the descriptive power of an indexing element, and 3) the discriminative power of an indexing element. We used three types of indexing elements: ngrams, keywords, and concepts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Each information retrieval (IR) model has its advantages and drawbacks: a
model may perform well in some cases but badly in others. In general, no IR
model performs well in all cases [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Our model is no exception. Therefore, a general mechanism is needed to
evaluate the performance of an IR model and compare it with that of others.
In this context, there are many evaluation campaigns in the IR field,
e.g. CLEF<sup>1</sup>.</p>
      <p>First, our model is a very simple text-based retrieval model. It uses
multiple types of indexing elements, e.g. ngrams, keywords, or concepts. The
goals of this research are: 1) to study the performance of the model itself, using
one type of indexing element, and 2) to study the effects on performance of using
multiple types of indexing elements at the same time.</p>
      <p>
        Second, CLEF (Cross-Language Evaluation Forum) is a yearly evaluation
campaign in the multilingual information retrieval field, held since 2000.
ImageCLEF<sup>2</sup> is a part of CLEF. It concerns searching for medical images
through documents that contain text and images [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
        <fn><label>1</label><p><uri>http://www.clef-campaign.org/</uri></p></fn>
        <fn><label>2</label><p><uri>http://www.imageclef.org/</uri></p></fn>
      </p>
      <p>
        This year, ImageCLEF2011<sup>3</sup> [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] contains four main tracks: 1) medical
retrieval, 2) photo annotation, 3) plant identification, and 4) Wikipedia retrieval.
The medical retrieval track contains three tasks: 1) modality classification, 2) ad-hoc
image-based retrieval, which is an image retrieval task using textual, image, or
mixed queries, and 3) case-based retrieval, in which the documents are journal
articles extracted from PubMed<sup>4</sup> and the queries are case descriptions.
      </p>
      <p>Third, we participated in the last two tasks: ad-hoc image-based retrieval
and case-based retrieval.</p>
      <p>The 2011 test collection contains, for each task: 1) ad-hoc image-based
retrieval: about 230,000 images with their text captions written in English, and
30 queries written in three languages (English, French, and German); 2) case-based
retrieval: about 55,000 articles written in English and 10 queries written in
English.</p>
      <p>This paper is structured as follows: Section 2 describes in detail our model
and the different types of indexing elements that are used. Section 3 presents
some technical aspects of applying this model to the ImageCLEF2011 test
collection. Section 4 discusses the obtained results. We conclude in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>The Proposed Model</title>
      <sec id="sec-2-1">
        <title>Three Types of Indexing Elements</title>
        <p>Any IR model should contain two main components: an indexing function and
a matching function. The goal of the indexing function is to convert documents
and queries from their original form into another form that is easier to use.</p>
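        <p>
          <disp-formula><tex-math>\mathit{Index} : D \cup Q \rightarrow E^{*}</tex-math><label>(1)</label></disp-formula>
          where:
        </p>
        <list list-type="simple">
          <list-item><p><inline-formula><tex-math>D</tex-math></inline-formula>: the set of documents</p></list-item>
          <list-item><p><inline-formula><tex-math>Q</tex-math></inline-formula>: the set of queries</p></list-item>
          <list-item><p><inline-formula><tex-math>E</tex-math></inline-formula>: the set of indexing elements</p></list-item>
          <list-item><p><inline-formula><tex-math>E^{*}</tex-math></inline-formula>: the set of all subsets of <inline-formula><tex-math>E</tex-math></inline-formula></p></list-item>
        </list>
        <p>Three different types of indexing elements are used: ngrams (<italic>NG</italic>),
keywords (<italic>K</italic>), and concepts<fn><label>5</label><p>"Concepts" can be defined as "Human understandable unique abstract notions
independent from any direct material support, independent from any language or
information representation, and used to organize perception and knowledge" [<xref ref-type="bibr" rid="ref8">8</xref>]. In
the IR domain, to achieve conceptual indexing, each concept is associated with a set of
terms that describe it [<xref ref-type="bibr" rid="ref3">3</xref>] [<xref ref-type="bibr" rid="ref7">7</xref>].</p></fn> (<italic>C</italic>). Therefore, three indexing functions exist,
one for each type:</p>
        <p>
          <disp-formula><tex-math>\mathit{Index}_{NG} : D \cup Q \rightarrow E_{NG}^{*}</tex-math><label>(2)</label></disp-formula>
          <disp-formula><tex-math>\mathit{Index}_{K} : D \cup Q \rightarrow E_{K}^{*}</tex-math><label>(3)</label></disp-formula>
          <disp-formula><tex-math>\mathit{Index}_{C} : D \cup Q \rightarrow E_{C}^{*}</tex-math><label>(4)</label></disp-formula>
          where:
        </p>
        <list list-type="simple">
          <list-item><p><inline-formula><tex-math>E_{NG}</tex-math></inline-formula>: the set of ngrams</p></list-item>
          <list-item><p><inline-formula><tex-math>E_{K}</tex-math></inline-formula>: the set of keywords</p></list-item>
          <list-item><p><inline-formula><tex-math>E_{C}</tex-math></inline-formula>: the set of concepts</p></list-item>
        </list>
        <p>
          <fn><label>3</label><p><uri>http://www.imageclef.org/2011</uri></p></fn>
          <fn><label>4</label><p><uri>http://www.ncbi.nlm.nih.gov/pubmed/</uri></p></fn>
        </p>
        <p>We believe that no single type of indexing element can completely
represent the content of documents and queries, because: 1) there is no perfect
indexing function [<xref ref-type="bibr" rid="ref3">3</xref>] [<xref ref-type="bibr" rid="ref2">2</xref>] [<xref ref-type="bibr" rid="ref11">11</xref>]; it is always an approximation, 2)
most of the resources that contain concepts, e.g. UMLS<sup>6</sup>, are
incomplete [<xref ref-type="bibr" rid="ref5">5</xref>] [<xref ref-type="bibr" rid="ref6">6</xref>] [<xref ref-type="bibr" rid="ref1">1</xref>], and 3) each type covers one aspect of documents and queries
[<xref ref-type="bibr" rid="ref10">10</xref>]: ngrams cover the statistical aspect, keywords cover the lexical aspect, and
concepts cover the conceptual aspect.</p>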
      </sec>
      <sec id="sec-2-2">
        <title>Matching Function</title>
        <p>Like almost all models, ours depends on some hypotheses, namely the
following three:</p>
        <list list-type="order">
          <list-item><p>The more shared elements a document and a query have, the semantically
closer they are.</p></list-item>
          <list-item><p>The descriptive power of an element (local weight): the more frequently an
element occurs in a document, the better it describes the document [<xref ref-type="bibr" rid="ref13">13</xref>] [<xref ref-type="bibr" rid="ref3">3</xref>].</p></list-item>
          <list-item><p>The discriminative power of an element (global weight): the fewer
documents an element appears in, the more important it is [<xref ref-type="bibr" rid="ref13">13</xref>] [<xref ref-type="bibr" rid="ref3">3</xref>].</p></list-item>
        </list>
        <p>Taking these hypotheses into account, our model can be formulated as follows.
For any type of indexing element, the Relevance Status Value (RSV) between a
document d and a query q is:</p>
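        <p>
          <disp-formula><tex-math>\mathit{RSV}(d, q) = \lVert d \cap q \rVert \times \sum_{e \in q} \left( \frac{N}{N_e} \times \frac{f_{d,e}}{\lVert d \rVert} \right)</tex-math><label>(5)</label></disp-formula>
          where:
        </p>
        <list list-type="simple">
          <list-item><p><inline-formula><tex-math>d = \{ e \mid e \in \mathit{Index}(d) \}</tex-math></inline-formula>: a document</p></list-item>
          <list-item><p><inline-formula><tex-math>q = \{ e \mid e \in \mathit{Index}(q) \}</tex-math></inline-formula>: a query</p></list-item>
          <list-item><p><inline-formula><tex-math>\lVert d \cap q \rVert = \lVert \{ e \mid e \in \mathit{Index}(d) \cap \mathit{Index}(q) \} \rVert</tex-math></inline-formula>: the number of shared elements between a document d and a query q</p></list-item>
          <list-item><p><inline-formula><tex-math>N</tex-math></inline-formula>: the number of documents in the corpus</p></list-item>
          <list-item><p><inline-formula><tex-math>N_e = \lVert \{ d \mid e \in \mathit{Index}(d) \} \rVert</tex-math></inline-formula>: the number of documents that contain the element e</p></list-item>
          <list-item><p><inline-formula><tex-math>f_{d,e}</tex-math></inline-formula>: the number of occurrences of an element e in a document d</p></list-item>
          <list-item><p><inline-formula><tex-math>\lVert d \rVert</tex-math></inline-formula>: the number of elements in a document d</p></list-item>
        </list>
        <p><fn><label>6</label><p>Unified Medical Language System. It is a meta-thesaurus in the medical domain. <uri>http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=nlmumls</uri></p></fn></p>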
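        <p>As an illustration, Formula (5) could be computed along the following lines. This is only a minimal sketch under stated assumptions, not the implementation used for the official runs: the function name <monospace>rsv</monospace> and its parameters are hypothetical, the document length is assumed to count elements with repetitions, and the document-frequency table is assumed to have been built from the same corpus as the scored document.</p>
        <preformat>
```python
from collections import Counter


def rsv(doc_elements, query_elements, doc_freq, num_docs):
    """Score one document against one query, following Formula (5).

    doc_elements   -- list of indexing elements of the document (with repeats)
    query_elements -- list of indexing elements of the query
    doc_freq       -- dict: element -> number of documents containing it (N_e)
    num_docs       -- number of documents in the corpus (N)
    """
    counts = Counter(doc_elements)   # f_{d,e} for every element of d
    doc_len = len(doc_elements)      # ||d||, assumed to count repetitions
    if doc_len == 0:
        return 0.0

    # Elements shared by the document and the query (d intersected with q).
    shared = set(query_elements).intersection(counts)

    # Only shared elements contribute, since f_{d,e} is 0 for the others.
    weighted_sum = sum(
        (num_docs / doc_freq[e]) * (counts[e] / doc_len)
        for e in shared
    )
    return len(shared) * weighted_sum
```
        </preformat>
        <p>For example, with a hypothetical three-document corpus in which the element "a" occurs in one document and "b" in two, calling <monospace>rsv(["a", "b", "a"], ["a", "b"], {"a": 1, "b": 2}, 3)</monospace> multiplies the overlap (2 shared elements) by the sum of the two weighted terms.</p>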
      </sec>
      <sec id="sec-2-3">
        <title>Matching Function According to each Type of Indexing Elements</title>
        <p>Having presented our model in general, we now instantiate it for each
type of indexing element.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Ngrams</title>
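        <p>
          <disp-formula><tex-math>\mathit{RSV}_{NG}(d, q) = \lVert d \cap q \rVert_{NG} \times \sum_{ng \in q} \left( \frac{N}{N_{ng}} \times \frac{f_{d,ng}}{\lVert d \rVert_{NG}} \right)</tex-math><label>(6)</label></disp-formula>
          where:
        </p>
        <list list-type="simple">
          <list-item><p><inline-formula><tex-math>d = \{ ng \mid ng \in \mathit{Index}_{NG}(d) \}</tex-math></inline-formula>: a document</p></list-item>
          <list-item><p><inline-formula><tex-math>q = \{ ng \mid ng \in \mathit{Index}_{NG}(q) \}</tex-math></inline-formula>: a query</p></list-item>
          <list-item><p><inline-formula><tex-math>\lVert d \cap q \rVert_{NG} = \lVert \{ ng \mid ng \in \mathit{Index}_{NG}(d) \cap \mathit{Index}_{NG}(q) \} \rVert</tex-math></inline-formula>: the number of shared ngrams between a document d and a query q</p></list-item>
          <list-item><p><inline-formula><tex-math>N</tex-math></inline-formula>: the number of documents in the corpus</p></list-item>
          <list-item><p><inline-formula><tex-math>N_{ng} = \lVert \{ d \mid ng \in \mathit{Index}_{NG}(d) \} \rVert</tex-math></inline-formula>: the number of documents that contain the ngram ng</p></list-item>
          <list-item><p><inline-formula><tex-math>f_{d,ng}</tex-math></inline-formula>: the number of occurrences of an ngram ng in a document d</p></list-item>
          <list-item><p><inline-formula><tex-math>\lVert d \rVert_{NG}</tex-math></inline-formula>: the number of ngrams in a document d</p></list-item>
        </list>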
        <p>
          Keywords: this instance adds a new component, the length of the
keyword (its number of characters), on the assumption that the longer a
keyword is, the more information it contains [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
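        <p>
          <disp-formula><tex-math>\mathit{RSV}_{K}(d, q) = \lVert d \cap q \rVert_{K} \times \sum_{k \in q} \left( \frac{N}{N_{k}} \times \frac{f_{d,k}}{\lVert d \rVert_{K}} \times \lVert k \rVert \right)</tex-math><label>(7)</label></disp-formula>
          where <inline-formula><tex-math>\lVert k \rVert</tex-math></inline-formula> is the number of characters in a keyword k.
        </p>
      </sec>
      <sec id="sec-2-6">
        <title>Concepts</title>
        <p>
          <disp-formula><tex-math>\mathit{RSV}_{C}(d, q) = \lVert d \cap q \rVert_{C} \times \sum_{c \in q} \left( \frac{N}{N_{c}} \times \frac{f_{d,c}}{\lVert d \rVert_{C}} \right)</tex-math><label>(8)</label></disp-formula>
        </p>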
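        <p>The ngram and concept instances have the same shape as the general formula, whereas the keyword instance additionally uses the keyword length. The following minimal sketch shows that difference, assuming that the length (in characters) multiplies each term of the sum; the function name <monospace>rsv_keywords</monospace> and its parameters are hypothetical, and the document length is again assumed to count keywords with repetitions.</p>
        <preformat>
```python
from collections import Counter


def rsv_keywords(doc_keywords, query_keywords, doc_freq, num_docs):
    """Keyword instance of the RSV with a per-keyword length factor.

    Assumption: the number of characters of each shared keyword multiplies
    that keyword's term inside the sum.
    """
    counts = Counter(doc_keywords)   # f_{d,k}
    doc_len = len(doc_keywords)      # ||d||_K, assumed to count repetitions
    if doc_len == 0:
        return 0.0

    shared = set(query_keywords).intersection(counts)
    weighted_sum = sum(
        (num_docs / doc_freq[k]) * (counts[k] / doc_len) * len(k)
        for k in shared
    )
    return len(shared) * weighted_sum
```
        </preformat>
        <p>With this weighting, a long shared keyword contributes more to the score than a short one that has the same frequency statistics.</p>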
      </sec>
      <sec id="sec-2-7">
        <title>The Three Types in one Model</title>
        <p>
          As stated earlier, no single type of indexing element can cover all aspects
of documents and queries. Therefore, merging all types (aspects) in one model
could enhance the performance of our model [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. One possible merging
formula is:
        </p>
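        <p><disp-formula><tex-math>\mathit{RSV}_{all}(d, q) = \mathit{RSV}_{NG}(d, q) + \mathit{RSV}_{K}(d, q) + \mathit{RSV}_{C}(d, q)</tex-math><label>(9)</label></disp-formula></p>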
        <p>To increase the chance of retrieving more documents, another component
can be added to Formula (9). This component represents an expansion
of the query q. It is <inline-formula><tex-math>\lVert d \cap q \rVert_{expan}</tex-math></inline-formula>: the number of keywords shared by a
document d and a query q after replacing each query concept c that does not
occur in d by the set of keywords that represent c<sup>7</sup>.</p>
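        <p><disp-formula><tex-math>\mathit{RSV}_{all}(d, q) = \mathit{RSV}_{NG}(d, q) + \mathit{RSV}_{K}(d, q) + \mathit{RSV}_{C}(d, q) + \lVert d \cap q \rVert_{expan}</tex-math><label>(10)</label></disp-formula></p>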
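        <p>The expansion component and the additive merging can be sketched as follows. This is an illustrative reading rather than the authors' code: all names are hypothetical, the concept identifiers in the usage note are placeholders, and it is assumed that the original query keywords are kept and that the keywords of each missing concept are added to them before counting the overlap.</p>
        <preformat>
```python
def expansion_overlap(doc_keywords, doc_concepts,
                      query_keywords, query_concepts, concept_keywords):
    """Count keywords shared by d and the expanded query.

    concept_keywords maps a concept identifier to the keywords that
    represent it in the resource (a hypothetical lookup table here).
    """
    expanded = set(query_keywords)   # assumption: original keywords are kept
    for concept in query_concepts:
        # Only query concepts that do not occur in the document are expanded.
        if concept not in doc_concepts:
            expanded.update(concept_keywords.get(concept, ()))
    return len(expanded.intersection(doc_keywords))


def rsv_all(rsv_ng, rsv_k, rsv_c, expan=0):
    """Additive merging of the per-type scores, optionally with expansion."""
    return rsv_ng + rsv_k + rsv_c + expan
```
        </preformat>
        <p>For instance, if a query contains the placeholder concepts "C1" and "C2" and only "C1" occurs in the document, only the keywords attached to "C2" are added to the query keywords before the overlap is counted.</p>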
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Model Validation</title>
      <sec id="sec-3-1">
        <title>Ad-hoc Image-Based Retrieval</title>
        <p>In this task, image captions are used as documents, and only the English
part of the queries is taken into account.</p>
        <p>Text indexing: we extracted three types of indexing elements:</p>
        <list list-type="order">
          <list-item><p>5grams<sup>8</sup>: before extracting 5grams from documents and queries, we deleted
all non-ASCII characters. Then we extracted 5grams using a five-character-wide
window, shifting the window by one character each time.</p></list-item>
          <list-item><p>Keywords: before extracting keywords from documents and queries, we deleted
all non-ASCII characters. Then we eliminated the stop words and stemmed the
remaining words using the Porter algorithm to obtain the final list of keywords
that index documents and queries.</p></list-item>
          <list-item><p>Concepts: before mapping the text of documents and queries to concepts,
we deleted all non-ASCII characters. Then we mapped the text to UMLS
concepts using MetaMap [<xref ref-type="bibr" rid="ref2">2</xref>].</p></list-item>
        </list>
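        <p>The character ngram extraction described above can be sketched as follows. The function name <monospace>char_ngrams</monospace> is hypothetical, and the sketch assumes that spaces and punctuation are kept inside the window, which the description does not specify. Stop-word removal, Porter stemming, and the MetaMap mapping are not covered here.</p>
        <preformat>
```python
def char_ngrams(text, n=5):
    """Return the character ngrams of text using a sliding window.

    Non-ASCII characters are deleted first, then a window of n characters
    is shifted by one character at a time.
    """
    # Drop every character that cannot be encoded as ASCII.
    ascii_text = text.encode("ascii", errors="ignore").decode("ascii")
    # A text shorter than n characters yields no ngram (empty range).
    return [ascii_text[i:i + n] for i in range(len(ascii_text) - n + 1)]
```
        </preformat>
        <p>Calling <monospace>char_ngrams(caption)</monospace> yields 5grams, and <monospace>char_ngrams(article_text, n=4)</monospace> yields 4grams; both variable names are placeholders.</p>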
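        <p>Model variants: we experimented with five variants of our model in this
task:</p>
        <p>
          <disp-formula><tex-math>\mathit{RSV}_{all}(d, q) = \mathit{RSV}_{5G}(d, q) + \mathit{RSV}_{K}(d, q) + \mathit{RSV}_{C}(d, q)</tex-math><label>(11)</label></disp-formula>
          <disp-formula><tex-math>\mathit{RSV}_{all}(d, q) = \left( \lVert d \cap q \rVert_{5G,K,C} \right) \times \left( \mathit{sum}_{5G,K,C} \right)</tex-math><label>(12)</label></disp-formula>
          <disp-formula><tex-math>\mathit{RSV}_{all}(d, q) = \mathit{RSV}_{5G}(d, q) + \mathit{RSV}_{K}(d, q) + \mathit{RSV}_{C}(d, q) + \lVert d \cap q \rVert_{expan}</tex-math><label>(13)</label></disp-formula>
          <disp-formula><tex-math>\mathit{RSV}_{all}(d, q) = \left( \lVert d \cap q \rVert_{5G,K,C} \right) \times \left( \mathit{sum}_{5G,K,C} + \lVert d \cap q \rVert_{expan} \right)</tex-math><label>(14)</label></disp-formula>
          <disp-formula><tex-math>\mathit{RSV}_{all}(d, q) = \left( \left( \lVert d \cap q \rVert_{5G,K,C} \right) \times \left( \mathit{sum}_{5G,K,C} \right) \right) + \lVert d \cap q \rVert_{expan}</tex-math><label>(15)</label></disp-formula>
          where:
        </p>
        <list list-type="simple">
          <list-item><p><inline-formula><tex-math>\lVert d \cap q \rVert_{5G,K,C} = \lVert d \cap q \rVert_{5G} + \lVert d \cap q \rVert_{K} + \lVert d \cap q \rVert_{C}</tex-math></inline-formula></p></list-item>
          <list-item><p><inline-formula><tex-math>\mathit{sum}_{5G,K,C} = \mathit{sum}_{5G} + \mathit{sum}_{K} + \mathit{sum}_{C}</tex-math></inline-formula></p></list-item>
          <list-item><p><inline-formula><tex-math>\mathit{sum}_{5G} = \sum_{5g \in q} \frac{N}{N_{5g}} \times \frac{f_{d,5g}}{\lVert d \rVert_{5G}}</tex-math></inline-formula></p></list-item>
          <list-item><p><inline-formula><tex-math>\mathit{sum}_{K} = \sum_{k \in q} \frac{N}{N_{k}} \times \frac{f_{d,k}}{\lVert d \rVert_{K}}</tex-math></inline-formula></p></list-item>
          <list-item><p><inline-formula><tex-math>\mathit{sum}_{C} = \sum_{c \in q} \frac{N}{N_{c}} \times \frac{f_{d,c}}{\lVert d \rVert_{C}}</tex-math></inline-formula></p></list-item>
        </list>
        <p>
          <fn><label>7</label><p>We supposed that each concept is represented by a set of keywords in its resource
(we used UMLS as the resource).</p></fn>
          <fn><label>8</label><p>A 5gram is an ngram that consists of five characters. We chose 5grams because they
gave the best results on ImageCLEF2010 compared with the other ngrams.</p></fn>
        </p>
        <p>In the case-based retrieval task, articles are used as documents. We indexed
documents and queries in the same way as in the previous task. However, for
technical reasons, we extracted only two types of indexing elements (4grams and
keywords).</p>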
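        <p>Model variants: we experimented with two variants of our model in this
task:</p>
        <p>
          <disp-formula><tex-math>\mathit{RSV}_{all} = \mathit{RSV}_{4G} + \mathit{RSV}_{K}</tex-math><label>(16)</label></disp-formula>
          <disp-formula><tex-math>\mathit{RSV}_{all}(d, q) = \left( \lVert d \cap q \rVert_{4G,K} \right) \times \left( \mathit{sum}_{4G,K} \right)</tex-math><label>(17)</label></disp-formula>
          where:
        </p>
        <list list-type="simple">
          <list-item><p><inline-formula><tex-math>\lVert d \cap q \rVert_{4G,K} = \lVert d \cap q \rVert_{4G} + \lVert d \cap q \rVert_{K}</tex-math></inline-formula></p></list-item>
          <list-item><p><inline-formula><tex-math>\mathit{sum}_{4G,K} = \mathit{sum}_{4G} + \mathit{sum}_{K}</tex-math></inline-formula></p></list-item>
          <list-item><p><inline-formula><tex-math>\mathit{sum}_{4G} = \sum_{4g \in q} \frac{N}{N_{4g}} \times \frac{f_{d,4g}}{\lVert d \rVert_{4G}}</tex-math></inline-formula></p></list-item>
          <list-item><p><inline-formula><tex-math>\mathit{sum}_{K} = \sum_{k \in q} \frac{N}{N_{k}} \times \frac{f_{d,k}}{\lVert d \rVert_{K}}</tex-math></inline-formula></p></list-item>
        </list>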
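        <p>The two styles of merging used in both tasks (adding the per-type scores, or multiplying the total overlap by the total weighted sum) can be contrasted with a short sketch. The function names and the dictionary layout are hypothetical, and each per-type score is simplified to overlap times weighted sum; any keyword-length factor is assumed to be already included in the weighted sum where it applies.</p>
        <preformat>
```python
def fuse_additive(per_type):
    """Add the per-type scores: each score is overlap * weighted sum."""
    return sum(overlap * weighted for overlap, weighted in per_type.values())


def fuse_product(per_type):
    """Multiply the summed overlaps by the summed weighted sums."""
    total_overlap = sum(overlap for overlap, _ in per_type.values())
    total_weighted = sum(weighted for _, weighted in per_type.values())
    return total_overlap * total_weighted


# Hypothetical per-type values for one document and one query:
# type name -> (number of shared elements, weighted sum).
example = {"4G": (3, 0.5), "K": (2, 1.5)}
additive_score = fuse_additive(example)
product_score = fuse_product(example)
```
        </preformat>
        <p>The product form also rewards cross terms, e.g. the overlap of one type multiplied by the weighted sum of another, which is one way the two merging styles can rank documents differently.</p>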
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results and Discussion</title>
      <p>Before presenting and discussing the results, we introduce the names used
in the discussion to refer to each variant, together with its corresponding
formula and the run name used in the official campaign<sup>9</sup> (see
Table 1).
<fn><label>9</label><p>To see all results: <uri>http://www.imageclef.org/2011/medical</uri></p></fn></p>
      <p>[...] (see Formula 5) to compute the matching value between a document and a
query, we obtained good results. The run I CK5G 1 is ranked eighth out of
64 runs. This is because we explicitly represented multiple facets (aspects)
of documents and queries. In other words, three types of elements (5grams,
keywords, and concepts) are used and involved in the matching process.</p>
      <p>On the other hand, using different fusion formulas to merge the results
obtained with the different types of indexing elements changes little. Compare
the results of the two runs I CK5G 1 and I CK5G 2 (see Table 2).</p>
      <p>In addition, query expansion, and the different formulas used to integrate it
into the model, did not improve the results. Compare the results of the three runs
I CK5G Q 1, I CK5G Q 2, and I CK5G Q 3 with the result of the run I CK5G 1
(see Table 2). This is because the added value of query expansion is already
provided by the use of keywords and 5grams.</p>
      <p>Another notable result is that our model retrieved the largest number of
relevant documents compared with the other runs<sup>10</sup>.</p>
      <sec id="sec-4-1">
        <title>Case-Based Retrieval</title>
        <p>Table 3 contains the obtained results. The first row
(Best) is the result of the first-ranked run in the case-based retrieval task.</p>
        <p>Table 3 (partially recovered): columns include "# rel ret" and "Rank"; recovered values: 0.1500, 0.1444, 0.1278.</p>
        <p>Here too, we obtained good results. The run C K4G 1 is ranked fifth out
of 35 runs, even though we used a very simple structure (a set of elements), a
simple matching formula (see Formula 5), and only two simple types of
indexing elements: keywords and 4grams. The two-facet (keywords and 4grams)
representation of documents and queries was useful.
<fn><label>10</label><p><uri>http://www.imageclef.org/2011/medical</uri></p></fn></p>
        <p>However, the results were more sensitive to the formula used for merging
the different types of indexing elements than in the ad-hoc image-based
retrieval task. Compare the results of the two runs C K4G 1 and
C K4G 2 (see Table 3). This may be because we used fewer types of elements:
two types (keywords and 4grams) in case-based retrieval, compared with
three types (concepts, keywords, and 5grams) in ad-hoc image-based retrieval.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We presented in this paper our approach to index and retrieve documents. We
used three types of indexing elements (ngrams, keywords, concepts) for building
a multi-facet document representation, and then we used a simple formula based
on three hypotheses (the amount of overlap between a document and a query,
the descriptive power of an indexing element, and the discriminative power of
an indexing element) for retrieving documents, considering all facets (elements'
types) of documents.</p>
      <p>We obtained good results: our best runs ranked eighth out of 64 runs in the
ad-hoc image-based retrieval task and fifth out of 35 runs in the case-based
retrieval task, even though we used a very simple structure for representing
documents and queries and a very simple ranking formula.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Karam</given-names>
            <surname>Abdulahhad</surname>
          </string-name>
          ,
          <string-name>
            <surname>Jean-Pierre Chevallet</surname>
            , and
            <given-names>Catherine</given-names>
          </string-name>
          <string-name>
            <surname>Berrut</surname>
          </string-name>
          .
          <article-title>Solving concept mismatch through Bayesian framework by extending UMLS meta-thesaurus</article-title>. <source>In la huitième édition de la COnférence en Recherche d'Information et Applications (CORIA</source>
          <year>2011</year>
          ), Avignon, France, March 16–18, 2011.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alan</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Aronson</surname>
          </string-name>
          . Metamap:
          <article-title>Mapping text to the umls metathesaurus</article-title>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Mustapha</given-names>
            <surname>Baziz</surname>
          </string-name>
          .
          <article-title>Indexation conceptuelle guidée par ontologie pour la recherche d'information</article-title>
          . Thèse de doctorat, Université Paul Sabatier, Toulouse, France, décembre
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Nicholas</surname>
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Belkin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Cool</surname>
            ,
            <given-names>W. Bruce</given-names>
          </string-name>
          <string-name>
            <surname>Croft</surname>
            , and
            <given-names>James P.</given-names>
          </string-name>
          <string-name>
            <surname>Callan</surname>
          </string-name>
          .
          <article-title>The effect of multiple query representations on information retrieval system performance</article-title>
          .
          <source>In Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <source>SIGIR '93</source>
          , pages
          <fpage>339</fpage>
          –
          <lpage>346</lpage>
          , New York, NY, USA,
          <year>1993</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>O</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Burgun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G</given-names>
            <surname>Botti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Fieschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P Le</given-names>
            <surname>Beux</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F</given-names>
            <surname>Kohler</surname>
          </string-name>
          .
          <article-title>Evaluation of the Unified Medical Language System as a medical knowledge source</article-title>
          .
          <source>J Am Med Inform Assoc</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>76</fpage>
          –
          <lpage>87</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          , Anita Burgun, and Thomas C. Rindflesch.
          <article-title>Lexically-suggested hyponymic relations among medical terms and their representation in the UMLS</article-title>
          .
          <source>In Proceedings of TIA2001</source>
          , 11–21,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jean-Pierre Chevallet</surname>
          </string-name>
          .
          <article-title>Endogènes et exogènes pour une indexation conceptuelle intermédia</article-title>
          .
          <source>Mémoire d'Habilitation à Diriger des Recherches</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jean-Pierre</surname>
            <given-names>Chevallet</given-names>
          </string-name>
          ,
          Joo Hwee Lim, and Thi Hoang Diem Le
          .
          <article-title>Domain knowledge conceptual inter-media indexing, application to multilingual multimedia medical reports</article-title>
          .
          <source>In ACM Sixteenth Conference on Information and Knowledge Management (CIKM</source>
          <year>2007</year>
          ), Lisboa, Portugal,
          November 6–9,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>W. Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Incorporating different search models into one document retrieval system</article-title>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>16</volume>
          :
          <fpage>40</fpage>
          –
          <lpage>45</lpage>
          , May
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Padima</surname>
          </string-name>
          Das-Gupta and Jeffrey Katzer.
          <article-title>A study of the overlap among document representations</article-title>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>17</volume>
          :
          <fpage>106</fpage>
          –
          <lpage>114</lpage>
          ,
          <year>June 1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Christopher</surname>
            <given-names>Dozier</given-names>
          </string-name>
          , Ravi Kondadadi, Khalid Al-Kofahi,
          <string-name>
            <given-names>Mark</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          , and
          <string-name>
            <surname>Xi</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Guo</surname>
          </string-name>
          .
          <article-title>Fast tagging of medical terms in legal text</article-title>
          .
          <source>In ICAIL</source>
          , pages
          <fpage>253</fpage>
          –
          <lpage>260</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jayashree</surname>
            Kalpathy-Cramer, Henning Muller, Steven Bedrick, Ivan Eggel, Alba Garcia Seco de Herrera, and
            <given-names>Theodora</given-names>
          </string-name>
          <string-name>
            <surname>Tsikrika</surname>
          </string-name>
          .
          <article-title>The CLEF 2011 medical image retrieval and classification tasks</article-title>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>H. P.</given-names>
            <surname>Luhn</surname>
          </string-name>
          .
          <article-title>The automatic creation of literature abstracts</article-title>
          .
          <source>IBM J. Res. Dev.</source>
          , 2:
          <fpage>159</fpage>
          –
          <lpage>165</lpage>
          ,
          April
          <year>1958</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. Henning Muller, Ivan Eggel, Joe Reisetter,
          <string-name>
            <given-names>Charles E.</given-names>
            <surname>Kahn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>William</given-names>
            <surname>Hersh</surname>
          </string-name>
          .
          <article-title>Overview of the clef 2010 medical image retrieval track</article-title>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Jamie</given-names>
            <surname>Reilly</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Kean</surname>
          </string-name>
          .
          <article-title>Information content and word frequency in natural language: Word length matters</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>108</volume>
          (
          <issue>20</issue>
          ):
          <fpage>E108</fpage>
          , May
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Joseph</surname>
          </string-name>
          A. Shaw and Edward A. Fox.
          <article-title>Combination of multiple searches</article-title>
          .
          <source>In Text REtrieval Conference</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>