<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ITI's Participation in the ImageCLEF 2012 Medical Retrieval and Classification Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthew S. Simpson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daekeun You</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Mahmudur Rahman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dina Demner-Fushman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sameer Antani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Thoma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lister Hill National Center for Biomedical Communications, U. S. National Library of Medicine, NIH</institution>
          ,
          <addr-line>Bethesda, MD</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article describes the participation of the Image and Text Integration (ITI) group in the 2012 ImageCLEF medical retrieval and classification tasks. We present our methods for each of the three tasks and discuss our submitted textual, visual, and mixed runs as well as their results. Our methods generally perform well for each task, and our best ad-hoc image retrieval submission was ranked first among all the submissions from the participating groups.</p>
      </abstract>
      <kwd-group>
        <kwd>Image Retrieval</kwd>
        <kwd>Case-based Retrieval</kwd>
        <kwd>Image Modality</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This article describes the participation of the Image and Text Integration (ITI)
group in the ImageCLEF 2012 medical retrieval and classification tasks. Our
group is from the Communications Engineering Branch of the Lister Hill National
Center for Biomedical Communications, which is a division of the U. S. National
Library of Medicine.</p>
      <p>
        The medical track [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] of ImageCLEF 2012 consists of an image modality
classification task and two retrieval tasks. For the classification task, the goal is
to classify a given set of medical images according to thirty-one modalities (e.g.,
"Computerized Tomography," "Electron Microscopy," etc.). The modalities are
organized hierarchically into meta-classes such as "Radiology" and "Microscopy,"
which are themselves types of "Diagnostic Images." In the first retrieval task, a
set of ad-hoc information requests is given, and the goal is to retrieve the most
relevant images from a collection of biomedical articles for each topic. Finally, in
the second retrieval task, a set of case-based information requests is given, and
the goal is to retrieve the most relevant articles describing similar cases.
      </p>
      <p>
        In the following sections, we describe the textual and visual features that
comprise our image and article representations (Sections 2-3) and our methods
for the modality classification (Section 4) and medical retrieval tasks (Sections
5-6). Our textual approaches primarily utilize the Unified Medical Language
System® (UMLS®) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] synonymy to identify concepts in topic descriptions and
article text, and our visual approaches rely on computed distances between
descriptors of various low-level visual features. In developing mixed approaches,
we explore the use of clustered visual features that can be represented using text,
attribute selection, and ranked list merging strategies.
      </p>
      <p>In Section 7 we describe our submitted runs, and in Section 8 we present
our results. For the modality classification task, our best submission achieved a
classification accuracy of 63.2% and was ranked within the submissions from the
top three participating groups. Our best submission for the ad-hoc image retrieval
task was ranked first overall, achieving a mean average precision of 0.2377, which
is a statistically significant improvement over the second-ranked submission. For
the case-based article retrieval task, our best submission achieved a mean average
precision of 0.1035 and was ranked within the submissions from the top four
participating groups, but this submission is statistically indistinguishable from
our other case-based submissions.</p>
    </sec>
    <sec id="sec-2">
      <title>Image Representation for Ad-hoc Retrieval</title>
      <p>We represent the images contained in biomedical articles using a combination of
the textual and visual features described below.</p>
      <sec id="sec-2-1">
        <title>2.1 Textual Features</title>
        <p>We represent each image in the collection as a structured document of
image-related text called an enriched citation. Our representation includes the title,
abstract, and MeSH® terms (MeSH is a controlled vocabulary created by the U.S. National
Library of Medicine to index biomedical articles) of the article in which the image appears
as well as the image's caption and "mentions" (snippets of text within the body of an
article that discuss an image). These features can be indexed with a traditional
text-based information retrieval system, or they may be exposed as term vectors
and combined with the visual feature vectors described below.</p>
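        <p>As a rough illustration only (the field names below are ours for this sketch and do not reproduce our actual indexing schema), an enriched citation can be thought of as a simple record of image-related text fields:</p>
        <preformat>
# Hypothetical sketch of an "enriched citation" record for one image.
# Field names are illustrative; the actual indexing schema is not shown here.
def build_enriched_citation(article, image):
    """Collect the image-related text fields described in Section 2.1."""
    return {
        "title": article["title"],            # article title
        "abstract": article["abstract"],      # article abstract
        "mesh_terms": article["mesh_terms"],  # MeSH index terms
        "caption": image["caption"],          # figure caption
        "mentions": image["mentions"],        # in-text snippets discussing the figure
    }
</preformat>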
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Visual Features</title>
        <p>In addition to the above textual features, we also represent the visual content
of images using various low-level visual descriptors. Table 1 summarizes the
descriptors we extract and their dimensionality. Due to the large number of
these features, we forego describing them in any detail. However, they are all
well-known and discussed extensively in existing literature.</p>
        <p>
          Cluster Words. To avoid the computational complexity of computing distances
between the above visual descriptors, we create a textual representation of visual
features that is easily integrated with our existing textual features. For each
visual descriptor listed in Table 1, we cluster the vectors assigned to all images
using the k-means++ [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] algorithm. We then assign each cluster a unique "cluster
word" and represent each image as a sequence of these words. We add an image's
cluster words to its enriched citation as a "global image feature" field, which can
be searched using a traditional text-based information retrieval system.
        </p>
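        <p>As a rough illustration of the cluster-word idea (using scikit-learn's k-means with k-means++ initialization as a stand-in for our implementation; the cluster count and token format are illustrative), each image's descriptor can be mapped to a textual token that is then indexed like any other term:</p>
        <preformat>
# Illustrative sketch: cluster one visual descriptor type and emit "cluster words".
import numpy as np
from sklearn.cluster import KMeans

def cluster_words(descriptors, n_clusters=500, prefix="CEDD"):
    """descriptors: (n_images, dim) array for a single descriptor type.
    Returns one token per image, e.g. "CEDD_042", usable as indexable text."""
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10).fit(descriptors)
    return ["%s_%03d" % (prefix, label) for label in km.labels_]

# An image's "global image feature" field would then be the concatenation of the
# tokens produced for each descriptor type listed in Table 1.
</preformat>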
        <p>
          Attribute Selection. An orthogonal approach to transforming our visual
descriptors into a computationally manageable representation is attribute selection.
By eliminating unneeded or redundant information, these techniques can also
improve our modality classification and image retrieval methods. We perform
attribute selection using the WEKA [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] data mining software. First, we group
all our visual descriptors into a single combined vector, and we then perform
attribute selection to reduce the dimensionality of this combined feature.
        </p>
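        <p>The sketch below shows the general shape of this step, using a simple scikit-learn filter-based selector as a stand-in for the WEKA attribute selection we actually used; the selector and its parameters are illustrative only:</p>
        <preformat>
# Illustrative stand-in for the attribute-selection step (the paper used WEKA).
from sklearn.feature_selection import SelectKBest, f_classif

def select_attributes(X, y, k=100):
    """X: (n_images, dim) combined visual descriptor matrix; y: modality labels.
    Returns the reduced matrix and the fitted selector (to apply to new images)."""
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    return selector.transform(X), selector
</preformat>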
        <p>
          Table 1 lists the visual descriptors we extract, together with their individual and combined dimensionality:
autocorrelation; color and edge directivity (CEDD) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]; color layout (CLD) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]; color moment; edge frequency; edge histogram (EHD) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]; fuzzy color and texture histogram (FCTH) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]; Gabor moment; gray-level co-occurrence matrix moment (GLCM) [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]; local binary patterns (LBP1, LBP2) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]; local color histogram (LCH); primitive length; scale-invariant feature transform (SIFT) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]; semantic concept (SCONCEPT) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]; shape moment; and Tamura moment [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-2a">
      <title>Article Representation for Case-based Retrieval</title>
      <p>We represent articles using the textual features of each image appearing in the
article. Thus, each article's enriched citation consists of its title, abstract, and
MeSH terms as well as the caption and mention of each contained image.</p>
    </sec>
    <sec id="sec-3">
      <title>Modality Classification Task</title>
      <p>We experimented with both flat and hierarchical modality classification methods.
Below we describe our flat classification strategy, an extension of this approach
that exploits the hierarchical structure of the classes, and a post-processing
method for improving the classification accuracy of illustrations.</p>
      <sec id="sec-3-1">
        <title>4.1 Flat Classification</title>
        <p>Figure 1a provides an overview of our basic classification approach. We utilize
multi-class support vector machines (SVMs) as our flat modality classifiers.
First, we extract our visual and textual image features from the training images
(representing the textual features as term vectors). Then, we perform attribute
selection to reduce the dimensionality of the features. We construct the
lower-dimensional vectors independently for each feature type (textual or visual) and
combine the resulting attributes into a single, compound vector. Finally, we use
the lower-dimensional feature vectors to train multi-class SVMs for producing
textual, visual, or mixed modality predictions.</p>
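        <p>A minimal sketch of this flat classification pipeline, assuming the attribute-selected feature matrices from above and using a scikit-learn multi-class SVM as a stand-in for our trained classifiers, might look like this:</p>
        <preformat>
# Illustrative flat modality classifier (multi-class SVM on reduced features).
import numpy as np
from sklearn.svm import SVC

def train_flat_classifier(X_textual, X_visual, y):
    """X_textual, X_visual: attribute-selected feature matrices for the
    training images; y: modality labels. Returns a fitted multi-class SVM."""
    X_mixed = np.hstack([X_textual, X_visual])  # single compound vector per image
    return SVC(kernel="rbf", decision_function_shape="ovr").fit(X_mixed, y)
</preformat>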
      </sec>
      <sec id="sec-3-2">
        <title>4.2 Hierarchical Classification</title>
        <p>
          Unlike the flat classification strategy described above, it is possible to exploit the
hierarchical organization of the modality classes in order to decompose the task
into several smaller classification problems that can be sequentially applied. Based
on our visual observation of the training samples and our initial experiments, we
modified the original modality hierarchy [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] proposed for the task. The hierarchy
we used for our experiments is shown in Figure 1b.
        </p>
        <p>
          We train flat multi-class SVMs, as shown in Figure 1a, for each meta-class. For
recognizing compound images, we utilize the algorithm proposed by Apostolova et
al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], which detects sub-figure labels and the border of each sub-figure within a
compound image. To arrive at a final class label, an image is sequentially classified
beginning at the root of the hierarchy until a leaf class can be determined.
        </p>
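        <p>The sequential labeling procedure can be sketched as follows; this is a simplified illustration in which the hierarchy, the per-node classifiers, and the compound-image handling are stand-ins for those we actually used:</p>
        <preformat>
# Illustrative hierarchical classification: descend the class hierarchy, applying
# the flat classifier trained for each meta-class, until a leaf modality is reached.
def classify_hierarchically(features, classifiers, root="root"):
    """classifiers: dict mapping each internal (meta-)class to its trained flat SVM.
    Leaf modalities have no entry, which terminates the descent."""
    node = root
    while node in classifiers:
        node = classifiers[node].predict([features])[0]
    return node  # leaf modality label
</preformat>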
      </sec>
      <sec id="sec-3-3">
        <title>4.3 Illustration Post-processing</title>
        <p>Because our initial classification experiments resulted in only modest accuracy
for the fourteen "Illustration" classes shown in Figure 1b, we concluded that
our current textual and visual features may not be sufficient for representing
these figures. Therefore, in addition to the aforementioned machine learning
modality classification methods, we also developed several complementary
rule-based strategies for increasing the classification accuracy of "Illustration" classes.</p>
        <p>
          A majority of the training samples contained in the "Illustration" meta-class,
unlike other images in the collection, consist of line drawings or text superimposed
on a white background. For example, program listings mostly consist of text;
thus, the use of text and line detection methods may increase the classification
accuracy of Class GPLI. Similarly, polygons (e.g., rectangles, hexagons, etc.)
contained in flowcharts (GFLO), tables (GTAB), system overviews (GSYS), and
chemical structures (GCHE) are a distinctive feature of these modalities. We
utilize the methods of Jung et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and OpenCV (http://opencv.willowgarage.com/wiki/) functions to assess the
presence of text and polygons, respectively.
        </p>
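        <p>As a simplified illustration of the polygon check (not the exact rules we applied; the threshold, area cutoff, and vertex range are illustrative), an OpenCV-based sketch can count roughly polygonal contours on a near-white background:</p>
        <preformat>
# Illustrative polygon detection for "Illustration" post-processing (OpenCV).
import cv2

def count_polygons(image_path, min_area=500):
    """Counts contours that approximate simple polygons (4 to 8 vertices)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)
    # [-2] picks the contour list under both OpenCV 3.x and 4.x return conventions.
    contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    count = 0
    for c in contours:
        if min_area > cv2.contourArea(c):
            continue  # ignore tiny contours (noise, characters)
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) in range(4, 9):
            count += 1
    return count
</preformat>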
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Ad-Hoc Image Retrieval Task</title>
      <p>In this section we describe our textual, visual and mixed approaches to the
ad-hoc image retrieval task. Descriptions of the submitted runs that utilize these
methods are presented in Section 7.</p>
      <sec id="sec-4-1">
        <title>2 http://opencv.willowgarage.com/wiki/</title>
        <sec id="sec-4-1-1">
          <title>5.1 Textual Approaches</title>
          <p>
            To allow for efficient retrieval and to compare their relative performance, we
index our enriched citations with the Essie [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] and Lucene/SOLR (http://lucene.apache.org/) search engines.
Essie is a search engine developed by the U.S. National Library of Medicine
and is particularly well-suited for the medical retrieval task due to its ability to
automatically expand query terms using the UMLS synonymy. Lucene/SOLR
is a popular search engine developed by the Apache Software Foundation that
employs the well-known vector space model of information retrieval. We have
extracted the UMLS synonymy from Essie and use it for term expansion when
indexing enriched citations with Lucene/SOLR.
          </p>
          <p>
            We organize each topic description into a frame-based representation (e.g., PICO,
a mnemonic used in evidence-based practice for Patient/Population/Problem, Intervention,
Comparison, and Outcome), following a method similar to that described by Demner-Fushman
and Lin [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. Extractors identify concepts related to problems, interventions, age,
anatomy, drugs, and modality. We also identify modifiers of the extracted
concepts and a limited number of relationships among them. We then transform the
extracted concepts into queries appropriate for either Essie or Lucene/SOLR.
          </p>
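          <p>A toy sketch of the final step, turning an extracted frame into a boolean query string, is shown below; the frame fields and query syntax are illustrative and do not reproduce the actual Essie or Lucene/SOLR query languages:</p>
          <preformat>
# Illustrative conversion of an extracted topic frame into a boolean query string.
def frame_to_query(frame):
    """frame: dict with optional keys such as "modality", "anatomy", "problem",
    each holding a list of extracted terms (with any UMLS expansions)."""
    clauses = []
    for field in ("modality", "anatomy", "problem", "intervention", "drug", "age"):
        terms = frame.get(field, [])
        if terms:
            clauses.append("(" + " OR ".join('"%s"' % t for t in terms) + ")")
    return " AND ".join(clauses)

# frame_to_query({"modality": ["CT", "computed tomography"], "anatomy": ["liver"]})
# returns '("CT" OR "computed tomography") AND ("liver")'
</preformat>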
        </sec>
        <sec id="sec-4-1-2">
          <title>5.2 Visual Approaches</title>
          <p>Our visual approaches to image retrieval are based on retrieving images that
appear visually similar to the given topic images. We compute the visual similarity
between two images as the Euclidean distance between their visual descriptors.
For the purposes of computing this distance, we represent each image as a
combined feature vector composed of a subset of the visual descriptors listed in
Table 1 after attribute selection.</p>
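          <p>A minimal sketch of this similarity search, assuming the combined, attribute-selected descriptors are already available as vectors, might look like this:</p>
          <preformat>
# Illustrative content-based retrieval: rank images by Euclidean distance
# between combined (attribute-selected) visual descriptors.
import numpy as np

def retrieve_similar(query_vec, image_ids, image_vecs, top_k=1000):
    """image_vecs: (n_images, dim) array aligned with image_ids."""
    dists = np.linalg.norm(image_vecs - query_vec, axis=1)
    order = np.argsort(dists)[:top_k]
    return [(image_ids[i], float(dists[i])) for i in order]
</preformat>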
        </sec>
        <sec id="sec-4-1-3">
          <title>5.3 Mixed Approaches</title>
          <p>We explore several methods of combining our textual and visual approaches. One
such approach involves the use of our image cluster words. For performing
multimodal retrieval using cluster words, we first extract the visual descriptors
listed in Table 1 from each example image of a given topic. We then locate the
clusters to which the extracted descriptors are nearest in order to determine their
corresponding cluster words. Finally, we combine these cluster words with words
taken from the topic description to form a multimodal query appropriate for
either Essie or Lucene/SOLR.</p>
          <p>While the use of cluster words allows us to create multimodal queries, we
can instead directly combine the independent outputs of our textual and visual
approaches. In a score merging approach, we apply a min-max normalization to
the ranked lists of scores produced by our textual and visual retrieval strategies.
We then linearly combine the normalized scores given to each image to produce
a final ranking. Similarly, a rank merging approach combines the results of our
textual and visual approaches using the ranks of the retrieved images instead of
their normalized scores. To produce the final image ranking using this strategy,
we re-score each retrieved image as the reciprocal of its rank and then repeat the
above procedure for combining scores.</p>
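          <p>The two merging strategies can be sketched as follows; the weights and list formats are illustrative:</p>
          <preformat>
# Illustrative score merging and rank merging of textual and visual result lists.
def minmax(scores):
    """Min-max normalize a dict of document scores."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def merge_scores(text_scores, visual_scores, w_text=0.7):
    """Linear combination of min-max normalized scores (dicts mapping doc to score)."""
    t, v = minmax(text_scores), minmax(visual_scores)
    docs = set(t) | set(v)
    fused = {d: w_text * t.get(d, 0.0) + (1 - w_text) * v.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

def merge_ranks(text_ranking, visual_ranking, w_text=0.7):
    """Re-score each image as the reciprocal of its rank, then merge as above."""
    to_scores = lambda ranking: {doc: 1.0 / r for r, doc in enumerate(ranking, start=1)}
    return merge_scores(to_scores(text_ranking), to_scores(visual_ranking), w_text)
</preformat>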
          <p>Another means of incorporating visual information with our retrieval
approaches is through the use of a modality classifier. Using our hierarchical
modality classification approach, we can first determine the most probable
modalities for a topic's example images. After retrieving a set of images using either
our textual or visual methods, we can eliminate retrieved images that are not of
the same modality as the topic images. An advantage of performing hierarchical
classification is that we can filter the retrieved results using the meta-classes
within the hierarchy (e.g., "Radiology").</p>
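          <p>In the simplified sketch below (the hierarchy lookup is a placeholder for the output of our modality classifier), such filtering keeps only retrieved images whose predicted modality falls under the same meta-class as the topic's example images:</p>
          <preformat>
# Illustrative modality filtering of a retrieved list using predicted class paths.
def filter_by_modality(results, topic_meta_class, predicted_path):
    """results: ranked list of image ids; predicted_path: dict mapping an image id
    to its path in the modality hierarchy, e.g. ["Diagnostic Images", "Radiology", "CT"]."""
    return [img for img in results
            if topic_meta_class in predicted_path.get(img, [])]
</preformat>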
        <p>Finally, we often combine the retrieval results produced by several queries into
a single ranked list of images. We perform this query combination, or padding,
by simply appending the ranked list of images retrieved by a subsequent query
to the end of the ranked list produced by the preceding query.
</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Case-Based Retrieval Task</title>
      <p>Our method for performing case-based retrieval is analogous to our textual
approaches for ad-hoc image retrieval. Here, we index the enriched citations
described in Section 3 using the Essie and Lucene/SOLR search engines (for
performance comparison). We generate textual and mixed queries appropriate
for both search engines according to the approaches described in Sections 5.1 and 5.3.</p>
      <p>
        As a form of query expansion for case-based topics, we also explore the
possibility of determining relevant disease names to correspond with signs and
symptoms found in a topic case. To determine a set of potential diseases, we first
use the Google Search API (https://developers.google.com/custom-search/v1/overview)
to search the World Wide Web using a topic case as
a query. We then process the top five documents with MetaMap [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to extract
terms having the UMLS semantic type "Disease or Syndrome." Finally, we select
the top three most frequent diseases for query expansion.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Submitted Runs</title>
      <p>In this section we describe each of our submitted runs for the modality
classification, ad-hoc image retrieval, and case-based article retrieval tasks. Each run is
identified by its submission file name or trec_eval run ID and mode (textual,
visual, or mixed). All submitted runs are automatic.</p>
      <sec id="sec-6-1">
        <title>7.1 Modality Classification Task</title>
        <p>We submitted the following nine runs for the modality classification task:
M1. Visual only Flat.txt (visual): A flat multi-class SVM classification using
selected attributes from a combined visual descriptor of 15 features (all
descriptors in Table 1 except LCH and SCONCEPT).</p>
        <p>M2. Visual only Hierarchy.txt (visual): Like Run M1 but classification is
performed hierarchically.</p>
        <p>M3. Text only Flat.txt (textual): A flat multi-class SVM classification using
selected attributes from a combined term vector created from four textual
features (article title, MeSH terms, and image caption and mention).
M4. Text only Hierarchy.txt (textual): Like Run M3 but classification is
performed hierarchically.
M5. Visual Text Flat.txt (mixed): A flat multi-class SVM classification
combining the feature representations used in Runs M1-3.</p>
        <p>M6. Visual Text Hierarchy.txt (mixed): Like Run M5 but classification is
performed hierarchically.</p>
        <p>M7. Visual Text Flat w Postprocessing 4 Illustration.txt (mixed): Like Run M5
but additional post-processing is applied for "Illustration" classes.
M8. Visual Text Hierarchy w Postprocessing 4 Illustration.txt (mixed): Like
Run M7 but classification is performed hierarchically.</p>
        <p>M9. Image Text Hierarchy Entire set.txt (mixed): Like Run M6 but applied to
all the images contained in the retrieval collection.</p>
      </sec>
      <sec id="sec-6-2">
        <title>7.2 Ad-hoc Image Retrieval Task</title>
        <p>We submitted the following ten runs for the ad-hoc image retrieval task:
A1. nlm-se (mixed): A combination of three queries using Essie. (A1.Q1) A
disjunction of modality terms extracted from the query topic must occur
within the caption or mention fields of an image's enriched citation; a
disjunction of the remaining terms is allowed to occur in any field. (A1.Q2) A
lossy expansion of the verbatim topic is allowed to occur in any field.
(A1.Q3) A disjunction of the query images' cluster words must occur
within the global image feature field.</p>
        <p>A2. nlm-se-cw-mf (mixed): A combination of Query A1.Q1 with the additional
query below using Essie. (A2.Q2) A lossy expansion of the verbatim topic is
allowed to occur in any field of an image's enriched citation, and a disjunction
of the query images' cluster words can optionally occur within the global
image feature field. Additionally, the retrieved images are filtered so that
they share a least common ancestor modality with the query images, as
determined by the modality classifier used in Run M9. Query A2.Q2 is
distinct from Queries A1.Q2-3 in that the occurrence of a lossy expansion
of the topic is not necessarily weighted more heavily than the occurrence
of image cluster words.</p>
        <p>A3. nlm-se-scw-mf (mixed): Like Run A2 but image cluster words are only
considered if the modality classifier used in Run M9 identically labels all
the example images of a topic.</p>
        <p>A4. nlm-lc (mixed): A combination of three queries using Lucene with BM25
similarity and UMLS synonymy. (A4.Q1) A fuzzy phrase-based occurrence
of the verbatim topic is allowed in any field of an image's enriched citation.
(A4.Q2) A disjunction of the topic words is allowed to occur in any field.
(A4.Q3) A disjunction of the query images' cluster words must occur within
the global image feature field.</p>
        <p>A5. nlm-lc-cw-mf (mixed): A combination of Query A4.Q1 with the additional
query below using Lucene with BM25 similarity and UMLS synonymy.
(A5.Q2) A disjunction of the topic words is allowed to occur in any field
of an image's enriched citation, and a disjunction of the query images'
cluster words can optionally occur within the global image feature field.
Additionally, the retrieved images are filtered so that they share a least
common ancestor modality with the query images, as determined by the
modality classifier used in Run M9.</p>
        <p>A6. nlm-lc-scw-mf (mixed): Like Run A5 but image cluster words are only
considered if the modality classifier used in Run M9 identically labels all
the example images of a topic.</p>
        <p>A7. Combined Selected Fileterd Merge (visual): Similarity matching using 62
min-max normalized attributes selected from a combined visual descriptor
of 15 features (all descriptors in Table 1 except LCH and SCONCEPT).
Retrieval is performed separately for each query image, and the retrieved
results are filtered, according to the modality classifier used in Run M9,
so that they share the top two modality levels with the query. Images are
scored according to the query image resulting in the maximum score.
A8. Combined LateFusion Fileterd Merge (visual): Like Run A7 but similarity
matching is performed separately for seven features (CLD, GLCM,
SCONCEPT, and the color, Gabor, shape, and Tamura moments from Table 1)
whose scores are linearly combined with predefined weights.</p>
        <p>A9. Txt Img Wighted Merge (mixed): A combination of visual Run A7 with a
textual run consisting solely of Query A1.Q2 using score merging.
A10. Merge RankToScore weighted (mixed): A combination of visual Run A8
with a textual run consisting solely of Query A1.Q2 using rank merging.</p>
      </sec>
      <sec id="sec-6-3">
        <title>7.3 Case-based Article Retrieval Task</title>
        <p>We submitted the following eight runs for the case-based article retrieval task:
C1. nlm-se-max (textual): A combination of three queries for each topic sentence
using Essie. (C1.Q1) A disjunction of modality terms extracted from the
sentence must occur within the caption or mention fields of an article's
enriched citation; a disjunction of the remaining terms is allowed to occur
in any field. (C1.Q2) A lossy expansion of the verbatim sentence is allowed
to occur in any field. (C1.Q3) A disjunction of all extracted words and
discovered diseases in the sentence is allowed to occur in any field. Articles
are scored according to the sentence resulting in the maximum score.
C2. nlm-se-sum (textual): Like Run C1 but articles are scored according to the
sum of the scores produced for each sentence.</p>
        <p>C3. nlm-se-frames-max (textual): A combination of the query below with Query
C1.Q2 for each topic sentence using Essie. (C3.Q1) An expansion of the
frame-based representation of the sentence is allowed to occur in any field of
an article's enriched citation. Articles are scored according to the sentence
resulting in the maximum score.</p>
        <p>C4. nlm-se-frames-sum (textual): Like Run C3 but articles are scored according
to the sum of the scores produced for each sentence.</p>
        <p>C5. nlm-lc-max (textual): A combination of two queries for each topic sentence
using Lucene with language model similarity, Jelinek-Mercer smoothing,
and UMLS synonymy. (C5.Q1) A fuzzy phrase-based occurrence of the
verbatim sentence is allowed in any field of an article's enriched citation.
(C5.Q2) A disjunction of all words and discovered diseases in the sentence is
allowed to occur in any field. Articles are scored according to the sentence
resulting in the maximum score.</p>
        <p>C6. nlm-lc-sum (textual): Like Run C5 but articles are scored according to the
sum of the scores produced for each sentence.</p>
        <p>
C7. nlm-lc-total-max (textual): A combination of the query below with Queries
C5.Q1-2 (as C7.Q2-3) using Lucene with language model similarity,
Jelinek-Mercer smoothing, and UMLS synonymy. (C7.Q1) A fuzzy phrase-based
occurrence of the entire verbatim topic is allowed in any field of an article's
enriched citation. Articles are scored according to the sentence resulting in
the maximum score.</p>
        <p>C8. nlm-lc-total-sum (textual): Like Run C7 but articles are scored according
to the sum of the scores produced for each sentence.
</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Results</title>
      <p>We present and discuss the results of our modality classification, ad-hoc image
retrieval, and case-based article retrieval task submissions below.</p>
      <sec id="sec-7-1">
        <title>8.1 Modality Classification Task</title>
        <p>
          Table 2 presents the classification accuracy of our submitted runs for the modality
classification task. Visual Text Hierarchy w Postprocessing 4 Illustration.txt, a
mixed approach, achieved the highest accuracy (63.2%) of our submitted runs
and was ranked fifth overall. However, it ranked within the submissions from the
top three participating groups. This result validates our post-processing method
used to improve the recognition of "Illustration" classes, and provides, with our
previous experience [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], further evidence that hierarchical classification is a
successful strategy. Each of our hierarchical classification methods outperforms
the corresponding flat approach having the same feature representation.
        </p>
        <p>
          While our submitted runs were only judged on their ability to identify each of
the thirty-one modality classes [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], Table 3 presents the classification accuracy
of the intermediate classifiers we used for our hierarchical approaches. For each
meta-class in the hierarchy shown in Figure 1b, Table 3 gives the number of
classes they contain; the classification accuracy associated with the textual,
visual, and mixed feature representations; and the dimensionality of the mixed
feature representation after attribute selection. These results demonstrate that
the accuracies of the intermediate classifiers generally improve as the number
of class labels decreases. Given the limited amount of training data in relation
to the total number of modalities, the smaller number of labels per classifier
is likely significant in explaining why our hierarchical classification approaches
consistently outperform their corresponding flat approaches.
(In Table 3, feature dimensionality is given for mixed-mode classifiers only.)
        </p>
      </sec>
      <sec id="sec-7-2">
        <title>8.2 Ad-hoc Image Retrieval Task</title>
        <p>
          Table 4 presents the mean average precision (MAP), binary preference (bpref),
and early precision (P@10) of our submitted runs for the ad-hoc image retrieval
task. nlm-se achieved the highest MAP (0.2377) among our submitted runs and
was ranked first overall. Merge RankToScore weighted, the run achieving our
second highest MAP (0.2166), was ranked second overall. Comparing these two
runs using Fisher's paired randomization test [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], a recommended statistical
test for evaluating information retrieval systems, we find that nlm-se achieved
a statistically significant increase (9.7%, p = 0.0016) over the performance of
Merge RankToScore weighted.
        </p>
        <p>That the two highest-ranked runs were multimodal, as opposed to textual, is
an encouraging result, and provides evidence that our ongoing efforts at
integrating textual and visual information will be successful. In particular, the use by
nlm-se and other runs of cluster words, which are indexed and retrieved using
a traditional text-based information retrieval system, is an effective way not
only of incorporating visual information with text but also of avoiding the
computational expense common among content-based retrieval methods. Furthermore,
Merge RankToScore weighted demonstrates the value of rank merging when
combining textual and visual retrieval results. Some of our other mixed runs, in
utilizing the results of our modality classifiers, may have been weakened due to
the modest performance of our classification methods.</p>
      </sec>
      <sec id="sec-7-3">
        <title>8.3 Case-based Article Retrieval Task</title>
        <p>Table 5 presents the MAP, bpref, and P@10 of our submitted runs for the
case-based article retrieval task. nlm-lc-total-sum, a textual approach using language
model similarity, achieved the highest MAP (0.1035) among our submitted runs
and was ranked seventh overall. However, it ranked within the submissions from
the top four participating groups. Using Fisher's paired randomization test,
we find that there is no statistically significant difference (p &lt; 0.05) in MAP
among any of our submitted runs. The relatively low performance of most of the
ImageCLEF 2012 case-based submissions may be due, in part, to the existence
in the collection of only a small number of case reports, clinical trials, or other
types of documents relevant for case-based topics.
</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>This article describes the methods and results of the Image and Text Integration
(ITI) group in the ImageCLEF 2012 medical retrieval and classification tasks.
For the modality classification task, our best submission was ranked within the
submissions from the top three participating groups. Our best submission for the
ad-hoc image retrieval task was ranked first overall. Finally, for the case-based
article retrieval task, our best submission was ranked within the submissions from
the top four participating groups, though we found no statistically significant
difference between this run and our other case-based submissions. The effectiveness of our
multimodal approaches is encouraging and provides evidence that our ongoing
efforts at integrating textual and visual information will be successful.
Acknowledgments. We would like to thank Antonio Jimeno-Yepes for assisting in
expanding case-based topics with disease names, Russell Loane for providing source code
for converting frame-based topics to Essie queries, and Srinivas Phadnis for constructing
enriched citations and extracting visual features.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Apostolova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>You</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thoma</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Image retrieval from scientific publications: Text and image content processing to separate multi-panel figures</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          (To appear)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.R.:</given-names>
          </string-name>
          <article-title>Effective mapping of biomedical text to the UMLS Metathesaurus: The MetaMap program</article-title>
          .
          <source>In: Proc. of the Annual Symp. of the American Medical Informatics Association (AMIA)</source>
          . pp.
          <volume>17</volume>
          -
          <issue>21</issue>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Arthur</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vassilvitskii</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>k-means++: The advantages of careful seeding</article-title>
          .
          <source>In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms</source>
          . pp.
          <volume>1027</volume>
          -
          <fpage>1035</fpage>
          . SODA '
          <volume>07</volume>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <issue>4</issue>
          .
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>S.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sikora</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Overview of the MPEG-7 standard</article-title>
          .
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          <volume>11</volume>
          (
          <issue>6</issue>
          ),
          <volume>688</volume>
          -
          <fpage>695</fpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Chatzichristofis,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Boutalis</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.S.:</surname>
          </string-name>
          <article-title>CEDD: Color and edge directivity descriptor: A compact descriptor for image indexing and retrieval</article-title>
          . In: Gasteratos,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Vincze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Tsotsos</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.K</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of the 6th International Conference on Computer Vision Systems. Lecture Notes in Computer Science</source>
          , vol.
          <volume>5008</volume>
          , pp.
          <volume>312</volume>
          -
          <fpage>322</fpage>
          . Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Chatzichristofis,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Boutalis</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.S.:</surname>
          </string-name>
          <article-title>FCTH: Fuzzy color and texture histogram: A low level feature for accurate image retrieval</article-title>
          .
          <source>In: Proceedings of the 9th International Workshop on Image Analysis for Multimedia Interactive Services</source>
          . pp.
          <volume>191</volume>
          -
          <issue>196</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Answering clinical questions with knowledge-based and statistical techniques</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>33</volume>
          (
          <issue>1</issue>
          ),
          <volume>63</volume>
          -103 (Mar
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfahringer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reutemann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          :
          <article-title>The WEKA data mining software: An update</article-title>
          .
          <source>SIGKDD Explorations</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ide</surname>
            ,
            <given-names>N.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loane</surname>
            ,
            <given-names>R.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Essie: A concept-based search engine for structured biomedical text</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <volume>253</volume>
          -
          <fpage>263</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Jung</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>K.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          :
          <article-title>Text information extraction in images and video: A survey</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>37</volume>
          (
          <issue>5</issue>
          ),
          <volume>977</volume>
          -
          <fpage>997</fpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lindberg</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Humphreys</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Unified Medical Language System</article-title>
          .
          <source>Methods of Information in Medicine</source>
          <volume>32</volume>
          (
          <issue>4</issue>
          ),
          <volume>281</volume>
          -
          <fpage>291</fpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Object recognition from local scale-invariant features</article-title>
          .
          <source>In: Proceedings of the Seventh IEEE International Conference on Computer Vision</source>
          . vol.
          <volume>2</volume>
          , pp.
          <volume>1150</volume>
          -
          <issue>1157</issue>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chatzichristofis</surname>
          </string-name>
          , S.A.:
          <article-title>LIRe: Lucene image retrieval: an extensible Java CBIR library</article-title>
          .
          <source>In: Proceedings of the 16th ACM International Conference on Multimedia</source>
          . pp.
          <volume>1085</volume>
          -
          <issue>1088</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. Maenpaa,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>The Local Binary Pattern Approach to Texture Analysis: Extensions and Applications</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Oulu (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. Muller, H.,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalpathy-Cramer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eggel</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2012 medical image retrieval and classification tasks</article-title>
          .
          <source>In: CLEF 2012 Working Notes</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Rahman</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thoma</surname>
          </string-name>
          , G.:
          <article-title>A medical image retrieval framework in correlation enhanced visual concept feature space</article-title>
          .
          <source>In: Proceedings of the 22nd IEEE International Symposium on Computer-Based Medical Systems</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Simpson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahman</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Phadnis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Apostolova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thoma</surname>
          </string-name>
          , G.:
          <article-title>Text- and content-based approaches to image modality classification and retrieval for the ImageCLEF 2011 medical retrieval track (</article-title>
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Smucker</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carterette</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A comparison of statistical significance tests for information retrieval evaluation</article-title>
          .
          <source>In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management</source>
          . pp.
          <volume>623</volume>
          -
          <issue>632</issue>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Srinivasan</surname>
            ,
            <given-names>G.N.</given-names>
          </string-name>
          , G.,
          <string-name>
            <surname>S.</surname>
          </string-name>
          :
          <article-title>Statistical texture analysis</article-title>
          .
          <source>In: Proceedings of World Academy of Science, Engineering and Technology</source>
          . vol.
          <volume>36</volume>
          , pp.
          <volume>1264</volume>
          -
          <issue>9</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Tamura</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mori</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamawaki</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Textural features corresponding to visual perception</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          <volume>8</volume>
          (
          <issue>6</issue>
          ),
          <volume>460</volume>
          -
          <fpage>73</fpage>
          (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>