<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Text- and Content-based Approaches to Image Modality Classification and Retrieval for the ImageCLEF 2011 Medical Retrieval Track</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthew Simpson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Mahmudur Rahman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Srinivas Phadnis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emilia Apostolova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dina Demner-Fushman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sameer Antani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Thoma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lister Hill National Center for Biomedical Communications U.S. National Library of Medicine, NIH</institution>
          ,
          <addr-line>Bethesda, MD</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article describes the participation of the Communications Engineering Branch (CEB), a division of the Lister Hill National Center for Biomedical Communications, in the ImageCLEF 2011 medical retrieval track. Our methods encompass a variety of techniques relating to text- and content-based image retrieval. Our textual approaches primarily utilize the Unified Medical Language System (UMLS) synonymy to identify concepts in topic descriptions and image-related text, and our visual approaches utilize similarity metrics based on computed "visual concepts" and low-level image features. We also explore mixed approaches that utilize a combination of textual and visual features. In this article we present an overview of the application of our methods to the modality classification, ad-hoc image retrieval, and case-based image retrieval tasks, and we describe our submitted runs and results.</p>
      </abstract>
      <kwd-group>
        <kwd>Image Retrieval</kwd>
        <kwd>Case-based Retrieval</kwd>
        <kwd>Image Modality</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>This article describes the participation of the Communications Engineering
Branch (CEB), a division of the Lister Hill National Center for Biomedical
Communications, in the ImageCLEF 2011 medical retrieval track.</p>
      <p>
        The medical retrieval track [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] of ImageCLEF 2011 consists of an image
modality classification task and two retrieval tasks. For the modality classification
task, the goal is to classify a given set of medical images according to eighteen
modalities (e.g., CT or Histopathology) taken from five classes (e.g., Radiology
or Microscopy). In the first retrieval task, a set of ad-hoc information requests is
given, and the goal is to retrieve the most relevant images for each topic. Finally,
in the second retrieval task, a set of case-based information requests is given, and
the goal is to retrieve the most relevant articles describing similar cases.
      </p>
      <p>
        In the following sections, we describe the textual and visual features that
comprise our image and case representations (Sections 2–3) and our methods
for the modality classification (Section 4) and medical retrieval tasks (Sections
5–6). Our textual approaches primarily utilize the Unified Medical Language
System (UMLS) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] synonymy to identify concepts in topic descriptions and
image-related text, and our visual approaches rely on similarity metrics based on
computed "visual concepts" and other low-level visual features. We also explore
mixed approaches for the modality classification and retrieval tasks that utilize a
combination of textual and visual features.
      </p>
      <p>In Section 7 we describe our submitted runs, and in Section 8 we present
our results. For the modality classification task, our best submission achieved a
classification accuracy of 74% and was ranked within the submissions from the
top three groups. For the retrieval tasks, our results were lower than expected
yet reveal new insights that we anticipate will improve future work. For the
modality classification and image retrieval tasks, our best results were obtained
using mixed approaches, indicating the importance of both textual and visual
features for these tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Image Representation</title>
      <p>Images contained in biomedical articles can be represented using both textual and
visual features. Textual features can include text from an article that pertains
to an image, such as image captions and "mentions" (snippets of text within
the body of an article that discuss an image), and visual features can include
information derived from the content of an image, such as shape, color and
texture. We describe the features we use in representing images below.</p>
      <sec id="sec-2-1">
        <title>2.1 Textual Features</title>
        <p>We represent each image in the ImageCLEF 2011 medical collection as a structured
document of image-related text. Our representation includes the title, abstract,
and MeSH terms1 of the article in which the image appears as well as the
image's caption and mentions. Additionally, we identify within image captions
textual Regions of Interest (ROIs). A textual ROI is a noun phrase describing the
content of an interesting region of an image and is identified within a caption by a
pointer. For example, in the caption "MR image reveals hypointense indeterminate
nodule (arrow)," the word arrow points to the ROI containing a hypointense
indeterminate nodule.</p>
        <p>The above structured documents may be indexed and searched with a
traditional search engine or the underlying term vectors may be exposed and added
to a mixed image representation that includes the visual features described in
Section 2.2. For the latter approach, the terms in a structured document field
D_j (e.g., caption) are commonly represented as an N-dimensional vector
f_j^term = [w_j1, w_j2, ..., w_jN]^T   (1)
where w_jk denotes the tf-idf weight of term t_k in document field D_j, and N is
the size of the vocabulary.
1 MeSH is a controlled vocabulary created by the U.S. National Library of Medicine to
index biomedical articles.</p>
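        <p>As a concrete illustration of the weighting in Eq. (1), the following sketch computes a tf-idf vector for a single document field (a minimal example; the vocabulary, document frequencies, and caption shown are hypothetical, not drawn from the collection):</p>

```python
import math

def tfidf_vector(field_terms, vocabulary, doc_freq, n_docs):
    """Compute the tf-idf vector f_j^term for one document field.

    field_terms: list of terms in field D_j (e.g., an image caption)
    vocabulary:  ordered list of the N vocabulary terms
    doc_freq:    term -> number of documents containing the term
    n_docs:      total number of documents in the collection
    """
    vector = []
    for term in vocabulary:
        tf = field_terms.count(term)
        # Standard tf-idf; idf is 0 for a term occurring in every document.
        idf = math.log(n_docs / doc_freq[term]) if term in doc_freq else 0.0
        vector.append(tf * idf)
    return vector

# Toy example: a caption field over a three-term vocabulary.
vocab = ["nodule", "hypointense", "arrow"]
df = {"nodule": 10, "hypointense": 2, "arrow": 100}
caption = ["hypointense", "nodule", "nodule"]
weights = tfidf_vector(caption, vocab, df, n_docs=100)
```

        <p>Note that "arrow", which occurs in every document of the toy collection, receives zero weight.</p>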
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Visual Features</title>
        <p>In addition to the above textual features, we also represent the visual content
of images using various low-level global image features and a derived feature
intended to capture the high-level semantic content of images.</p>
        <p>
          Low-level Global Features We represent the spatial structure and global
shape and edge features of images with the Color Layout Descriptor (CLD) and
Edge Histogram Descriptor (EHD) of MPEG-7 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. We extract the CLD feature
as a vector f^cld and the EHD feature as f^ehd. Additionally, we extract the Color
and Edge Directivity Descriptor (CEDD) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] as f^cedd and the Fuzzy Color and
Texture Histogram (FCTH) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] as f^fcth using the Lucene image retrieval (LIRE)
library.2 Both CEDD and FCTH incorporate color and texture information into
single histograms that are suitable for image indexing and retrieval.
Concept Feature In a heterogeneous medical image collection, it is possible
to identify specific local patches in images that are perceptually or semantically
distinguishable, such as homogeneous texture patterns in gray-level radiological
images or differential color and texture structures in microscopic pathology
images. The variation in the local patches can be effectively modeled as "visual
concepts" [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] using supervised machine learning-based classification techniques.
        </p>
        <p>
          For the generation of these concepts, we utilize a multi-class Support Vector
Machine (SVM) composed of several binary classifiers organized using the
one-against-one strategy [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. To train the SVMs, we manually assign a set of L
visual concepts C = {c_1, ..., c_i, ..., c_L} to the color and texture features of each
fixed-size patch contained in an image. For a single image, the input to the
training process is a set of color and texture feature vectors for all fixed-size
patches along with their manually assigned concept labels. We generate the
concept feature for each image I_j in the collection by first partitioning I_j into l
patches {x_1j, ..., x_kj, ..., x_lj}, where each x_kj ∈ R^d is a combined color and
texture feature vector. Then, for each x_kj, we determine its concept label by the
prediction of the multi-class SVM. Thus, in contrast to the low-level features
described above, the concept feature represents an image as a set of high-level
"visual concepts." Based on this encoding scheme, we represent an image I_j as a
vector of concepts
f_j^concept = [w_1j, ..., w_ij, ..., w_Lj]^T   (2)
where each w_ij denotes the tf-idf weight of concept c_i in image I_j.
Clustered Features In an attempt to avoid the online computational
complexity required to calculate visual similarity (described in Section 5.2) using the
above features, we create an index of image similarity based on the clustering
of feature vectors. For each visual feature described above, we cluster the
vectors assigned to all images into k = d · log|I| clusters using the k-means++ [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
algorithm, where d is the number of attributes in each vector and |I| is the total
number of images in the collection. We then assign each cluster a unique "word"
and represent each image as a sequence of these words. For example, using only
MPEG-7 features for simplicity, an image might be represented as the sequence
"cld:k1 ehd:k2" if the image's CLD feature was among the vectors in the first CLD
cluster and its EHD feature was among the vectors in the second EHD cluster.
The resulting textual interpretation of an image's visual features may then be
indexed and searched using a traditional search engine or added to a mixed image
representation that includes the textual features described in Section 2.1.
2 http://freshmeat.net/projects/lirecbir/
        </p>
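        <p>The cluster-"word" encoding described above can be sketched as follows (a simplified illustration; the centroids here are hypothetical stand-ins for the k-means++ output over the full collection):</p>

```python
def nearest_cluster(vector, centroids):
    """Index of the centroid closest to `vector` (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for i, c in enumerate(centroids):
        d = sum((v - x) ** 2 for v, x in zip(vector, c))
        if d < best_dist:
            best, best_dist = i, d
    return best

def visual_words(features, centroids_by_feature):
    """Encode an image's visual features as a sequence of cluster 'words',
    e.g. 'cld:k1 ehd:k2'."""
    words = []
    for name, vec in features.items():
        k = nearest_cluster(vec, centroids_by_feature[name])
        words.append(f"{name}:k{k + 1}")  # clusters numbered from 1
    return " ".join(words)

# Toy example: two features with two clusters each (hypothetical centroids).
centroids = {"cld": [[0.0, 0.0], [1.0, 1.0]], "ehd": [[0.0, 1.0], [1.0, 0.0]]}
image = {"cld": [0.1, 0.2], "ehd": [0.9, 0.1]}
encoding = visual_words(image, centroids)
```

        <p>The resulting string can be indexed and queried like ordinary text, which is what makes the clustered features searchable by a traditional engine.</p>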
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Case Representation</title>
      <p>We represent a full-text article as the combination of the textual and visual
features of each image appearing in the article. Thus, each article representation
consists of an article's title, abstract, and MeSH terms as well as the caption,
mention, textual ROIs, and clustered visual features of each contained image.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Modality Classification Task</title>
      <p>
        Owing to their empirical success, we utilize multi-class SVMs for classifying
images into eighteen medical image modalities based on their textual and visual
features. We compose multi-class SVMs using the one-against-one strategy [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
for combining the pairwise classifications of each binary SVM.
      </p>
      <p>
        Figure 1 describes our textual, visual, and mixed approaches to the modality
classification task. Our visual and textual image features (with the text-based
features represented as term vectors) can be used individually to produce
single-mode classifications, or they may be combined to produce multimodal predictions.
For the mixed approaches, the features may be combined into a single feature
vector or they may be used independently, with the separate predictions being
"fused" to form a single classifier. We fuse the output of multiple classifiers with
the popular "Sum" classifier combination technique [
        <xref ref-type="bibr" rid="ref10 ref6">6, 10</xref>
        ] based on Bayes' theorem.
      </p>
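      <p>The "Sum" fusion of the separate classifiers can be sketched as follows (a minimal illustration assuming each classifier outputs per-class posterior probabilities; the toy classifiers and values are hypothetical):</p>

```python
def sum_rule(posteriors_per_classifier):
    """Fuse classifiers by summing their per-class posterior probabilities
    and predicting the class with the largest total (the 'Sum' rule)."""
    totals = {}
    for posteriors in posteriors_per_classifier:
        for label, p in posteriors.items():
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=totals.get)

# Toy example: the textual and visual classifiers disagree on the top class,
# but the summed evidence favors 'CT'.
textual = {"CT": 0.4, "MR": 0.5, "US": 0.1}
visual = {"CT": 0.6, "MR": 0.1, "US": 0.3}
prediction = sum_rule([textual, visual])
```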
      <p>We utilize the above approach for both flat and hierarchical modality
classification. For the former, the system classifies an image's modality as one of eighteen
medical image modalities. For the latter, the system first classifies the image's
modality as belonging to one of five high-level modality classes (i.e., Radiology,
Microscopy, Photograph, Graphic, or Other), and then it classifies the image's
modality as one of the original eighteen, given its predicted high-level class. The
hierarchical approach requires the training of a single high-level classifier and
multiple class-specific classifiers, and an appropriate set of example images must
be constructed to train each classifier.</p>
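      <p>The two-stage hierarchical scheme can be sketched as follows (the stub classifiers below are purely illustrative stand-ins for the trained one-against-one SVMs; the feature names are hypothetical):</p>

```python
def classify_hierarchical(image_features, top_level_clf, class_specific_clfs):
    """First predict a high-level modality class, then refine to a specific
    modality using that class's dedicated classifier."""
    high_level = top_level_clf(image_features)        # e.g. 'Radiology'
    refine = class_specific_clfs[high_level]
    return high_level, refine(image_features)         # e.g. ('Radiology', 'CT')

# Toy stand-in classifiers over hypothetical features.
top = lambda f: "Radiology" if f["gray_level"] > 0.5 else "Photograph"
specific = {
    "Radiology": lambda f: "CT" if f["slices"] else "X-ray",
    "Photograph": lambda f: "Dermatology",
}
result = classify_hierarchical({"gray_level": 0.9, "slices": True}, top, specific)
```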
      <p>
        Due to the small number of training examples for several modality classes, we
created an extended set of training images from the collection. We accomplished
this task by first performing textual image searches, using particular modalities
as queries, and then by manually inspecting and labeling the retrieved results.</p>
      <p>In this section we describe our textual, visual, and mixed approaches to the
ad-hoc image retrieval task. Descriptions of the submitted runs that utilize these
methods are presented in Section 7.</p>
      <p>To allow for efficient retrieval and to compare their relative performance, we
index the textual image representations described in Section 2.1 with the Essie [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
and Lucene/SOLR3 search engines. Essie is a search engine developed by the
U.S. National Library of Medicine and is particularly well-suited for the medical
retrieval task due to its ability to automatically expand query terms using the
UMLS synonymy. Lucene/SOLR is a popular search engine developed by the
Apache Software Foundation that employs the well-known vector space model of
information retrieval and tf-idf term weighting. Both Essie and Lucene/SOLR
provide the ability to weight term occurrences according to the location in a
document in which they occur. For example, we weight term occurrences in image
captions higher than those in article abstracts.
3 http://lucene.apache.org/
      </p>
      <sec id="sec-4-1">
        <title>5.1 Textual Approaches</title>
        <p>
          We organize each topic description into the well-formed clinical question (i.e.,
PICO4) framework following the method described by Demner-Fushman and
Lin [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Extractors identify UMLS concepts related to problems, interventions, age,
anatomy, drugs, and image modality. When used as part of an Essie query, each
extracted concept is automatically expanded along the synonymy relationships
in the UMLS. For approaches that make use of the Lucene/SOLR search engine,
we first expand the extracted concepts using Essie's built-in synonymy and then
replace the extracted concepts with their expansions.
        </p>
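        <p>The concept-expansion step for Lucene/SOLR queries can be sketched as follows (a minimal illustration; the frame slots and synonymy entries are hypothetical examples, not actual UMLS content):</p>

```python
def expand_query_concepts(frame, synonymy):
    """Replace each extracted concept with the OR of the concept and its
    synonyms, mirroring the expansion Essie performs automatically."""
    expanded = {}
    for slot, concepts in frame.items():
        expanded[slot] = [
            "(" + " OR ".join([c] + synonymy.get(c, [])) + ")" for c in concepts
        ]
    return expanded

# Toy PICO-style frame and synonymy table (hypothetical entries).
frame = {"problem": ["myocardial infarction"], "modality": ["CT"]}
syn = {"myocardial infarction": ["heart attack"], "CT": ["computed tomography"]}
out = expand_query_concepts(frame, syn)
```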
        <p>To construct a query for each topic, we create and combine several boolean
expressions derived from the extracted concepts. First, we create an expression
by combining the concepts using the AND operator (meaning all of the concepts
are required to occur in an image's textual representation), and then we produce
additional expressions by allowing an increasing number of the extracted concepts
to be optional. Finally, we combine these expressions using the OR operator, giving
significantly more weight to expressions containing fewer optional
concepts. Additionally, we often include the verbatim topic description as a
component of a query, but we give minimal weight to this expression compared
to those containing the extracted concepts. We use the resulting queries to search
the Essie and Lucene/SOLR indices.</p>
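        <p>One simple way to approximate this query-construction scheme is sketched below, using Lucene-style "^" boosts and dropping trailing concepts as a stand-in for marking them optional (an illustrative simplification, not the authors' exact weighting):</p>

```python
def build_query(concepts, topic_text):
    """Combine concept expressions: an all-required AND clause first, then
    clauses with progressively fewer required concepts, OR-ed together with
    decreasing boosts; the verbatim topic gets minimal weight."""
    clauses = []
    n = len(concepts)
    for optional in range(n):           # allow 0, 1, ... concepts to be optional
        required = concepts[: n - optional]
        weight = n - optional           # fewer optional -> higher boost
        clauses.append(f'({" AND ".join(required)})^{weight}')
    clauses.append(f'("{topic_text}")^1')  # verbatim topic, minimal weight
    return " OR ".join(clauses)

query = build_query(["nodule", "MRI"], "hypointense nodule on MR")
```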
        <sec id="sec-4-1-1">
          <title>5.2 Visual Approaches</title>
          <p>Our visual approaches to image retrieval are based on retrieving images that
appear visually similar to the given topic images. The similarity between a
query image Iq and a target image Ij , based on the visual features described in
Section 2.2, is defined by</p>
          <p>Sim(I_q, I_j) = Σ_F ω_F · Sim_F(I_q, I_j)   (3)
where F ∈ {Concept, EHD, CLD, CEDD, FCTH}, the ω_F are feature weights, and
Sim_F is the Euclidean distance computed on feature F. In the above similarity
matching function, the feature weights are determined from the cross-validation
accuracies of the feature-specific SVMs trained for the modality classification
task. The weights are normalized so that 0 ≤ ω_F ≤ 1 and Σ_F ω_F = 1.</p>
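          <p>Eq. (3) can be sketched as follows (the feature names and weight values below are hypothetical; in the actual system the weights come from SVM cross-validation accuracies):</p>

```python
import math

def weighted_similarity(query_feats, target_feats, weights):
    """Eq. (3): weighted sum of per-feature Euclidean distances between a
    query image and a target image (lower totals indicate closer images)."""
    total = 0.0
    for name, w in weights.items():
        d = math.dist(query_feats[name], target_feats[name])
        total += w * d
    return total

# Toy example over two features; weights sum to 1 (hypothetical values).
w = {"cld": 0.7, "ehd": 0.3}
q = {"cld": [0.0, 0.0], "ehd": [1.0, 0.0]}
t = {"cld": [3.0, 4.0], "ehd": [1.0, 0.0]}
score = weighted_similarity(q, t, w)  # 0.7 * 5.0 + 0.3 * 0.0 = 3.5
```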
          <p>In order to avoid the online computation of the above similarity metric, we may
utilize clustered visual features, also described in Section 2.2, to retrieve visually
similar images. To allow for efficient retrieval, we index the textual interpretations
of the images' clustered visual features using the Essie and Lucene/SOLR search
engines. Again, we utilize both search engines in order to compare their relative
performance. Retrieval is performed by first extracting a query image's visual
features, then by determining the features' cluster membership, and finally by
combining the unique "words" assigned to the clusters containing the features
in order to form a textual query. For a given topic, we combine the textual
interpretations of all features for all sample images using the OR operator.
4 PICO is a mnemonic for structuring clinical questions in evidence-based practice and
represents Patient/Population/Problem, Intervention, Comparison, and Outcome.</p>
          <p>Our mixed approaches to image retrieval combine our textual and visual
approaches through a process of filtering and re-ranking or by issuing multimodal
queries. For the filtering approach, we first filter the image collection by the
two most probable modalities of the query images, as indicated by our modality
classifier. We then query the remaining images according to a textual approach.
For the re-ranking approach, we first query the image collection using a textual
approach, and we then re-rank the retrieved images according to their visual
similarity with the query images, as indicated by the above similarity metric. Finally,
for approaches involving multimodal queries, we utilize Essie and Lucene/SOLR
to index images using both their textual features and the textual interpretation of
their clustered image features. We construct multimodal queries by combining a
query produced by a textual approach with that produced by the visual approach
described above that utilizes clustered image features. We join the textual and
visual components of the query with the OR operator.</p>
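          <p>The multimodal-query construction can be sketched as follows (the query strings are illustrative, not actual topic queries):</p>

```python
def multimodal_query(textual_query, visual_words):
    """Join a textual query with the visual cluster-'word' query using OR,
    as in the multimodal-query approach described above."""
    visual_query = " OR ".join(visual_words)
    return f"({textual_query}) OR ({visual_query})"

q = multimodal_query("nodule AND MRI", ["cld:k1", "ehd:k2"])
```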
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6 Case-Based Retrieval Task</title>
      <p>Our method for performing case-based retrieval is analogous to our approach
for ad-hoc image retrieval. Here, we index the case representations described in
Section 3 using the Essie and Lucene/SOLR search engines (for performance
comparison). We generate textual and mixed queries appropriate for both search
engines according to the approaches described in Sections 5.1 and 5.3.</p>
    </sec>
    <sec id="sec-6">
      <title>7 Submitted Runs</title>
      <p>In this section we describe each of our submitted runs for the modality
classification, ad-hoc image retrieval, and case-based image retrieval tasks. Each run is
identified by its trec_eval run ID and followed by a submission mode (textual,
visual, or mixed) and type (automatic, manual, or feedback).</p>
      <sec id="sec-6-1">
        <title>7.1 Modality Classification Task</title>
        <p>We submitted the following 10 runs for the modality classification task:
1. image test result original (visual, automatic): SVM classification derived
from the original set of training images. Each image is represented as a single
vector of visual features.
2. image test result ext (visual, automatic): SVM classification like Run 1 but
derived from an extended set of training images.
3. image text test result original (mixed, automatic): SVM classification derived
from the original set of training images. Each image is represented as a single
vector containing visual features and a subset of textual features (article title,
MeSH terms, caption, and mention).
4. image text test result ext (mixed, automatic): SVM classification like Run 3
but derived from an extended set of training images.
5. image text test result sum (mixed, automatic): Classifier combination using
the "Sum" method of Bayes' theorem. Each image is represented as a group
of vectors for visual and textual features that are individually classified using
SVMs derived from the original set of training images.
6. image text test result sum ext (mixed, automatic): Classifier combination like</p>
        <p>Run 5 but using SVMs derived from an extended set of training images.
7. image text test result CV (mixed, feedback): Linear classifier combination
weighting classifiers according to their normalized cross-validation accuracies.
Each image is represented as a group of vectors for visual and textual features
that are individually classified using SVMs derived from the original set of
training images.
8. image text test result CV ext (mixed, automatic): Classifier combination like</p>
        <p>Run 7 but using SVMs derived from an extended set of training images.
9. image text test result multilevel (mixed, automatic): Hierarchical SVM
classification derived from the original set of training images. Each image is
represented as a single vector of visual and textual features that is first
classified into a top-level modality class and is then further classified using a
class-specific SVM.
10. image text test result multilevel ext (mixed, automatic): SVM classification
like Run 9 but using SVMs derived from an extended set of training images.</p>
      </sec>
      <sec id="sec-6-2">
        <title>7.2 Ad-hoc Image Retrieval Task</title>
        <p>We submitted the following 10 runs for the ad-hoc image retrieval task:
1. iti-essie-baseline+expanded-concepts (textual, automatic): Textual search
using the Essie search engine. Each image is represented by its textual
features, and queries combine the verbatim topic description with extracted
concepts and image modalities.
2. iti-lucene-baseline+expanded-concepts (textual, automatic): Textual search
using the Lucene/SOLR search engine. Each image is represented as in Run 1,
and queries combine the verbatim topic description with extracted concepts
and image modalities that are then expanded along synonymy relationships
in the UMLS.
3. iti-lucene-image (visual, automatic): Visual search using the Lucene/SOLR
search engine. Each image is represented using the textual interpretation of
its clustered visual features, and queries combine the visual "words" of each
of the sample topic images.
4. image fusion category weight filter (visual, automatic): Similarity matching
over images filtered according to modality. Each image is represented as a
subset of visual features (Concept, CLD, and EHD), and similarity scores for
each feature are linearly combined and weighted according to modality class.
5. image fusion category weight filter merge (visual, automatic): Similarity
matching like Run 4, but each image is scored as the sum of its
similarity with each of the sample topic images.</p>
      </sec>
      <sec id="sec-6-3">
        <title>7.3 Case-based Retrieval Task</title>
        <p>We submitted the following 10 runs for the case-based retrieval task:
1. iti-essie-manual (textual, manual): Textual search using the Essie search
engine. Articles are represented by their textual features, and queries were
manually generated by a medical doctor with expertise in biomedical
informatics.
2. iti-essie-frames (textual, automatic): Textual search using the Essie search
engine. Articles are represented by their textual features, and queries combine
concepts from automatically generated PICO summary frames.
3. iti-lucene-frames (textual, automatic): Textual search using the Lucene/SOLR
search engine. Articles are represented by their textual features, and queries
combine concepts from automatically generated PICO summary frames that
are then expanded along synonymy relationships in the UMLS.
4. iti-lucene-baseline (textual, automatic): Textual search with the Lucene/SOLR
search engine. Articles are represented by their textual features, and queries
are the verbatim topic descriptions.
5. iti-lucene-expanded-concepts (textual, automatic): Textual search using the
Lucene/SOLR search engine. Articles are represented by their textual features,
and queries combine extracted concepts and image modalities that are then
expanded along synonymy relationships in the UMLS.
6. iti-lucene-baseline+expanded-concepts (textual, automatic): Textual search
like Run 5, but queries also include verbatim topic descriptions.
7. iti-lucene-baseline+expanded-concepts+cases (textual, automatic): Textual
search like Run 6, but articles are boosted if their MeSH terms are indicative
of case studies or clinical trials.
8. iti-lucene-expanded-concepts+image (mixed, automatic): Mixed search using
the Lucene/SOLR search engine. Articles are represented by their textual
features and the textual interpretation of the clustered visual features of each
contained image. Queries are as in Run 5 but also include the visual "words"
of each of the sample topic images.
9. iti-lucene-baseline+expanded-concepts+image (mixed, automatic): Mixed
search like Run 8, but queries also include verbatim topic descriptions.
</p>
        <p>10. iti-lucene-baseline+expanded-concepts+image+cases (mixed, automatic):
Mixed search like Run 9, but articles are boosted if their MeSH terms
are indicative of case studies or clinical trials.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>8 Results</title>
      <p>
        Table 1 presents the classification accuracy of our submitted runs for the modality
classification task. image text test result multilevel, a mixed approach, achieved
the highest accuracy (74%) of our submitted runs and was ranked within the
submissions from the top three groups. This result validates our hierarchical
classification approach and, as in our previous experience with image modality
classification [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], underscores the benefits of combining textual and visual
features. Surprisingly, the use of an extended set of training images did not
improve classification accuracy.
      </p>
      <p>
        Table 2 presents the Mean Average Precision (MAP) of our submitted runs
for the ad-hoc image retrieval task. iti-lucene-baseline+expanded-concepts+image
achieved the highest MAP (0.14) among our submitted mixed runs,
iti-lucene-baseline+expanded-concepts achieved the highest MAP (0.13) among our
submitted textual runs, and iti-lucene-image achieved the highest MAP (0.02) among
our submitted visual runs. Although our retrieval results are lower than expected
given our previous experience [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ], they demonstrate the utility of combining
both textual and visual features. In particular, the use of clustered visual features,
which can be indexed and searched with a traditional text-based search engine,
not only resulted in our best visual approach but, when used in combination
with our best textual approach, produced our best overall submission.
      </p>
      <p>Finally, Table 3 presents the MAP of our submitted runs for the case-based
retrieval task. iti-essie-manual achieved the highest MAP (0.09) among our
submitted textual runs, and iti-lucene-baseline+expanded-concepts+image achieved
the highest MAP (0.03) among our submitted mixed runs. iti-lucene-baseline,
a textual approach, achieved the highest MAP (0.08) among our submitted
automatic runs. Similar to our results for the image retrieval task, our case-based
retrieval results are lower than expected given our previous experience. The
relatively low MAP for most ImageCLEF 2011 case-based submissions may be
due, in part, to the existence in the collection of only a small number of case
reports, clinical trials, or other types of documents relevant for case-based topics.
Surprisingly, our submissions that utilize extracted concepts and image modalities
achieved a lower MAP than our textual baseline, which used the verbatim topic
descriptions as queries.</p>
    </sec>
    <sec id="sec-8">
      <title>9 Conclusion</title>
      <p>This article describes the methods and results of the Communications Engineering
Branch, a division of the Lister Hill National Center for Biomedical
Communications, for the ImageCLEF 2011 medical retrieval track. We submitted ten runs
each for the modality classification task and the ad-hoc image and case-based
retrieval tasks. For the modality classification task, our best submission, a mixed
approach, achieved a classification accuracy of 74% and was ranked within the
submissions from the top three groups. For the retrieval tasks, our results were
lower than expected but reveal the mixed approaches involving clustered visual
features to be promising methods for combining textual and visual image features.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arthur</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vassilvitskii</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>k-means++: The advantages of careful seeding</article-title>
          .
          <source>In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms</source>
          . pp.
          <volume>1027</volume>
          –
          <fpage>1035</fpage>
          . SODA '
          <volume>07</volume>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>S.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sikora</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Overview of the MPEG-7 standard</article-title>
          .
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          <volume>11</volume>
          (
          <issue>6</issue>
          ),
          <fpage>688</fpage>
          –
          <lpage>695</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chatzichristofis</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boutalis</surname>
            ,
            <given-names>Y.S.</given-names>
          </string-name>
          :
          <article-title>CEDD: Color and edge directivity descriptor: A compact descriptor for image indexing and retrieval</article-title>
          . In:
          <string-name>
            <surname>Gasteratos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsotsos</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 6th International Conference on Computer Vision Systems. Lecture Notes in Computer Science</source>
          , vol.
          <volume>5008</volume>
          , pp.
          <fpage>312</fpage>
          –
          <lpage>322</lpage>
          . Springer-Verlag Berlin Heidelberg (
          <year>2008</year>
          )
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chatzichristofis</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boutalis</surname>
            ,
            <given-names>Y.S.</given-names>
          </string-name>
          :
          <article-title>FCTH: Fuzzy color and texture histogram: A low level feature for accurate image retrieval</article-title>
          .
          <source>In: Proceedings of the 9th International Workshop on Image Analysis for Multimedia Interactive Services</source>
          . pp.
          <fpage>191</fpage>
          –
          <lpage>196</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Answering clinical questions with knowledge-based and statistical techniques</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>33</volume>
          (
          <issue>1</issue>
          ),
          <fpage>63</fpage>
          –
          <lpage>103</lpage>
          (Mar
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Duda</surname>
            ,
            <given-names>R.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hart</surname>
            ,
            <given-names>P.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stork</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>Pattern Classification</article-title>
          . John Wiley &amp; Sons Ltd. (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hastie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Classification by pairwise coupling</article-title>
          .
          <source>The Annals of Statistics</source>
          <volume>26</volume>
          (
          <issue>2</issue>
          ),
          <fpage>451</fpage>
          –
          <lpage>471</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ide</surname>
            ,
            <given-names>N.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loane</surname>
            ,
            <given-names>R.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Essie: A concept-based search engine for structured biomedical text</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>14</volume>
          (
          <issue>3</issue>
          ),
          <fpage>253</fpage>
          –
          <lpage>263</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kalpathy-Cramer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Muler, H.,
          <string-name>
            <surname>Bedrick</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eggel</surname>
            , I., de Herrera,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsikrika</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The CLEF 2011 medical image retrieval and classification tasks</article-title>
          .
          <source>In: CLEF 2011 Working Notes</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kittler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hatef</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duin</surname>
            ,
            <given-names>R.P.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>On combining classifiers</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <fpage>226</fpage>
          –
          <lpage>239</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lindberg</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Humphreys</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Unified Medical Language System</article-title>
          .
          <source>Methods of Information in Medicine</source>
          <volume>32</volume>
          (
          <issue>4</issue>
          ),
          <fpage>281</fpage>
          –
          <lpage>291</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rahman</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thoma</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>A medical image retrieval framework in correlation enhanced visual concept feature space</article-title>
          .
          <source>In: Proceedings of the 22nd IEEE International Symposium on Computer-Based Medical Systems</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Simpson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahman</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thoma</surname>
            ,
            <given-names>G.R.</given-names>
          </string-name>
          :
          <article-title>Text- and content-based approaches to image retrieval for the ImageCLEF 2009 medical retrieval track</article-title>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Simpson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahman</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singhal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thoma</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Text- and content-based approaches to image modality detection and retrieval for the ImageCLEF 2010 medical retrieval track</article-title>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>