<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ITI's Participation in the 2013 Medical Track of ImageCLEF</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthew S. Simpson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daekeun You</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Mahmudur Rahman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dina Demner-Fushman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sameer Antani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Thoma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, NIH</institution>
          ,
          <addr-line>Bethesda, MD</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article describes the participation of the Image and Text Integration (ITI) group in the ImageCLEF medical retrieval, classification, and segmentation tasks. Although our methods are similar to those we have explored at past ImageCLEF evaluations, we describe in this paper their results on the 2013 collection and set of topics. In doing so, we present our submitted textual, visual, and mixed runs and our results for each of the four tasks. As in previous evaluations, we found our methods to generally perform well for each task. In particular, our best ad-hoc retrieval submission was again ranked first among all the submissions from the participating groups.</p>
      </abstract>
      <kwd-group>
        <kwd>Image Retrieval</kwd>
        <kwd>Case-based Retrieval</kwd>
        <kwd>Image Modality</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This article describes the participation of the Image and Text Integration (ITI)
group in the ImageCLEF 2013 medical retrieval, classification, and segmentation
tasks. Our group is from the Communications Engineering Branch of the Lister
Hill National Center for Biomedical Communications, which is a research division
of the U.S. National Library of Medicine.</p>
      <p>
        The medical track [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] of ImageCLEF 2013 consists of an image modality
classification task, a compound figure separation task, and two retrieval tasks.
For the classification task, the goal is to classify a given set of images according to
thirty-one modalities (e.g., "Computerized Tomography," "Electron Microscopy,"
etc.). The modalities are organized hierarchically into meta-classes such as
"Radiology" and "Microscopy," which are themselves types of "Diagnostic Images."
For the compound figure separation task, the goal is to segment the panels of
multi-panel figures. Figures contained in biomedical articles are often composed
of multiple panels (e.g., commonly labeled "a," "b," etc.), and segmenting them
can result in improved retrieval performance. In the first retrieval task, a set of
ad-hoc information requests is given, and the goal is to retrieve the most relevant
images from a collection of biomedical articles for each topic. Finally, in the
second retrieval task, a set of case-based information requests is given, and the
goal is to retrieve the most relevant articles describing similar cases.
      </p>
      <p>In the following sections, we describe our methods and results. In Section 2,
we briefly outline our approach to each of the four tasks. In Section 3, we describe
each of our submitted runs, and in Section 4 we present our results. For the
modality classification task, our best submission achieved a classification accuracy
of 69.28%, which is better than what we achieved in the previous ImageCLEF
evaluation. Our submission for the compound figure separation task achieved a
similar accuracy of 69.27%. Our best submission for the ad-hoc image retrieval
task was a mixed approach that achieved a mean average precision of 0.3196.
This result is comparable to what we achieved in the previous evaluation and is
again ranked first among all submissions from the participating groups. Finally,
for the case-based article retrieval task, our best submission achieved a mean
average precision of 0.0886, which is significantly lower than the top-ranked run.
In each of the above tasks, we obtained our best results using mixed approaches,
indicating the importance of both textual and visual features for these tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>The methods we used in participating in the 2013 medical track of ImageCLEF
are identical to the approaches we explored in the 2012 evaluation [11]. We briefly
summarize these methods below.</p>
      <p>[Table 1: the visual descriptors we extract and their dimensionality. Table
footnote: feature computed using the Lucene Image Retrieval library [8].]</p>
      <p>We represent images and the articles in which they are contained using a
combination of textual and visual features. Our textual features include the title,
abstract, and Medical Subject Headings (MeSH® terms) of the articles in which
the images appear as well as the images' captions and "mentions" (snippets
of text within the body of an article that discuss the images). In addition to
the above textual features, we also represent the visual content of images using
various low-level visual descriptors. Table 1 summarizes the descriptors we extract
and their dimensionality. Due to the large number of these features, we forgo
describing them in any detail. However, they are all well known and discussed
extensively in the existing literature.</p>
      <p>For the modality classification task, we experimented with both flat and
hierarchical classification strategies using support vector machines (SVMs). First,
we extract our visual and textual image features from the training images
(representing the textual features as term vectors). Then, we perform attribute
selection to reduce the dimensionality of the features. We construct the
lower-dimensional vectors independently for each feature type (textual or visual) and
combine the resulting attributes into a single, compound vector. Finally, we use
the lower-dimensional feature vectors to train multi-class SVMs for producing
textual, visual, or mixed modality predictions. Our flat classifiers attempt to
classify images into one of the thirty-one modality classes, whereas our hierarchical
classifiers attempt to classify images following the structured organization of
modalities provided by the ImageCLEF organizers.</p>
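      <p>As a minimal sketch of this pipeline (not our exact implementation), the
following assumes scikit-learn-style attribute selection and SVM APIs; the feature
matrices, selector sizes, and kernel choice are illustrative placeholders:</p>
      <preformat>
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

def train_mixed_classifier(X_text, X_vis, y, k_text=500, k_vis=200):
    # Attribute selection is performed independently for each feature type.
    sel_text = SelectKBest(f_classif, k=k_text).fit(X_text, y)
    sel_vis = SelectKBest(f_classif, k=k_vis).fit(X_vis, y)
    # The reduced textual and visual vectors are combined into a single,
    # compound vector per image.
    X = np.hstack([sel_text.transform(X_text), sel_vis.transform(X_vis)])
    # A flat multi-class SVM (one-vs-rest) produces the mixed predictions.
    clf = SVC(kernel="linear", decision_function_shape="ovr").fit(X, y)
    return sel_text, sel_vis, clf
      </preformat>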
      <p>For the compound figure separation task, our method incorporates both
natural language and image processing techniques. Our method first seeks to
determine the number of image panels comprising a compound figure by identifying
textual panel labels in the figure's caption and visual panel labels overlain on
the figure. A border detection method then combines this information to determine
the appropriate borders and segment the figure.</p>
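      <p>For illustration, the caption-analysis step alone can be sketched as follows
(a toy example assuming parenthesized panel labels; the full method also
recognizes labels overlain on the figure and performs border detection):</p>
      <preformat>
import re

def count_panels(caption):
    # Collect the distinct single-letter panel labels, e.g. "(a)", "(B)".
    labels = {m.group(1).lower() for m in re.finditer(r"\(([a-zA-Z])\)", caption)}
    return len(labels)

# Example: count_panels("Histology of (a) normal and (b) diseased tissue.") == 2
      </preformat>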
      <p>For the ad-hoc image retrieval task, we explored a variety of textual, visual,
and mixed strategies. Our textual approaches utilize the Essie [5] retrieval system.
Essie is a biomedical search engine developed by the U.S. National Library
of Medicine, and it incorporates the synonymy relationships encoded in the
Unified Medical Language System® (UMLS®) Metathesaurus® [6]. Our visual
approaches are based on retrieving images that appear visually similar to the
given topic images. We compute the visual similarity between two images as
the Euclidean distance between their visual descriptors. For the purposes of
computing this distance, we represent each image as a combined feature vector
composed of a subset of the visual descriptors listed in Table 1. We also explored
methods involving the clustering of visual descriptors and attribute selection.
Finally, our mixed approaches combine the above textual and visual approaches
in both early and late fusion strategies.</p>
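      <p>For concreteness, the visual-similarity and late-fusion steps can be sketched
as follows; this is illustrative only, and the choice of descriptors and fusion
weights is a placeholder:</p>
      <preformat>
import numpy as np

def visual_ranking(query_vec, collection_vecs):
    # Rank collection images by the Euclidean distance between their combined
    # visual descriptors and that of the query image.
    dists = np.linalg.norm(collection_vecs - query_vec, axis=1)
    return np.argsort(dists)  # image indices, most similar first

def late_fusion(score_lists, weights):
    # Late fusion: linearly combine the per-feature similarity score lists.
    fused = np.zeros(len(score_lists[0]))
    for scores, w in zip(score_lists, weights):
        fused += w * np.asarray(scores, dtype=float)
    return fused
      </preformat>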
      <p>Our method for performing case-based article retrieval is analogous to our
approaches for the ad-hoc image retrieval task. The only substantive difference is
that we represent articles by a combination of the textual and visual features of
each image they contain.</p>
    </sec>
    <sec id="sec-3">
      <title>Submitted Runs</title>
      <p>In this section we describe each of our submitted runs for the modality
classification, compound figure separation, ad-hoc image retrieval, and case-based
article retrieval tasks. Each run is identified by its file name or trec_eval run ID
and mode (textual, visual, or mixed). All submitted runs are automatic.</p>
      <sec id="sec-3-1">
        <title>Modality Classification Runs</title>
        <p>We submitted the following six runs for the modality classification task:
M1. nlm textual only flat (textual): A flat multi-class SVM classification using
selected attributes from a combined term vector created from four textual
features (article title, MeSH terms, and image caption and mention).
M2. nlm visual only hierarchy (visual): A hierarchical multi-class SVM
classification using selected attributes from a combined visual descriptor of
features 1–15 of Table 1.</p>
        <p>M3. nlm mixed hierarchy (mixed): A hierarchical multi-class SVM classification
combining Runs 1 and 2. Textual and visual features are combined into a
single feature vector for each image.</p>
        <p>M4. nlm mixed using 2012 visual classification (mixed): A combination of Runs
1 and 2 but using models trained on the 2012 ImageCLEF medical modality
classification data set. Images are first classified according to Run 1. Images
having no textual features are classified according to Run 2. We use our
compound figure separation method to improve the classification accuracy
of some classes.</p>
        <p>M5. nlm mixed using 2013 visual classification 1 (mixed): Like Run 4 but using
the 2013 ImageCLEF medical modality classification data set.</p>
        <p>M6. nlm mixed using 2013 visual classification 2 (mixed): Like Run 5 but using
all visual features from Table 1.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Compound Figure Separation Runs</title>
        <p>We submitted the following run for the compound figure separation task:
S1. nlm multipanel separation (mixed): A combination of figure caption
analysis, panel border detection, and panel label recognition.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Ad-hoc Image Retrieval Runs</title>
        <p>We submitted the following ten runs for the ad-hoc image retrieval task:
A1. nlm-image-based-textual (textual): A combination of two queries using
Essie. (A1.Q1) A disjunction of modality terms extracted from the query
topic must occur within the caption or mention fields of an image's textual
features; a disjunction of the remaining terms is allowed to occur in any
field. (A1.Q2) A lossy expansion of the verbatim topic is allowed to occur
in any field.</p>
        <p>A2. nlm-image-based-visual (visual): A disjunction of the query images'
clustered visual descriptors must occur within the global image feature field.</p>
        <p>A3. nlm-image-based-mixed (mixed): A combination of Queries A1.Q1–Q2 with
Run A2.</p>
        <p>A4. image latefusion merge (visual): An automatic content-based image
retrieval approach. In this approach, features 10–16 of Table 1 are used, and
their individual similarity scores are linearly combined with predefined
weights based on modality classification results of the query and collection
images. All images in each topic are considered, and result lists for each
topic are combined to produce a single list of retrieved images.</p>
        <p>A5. image latefusion merge filter (visual): Like Run A4 but the search is
performed after filtering the collection of images based on modality
classification results of the query images.</p>
        <p>A6. latefusion accuracy merge (visual): Like Run A4 but the feature weights
are based on their normalized accuracy in classifying images in the 2012
ImageCLEF medical modality classification test set.</p>
        <p>A7. Txt Img Wighted Merge (mixed): A score-based combination of Runs A1
and A5 (see the fusion sketch following this run list).</p>
        <p>A8. Merge RankToScore weighted (mixed): A rank-based combination of Runs
A1 and A5.</p>
        <p>A9. Txt Img Wighted Merge A (mixed): A score-based combination of Runs
A1 and A6.</p>
        <p>A10. Merge RankToScore weighted A (mixed): A rank-based combination of
Runs A1 and A6.</p>
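        <p>The difference between the score-based combinations (Runs A7, A9) and the
rank-based combinations (Runs A8, A10) can be illustrated with the following
minimal sketch. It is illustrative only: the normalization, weights, and the
reciprocal-rank conversion are assumptions, not our exact implementation.</p>
        <preformat>
def score_merge(runs, weights):
    # Score-based fusion: weighted sum of max-normalized retrieval scores.
    fused = {}
    for run, w in zip(runs, weights):  # each run maps a document ID to a score
        top = max(run.values()) or 1.0
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + w * score / top
    return sorted(fused, key=fused.get, reverse=True)

def rank_merge(runs, weights):
    # Rank-based fusion: convert each document's rank to a score (here 1/rank),
    # then combine the converted scores.
    fused = {}
    for run, w in zip(runs, weights):
        for rank, doc in enumerate(sorted(run, key=run.get, reverse=True), 1):
            fused[doc] = fused.get(doc, 0.0) + w / rank
    return sorted(fused, key=fused.get, reverse=True)
        </preformat>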
      </sec>
      <sec id="sec-3-4">
        <title>Case-based Article Retrieval Runs</title>
        <p>We submitted the following three runs for the case-based article retrieval task:
C1. nlm-case-based-textual (textual): A combination of three queries for each
topic sentence using Essie. (C1.Q1) A disjunction of modality terms
extracted from the sentence must occur within the caption or mention fields
of an article's textual features; a disjunction of the remaining terms is
allowed to occur in any field. (C1.Q2) A lossy expansion of the verbatim
sentence is allowed to occur in any field. (C1.Q3) A disjunction of all
extracted words in the sentence is allowed to occur in any field. Articles
are scored according to the sentence resulting in the maximum score (see
the scoring sketch following this run list).
C2. nlm-case-based-visual (visual): A disjunction of the query images' clustered
visual descriptors must occur within the global image feature field.
C3. nlm-case-based-mixed (mixed): A combination of Queries C1.Q1–Q3 with
Run C2.</p>
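        <p>The maximum-over-sentences scoring in Run C1 can be sketched as follows.
This is a minimal illustration; run_query is a hypothetical stand-in for issuing
one sentence-level query and retrieving its score, not an actual Essie API.</p>
        <preformat>
def case_based_score(article_id, topic_sentences, run_query):
    # Each topic sentence is issued as its own query (C1.Q1-Q3 combined);
    # the article keeps the score of its best-matching sentence.
    return max((run_query(s, article_id) for s in topic_sentences), default=0.0)
        </preformat>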
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>[Table 2: classification accuracy of modality Runs M1–M6 (nlm mixed using
2013 visual classification 2, nlm mixed using 2013 visual classification 1, nlm
mixed hierarchy, nlm mixed using 2012 visual classification, nlm visual only
hierarchy, nlm textual only flat).]</p>
      <p>In Table 2 and Table 3, we give the accuracy of our figure classification
and separation methods. In Table 4 and Table 5, we give the mean average
precision (MAP), binary preference (bpref), and precision-at-ten (P@10) of our
retrieval methods.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This article describes the methods and results of the Image and Text Integration
(ITI) group in the ImageCLEF 2013 medical classification, segmentation, and
retrieval tasks. Our methods are similar to those we have developed for previous
ImageCLEF evaluations, and they include a variety of textual, visual, and mixed
approaches. For the modality classification task, our best submission was a
mixed approach that achieved an accuracy of 69.28% and was ranked within
the submissions from the top five participating groups. For the compound figure
separation task, our mixed approach resulted in an accuracy of 69.27% and was
ranked second among four submissions from three groups participating in this
task. Similar to our experience in previous years, our best submission for the
ad-hoc image retrieval task was also a mixed approach, achieving a mean average
precision of 0.3196 and ranking first overall. Finally, for the case-based article
retrieval task, our best submission obtained a mean average precision of 0.0886.
This result is much lower than what we have achieved in previous ImageCLEF
evaluations. Despite our performance on the case-based task, the effectiveness of
our mixed approaches is encouraging and provides evidence that our ongoing
efforts at integrating textual and visual information will be successful.</p>
      <p>Acknowledgments. We would like to thank Suchet Chandra for preparing our
collection and extracting the textual and visual features used by our methods.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Chang, S.F., Sikora, T., Puri, A.: Overview of the MPEG-7 standard. IEEE Transactions on Circuits and Systems for Video Technology 11(6), 688–695 (2001)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Chatzichristo s,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Boutalis</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.S.:</surname>
          </string-name>
          <article-title>CEDD: Color and edge directivity descriptor: A compact descriptor for image indexing and retrieval</article-title>
          . In: Gasteratos,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Vincze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Tsotsos</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.K</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of the 6th International Conference on Computer Vision Systems. Lecture Notes in Computer Science</source>
          , vol.
          <volume>5008</volume>
          , pp.
          <volume>312</volume>
          {
          <fpage>322</fpage>
          . Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Chatzichristo s,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Boutalis</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.S.:</surname>
          </string-name>
          <article-title>FCTH: Fuzzy color and texture histogram: A low level feature for accurate image retrieval</article-title>
          .
          <source>In: Proceedings of the 9th International Workshop on Image Analysis for Multimedia Interactive Services</source>
          . pp.
          <volume>191</volume>
          {
          <issue>196</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. de Herrera,
          <string-name>
            <given-names>G.S.</given-names>
            ,
            <surname>Kalpathy-Cramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Antani</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          , Muller, H.:
          <article-title>Overview of the ImageCLEF 2013 medical tasks</article-title>
          .
          <source>In: Working notes of CLEF</source>
          <year>2013</year>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ide</surname>
            ,
            <given-names>N.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loane</surname>
            ,
            <given-names>R.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Essie: A concept-based search engine for structured biomedical text</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <volume>253</volume>
          {
          <fpage>263</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lindberg</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Humphreys</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The uni ed medical language system</article-title>
          .
          <source>Methods of Information in Medicine</source>
          <volume>32</volume>
          (
          <issue>4</issue>
          ),
          <volume>281</volume>
          {
          <fpage>291</fpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Object recognition from local scale-invariant features</article-title>
          .
          <source>In: Proceedings of the Seventh IEEE International Conference on Computer Vision</source>
          . vol.
          <volume>2</volume>
          , pp.
          <volume>1150</volume>
          {
          <issue>1157</issue>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Lux, M., Chatzichristofis, S.A.: LIRe: Lucene Image Retrieval: an extensible Java CBIR library. In: Proceedings of the 16th ACM International Conference on Multimedia, pp. 1085–1088 (2008)</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Maenpaa, T.: The Local Binary Pattern Approach to Texture Analysis: Extensions and Applications. Ph.D. thesis, University of Oulu (2003)</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Rahman, M.M., Antani, S., Thoma, G.: A medical image retrieval framework in correlation enhanced visual concept feature space. In: Proceedings of the 22nd IEEE International Symposium on Computer-Based Medical Systems (2009)</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Simpson, M.S., You, D., Rahman, M.M., Demner-Fushman, D., Antani, S., Thoma, G.: ITI's participation in the ImageCLEF 2012 medical retrieval and classification tasks. In: Working Notes for the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF), September 17–20, Rome, Italy (2012)</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Srinivasan, G.N., Shobha, G.: Statistical texture analysis. In: Proceedings of World Academy of Science, Engineering and Technology, vol. 36, pp. 1264–1269 (2008)</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Tamura, H., Mori, S., Yamawaki, T.: Textural features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics 8(6), 460–473 (1978)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>