 ITI’s Participation in the 2013 Medical Track of
                    ImageCLEF

       Matthew S. Simpson, Daekeun You, Md Mahmudur Rahman, Dina
            Demner-Fushman, Sameer Antani, and George Thoma

Lister Hill National Center for Biomedical Communications, U.S. National Library of
                        Medicine, NIH, Bethesda, MD, USA



       Abstract. This article describes the participation of the Image and Text
       Integration (ITI) group in the ImageCLEF medical retrieval, classification,
       and segmentation tasks. Although our methods are similar to those we
       have explored at past ImageCLEF evaluations, we describe in this paper
       the results of our methods on the 2013 collection and set of topics. In
       doing so, we present our submitted textual, visual, and mixed runs and
       our results for each of the four tasks. As in previous evaluations, we
       found that our methods generally performed well for each
       task. In particular, our best ad-hoc retrieval submission was again ranked
       first among all the submissions from the participating groups.

       Keywords: Image Retrieval, Case-based Retrieval, Image Modality


1    Introduction

This article describes the participation of the Image and Text Integration (ITI)
 group in the ImageCLEF 2013 medical retrieval, classification, and segmentation
 tasks. Our group is from the Communications Engineering Branch of the Lister
 Hill National Center for Biomedical Communications, which is a research division
 of the U.S. National Library of Medicine.
     The medical track [4] of ImageCLEF 2013 consists of an image modality
 classification task, a compound figure separation task, and two retrieval tasks.
 For the classification task, the goal is to classify a given set of images according to
 thirty-one modalities (e.g., “Computerized Tomography,” “Electron Microscopy,”
 etc.). The modalities are organized hierarchically into meta-classes such as
“Radiology” and “Microscopy,” which are themselves types of “Diagnostic Images.”
 For the compound figure separation task, the goal is to segment the panels of
 multi-panel figures. Figures contained in biomedical articles are often composed
 of multiple panels (e.g., commonly labeled “a,” “b,” etc.) and segmenting them
 can result in improved retrieval performance. In the first retrieval task, a set of
 ad-hoc information requests is given, and the goal is to retrieve the most relevant
 images from a collection of biomedical articles for each topic. Finally, in the
 second retrieval task, a set of case-based information requests is given, and the
 goal is to retrieve the most relevant articles describing similar cases.
    In the following sections, we describe our methods and results. In Section 2,
we briefly outline our approach to each of the four tasks. In Section 3, we describe
each of our submitted runs, and in Section 4 we present our results. For the
modality classification task, our best submission achieved a classification accuracy
of 69.28%, which is better than what we achieved in the previous ImageCLEF
evaluation. Our submission for the compound figure separation task achieved a
similar accuracy of 69.27%. Our best submission for the ad-hoc image retrieval
task was a mixed approach that achieved a mean average precision of 0.3196.
This result is comparable to what we achieved in the previous evaluation and is
again ranked first among all submissions from the participating groups. Finally,
for the case-based article retrieval task, our best submission achieved a mean
average precision of 0.0886, which is significantly lower than that of the top-ranked run.
In each of the above tasks, we obtained our best results using mixed approaches,
indicating the importance of both textual and visual features for these tasks.



2     Methods

The methods we used in participating in the 2013 medical track of ImageCLEF
are identical to the approaches we explored in the 2012 evaluation [11]. We briefly
summarize these methods below.



                      Table 1: Extracted visual descriptors.
     No. Descriptor                                             Dimensionality
       1. Autocorrelation                                                    25
       2. Edge frequency                                                     25
       3. Fuzzy color and texture histogram* (FCTH) [3]                     192
       4. Gabor moment*                                                      60
       5. Gray-level co-occurrence matrix moment (GLCM) [12]                 20
       6. Local binary pattern (LBP1) [9]                                   256
       7. Local binary pattern (LBP2) [9]                                   256
       8. Scale-invariant feature transformation* (SIFT) [7]                256
       9. Shape moment                                                        5
      10. Tamura moment* [13]                                                18
      11. Edge histogram* (EHD) [1]                                          80
      12. Color and edge directivity* (CEDD) [2]                            144
      13. Primitive length                                                    5
      14. Color layout* (CLD) [1]                                            16
      15. Color moment                                                        3
      16. Semantic concept (SCONCEPT) [10]                                   30
          Combined                                                         1391

* Feature computed using the Lucene Image Retrieval library [8].
    We represent images and the articles in which they are contained using a
combination of textual and visual features. Our textual features include the title,
abstract, and Medical Subject Headings (MeSH® terms) of the articles in which
the images appear as well as the images’ captions and “mentions” (snippets
of text within the body of an article that discuss the images). In addition to
the above textual features, we also represent the visual content of images using
various low-level visual descriptors. Table 1 summarizes the descriptors we extract
and their dimensionality. Due to the large number of these features, we forgo
describing them in any detail. However, they are all well-known and discussed
extensively in existing literature.
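    As a rough illustration of this textual representation (not our actual indexing
code), the following Python sketch assembles an image's textual fields into a single
document and vectorizes it; the field names and the use of scikit-learn's
CountVectorizer are assumptions.

# Sketch: assemble an image's textual features into a single term vector.
# Field names and the CountVectorizer choice are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer

def textual_document(image):
    """Concatenate the textual fields associated with an image."""
    fields = [
        image.get("article_title", ""),
        image.get("article_abstract", ""),
        " ".join(image.get("mesh_terms", [])),
        image.get("caption", ""),
        " ".join(image.get("mentions", [])),
    ]
    return " ".join(fields)

# Fit a vocabulary over all images and produce one term vector per image.
images = [{"caption": "CT scan of the chest", "mesh_terms": ["Tomography"]},
          {"caption": "Gram stain, light microscopy"}]
vectorizer = CountVectorizer()
term_vectors = vectorizer.fit_transform(textual_document(img) for img in images)
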
    For the modality classification task, we experimented with both flat and
hierarchical classification strategies using support vector machines (SVMs). First,
we extract our visual and textual image features from the training images
(representing the textual features as term vectors). Then, we perform attribute
selection to reduce the dimensionality of the features. We construct the lower-
dimensional vectors independently for each feature type (textual or visual) and
combine the resulting attributes into a single, compound vector. Finally, we use
the lower-dimensional feature vectors to train multi-class SVMs for producing
textual, visual, or mixed modality predictions. Our flat classifiers attempt to
classify images into one of the thirty-one modality classes whereas our hierarchical
classifiers attempt to classify images following the structured organization of
modalities provided by the ImageCLEF organizers.
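    To make this pipeline concrete, the sketch below (Python, scikit-learn) shows
one way to realize per-feature-type attribute selection, concatenation into a
compound vector, and flat multi-class SVM training; the selection method, kernel,
and attribute counts are illustrative assumptions rather than our exact configuration.

# Sketch of a flat modality classifier: per-type attribute selection followed
# by a multi-class SVM. Feature matrices, labels, and k values are hypothetical.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

def train_flat_classifier(X_text, X_visual, y, k_text=200, k_visual=200):
    """X_text: term vectors; X_visual: concatenated visual descriptors."""
    # Select attributes independently for each feature type.
    sel_text = SelectKBest(f_classif, k=min(k_text, X_text.shape[1])).fit(X_text, y)
    sel_vis = SelectKBest(f_classif, k=min(k_visual, X_visual.shape[1])).fit(X_visual, y)
    # Combine the reduced vectors into a single, compound vector per image.
    X = np.hstack([sel_text.transform(X_text), sel_vis.transform(X_visual)])
    # One multi-class SVM over the thirty-one modality classes.
    clf = SVC(kernel="linear", decision_function_shape="ovr").fit(X, y)
    return sel_text, sel_vis, clf

def predict_modality(sel_text, sel_vis, clf, X_text, X_visual):
    X = np.hstack([sel_text.transform(X_text), sel_vis.transform(X_visual)])
    return clf.predict(X)
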
    For the compound figure separation task, our method incorporates both
natural language and image processing techniques. Our method first seeks to de-
termine the number of image panels comprising a compound figure by identifying
textual panel labels in the figure’s caption and visual panel labels overlaid on
the figure. A border detection method combines this information to determine
the appropriate borders and segment the figure.
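    Purely as an illustration of the caption-analysis step, the sketch below estimates
a figure's panel count from sequential textual panel labels; the regular expression
and labeling conventions are assumptions and do not reproduce our label detector
or the image-based border detection.

# Sketch: estimate the panel count of a compound figure from its caption by
# finding sequential single-letter panel labels such as "(a)", "b)", or "c.".
import re
import string

def estimate_panel_count(caption: str) -> int:
    # Match labels like "(a)", "a)", or "a." at a word boundary.
    labels = re.findall(r"\(?\b([a-h])\s*[).]", caption.lower())
    seen = sorted(set(labels))
    # Accept only an unbroken alphabetic sequence starting at "a".
    count = 0
    for i, letter in enumerate(seen):
        if letter == string.ascii_lowercase[i]:
            count += 1
        else:
            break
    return max(count, 1)

print(estimate_panel_count("(a) CT image; (b) segmentation result."))  # -> 2
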
    For the ad-hoc image retrieval task, we explored a variety of textual, visual,
and mixed strategies. Our textual approaches utilize the Essie [5] retrieval system.
Essie is a biomedical search engine developed by the U.S. National Library
of Medicine, and it incorporates the synonymy relationships encoded in the
Unified Medical Language System® (UMLS®) Metathesaurus® [6]. Our visual
approaches are based on retrieving images that appear visually similar to the
given topic images. We compute the visual similarity between two images as
the Euclidean distance between their visual descriptors. For the purposes of
computing this distance, we represent each image as a combined feature vector
composed of a subset of the visual descriptors listed in Table 1. We also explored
methods involving the clustering of visual descriptors and attribute selection.
Finally, our mixed approaches combine the above textual and visual approaches
in both early and late fusion strategies.
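    The sketch below illustrates the visual similarity computation and a weighted
late fusion of per-descriptor similarity scores; the min-max normalization and the
weights are placeholders, not the settings used in our submitted runs.

# Sketch: visual similarity as Euclidean distance over combined descriptors,
# plus a weighted late fusion of per-descriptor similarity scores.
import numpy as np

def euclidean_rank(query_vec, collection_vecs):
    """Rank collection images by Euclidean distance to the query vector."""
    distances = np.linalg.norm(collection_vecs - query_vec, axis=1)
    return np.argsort(distances), distances

def late_fusion(per_descriptor_scores, weights):
    """Linearly combine min-max-normalized similarity scores per descriptor."""
    fused = np.zeros_like(next(iter(per_descriptor_scores.values())), dtype=float)
    for name, scores in per_descriptor_scores.items():
        s = np.asarray(scores, dtype=float)
        span = s.max() - s.min()
        s = (s - s.min()) / span if span > 0 else np.zeros_like(s)
        fused += weights.get(name, 1.0) * s
    return np.argsort(-fused), fused
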
    Our method for performing case-based article retrieval is analogous to our
approaches for the ad-hoc image retrieval task. The only substantive difference is
that we represent articles by a combination of the textual and visual features of
each image they contain.
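    As a minimal sketch of this representation, an article vector could be obtained
by averaging the combined feature vectors of its images; whether the features are
averaged, summed, or concatenated is an implementation detail, so the mean used
here is an assumption.

# Sketch: represent an article by combining the feature vectors of its images
# (here a simple mean; the actual combination scheme may differ).
import numpy as np

def article_vector(image_vectors):
    """image_vectors: list of per-image combined textual/visual feature vectors."""
    return np.mean(np.vstack(image_vectors), axis=0)
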
3     Submitted Runs

In this section we describe each of our submitted runs for the modality classifica-
tion, compound figure separation, ad-hoc image retrieval, and case-based article
retrieval tasks. Each run is identified by its file name or trec_eval run ID and
mode (textual, visual, or mixed). All submitted runs are automatic.


3.1   Modality Classification Runs

We submitted the following six runs for the modality classification task:
 M1. nlm textual only flat (textual): A flat multi-class SVM classification using
     selected attributes from a combined term vector created from four textual
     features (article title, MeSH terms, and image caption and mention).
 M2. nlm visual only hierarchy (visual): A hierarchical multi-class SVM clas-
     sification using selected attributes from a combined visual descriptor of
     features 1–15 of Table 1.
 M3. nlm mixed hierarchy (mixed): A hierarchical multi-class SVM classification
     combining Runs M1 and M2. Textual and visual features are combined into a
     single feature vector for each image.
 M4. nlm mixed using 2012 visual classification (mixed): A combination of Runs
     M1 and M2 but using models trained on the 2012 ImageCLEF medical modality
     classification data set. Images are first classified according to Run M1. Images
     having no textual features are classified according to Run M2. We use our
     compound figure separation method to improve the classification accuracy
     of some classes.
 M5. nlm mixed using 2013 visual classification 1 (mixed): Like Run M4 but using
     the 2013 ImageCLEF medical modality classification data set.
 M6. nlm mixed using 2013 visual classification 2 (mixed): Like Run M5 but using
     all visual features from Table 1.


3.2   Compound Figure Separation Runs

We submitted the following run for the compound figure separation task:
 S1. nlm multipanel separation (mixed): A combination of figure caption analy-
     sis, panel border detection, and panel label recognition.


3.3   Ad-hoc Image Retrieval Runs

We submitted the following ten runs for the ad-hoc image retrieval task:
 A1. nlm-image-based-textual (textual): A combination of two queries using
     Essie. (A1.Q1) A disjunction of modality terms extracted from the query
     topic must occur within the caption or mention fields of an image’s textual
     features; a disjunction of the remaining terms is allowed to occur in any
     field. (A1.Q2) A lossy expansion of the verbatim topic is allowed to occur
     in any field.
 A2. nlm-image-based-visual (visual): A disjunction of the query images’ clus-
     tered visual descriptors must occur within the global image feature field.
 A3. nlm-image-based-mixed (mixed): A combination of Queries A1.Q1–Q2 with
     Run A2.
 A4. image latefusion merge (visual): An automatic content-based image re-
     trieval approach. In this approach, features 10–16 of Table 1 are used, and
     their individual similarity scores are linearly combined with predefined
     weights based on modality classification results of the query and collection
     images. All images in each topic are considered and result lists for each
     topic are combined to produce a single list of retrieved images.
 A5. image latefusion merge filter (visual): Like Run A4 but the search is per-
     formed after filtering the collection of images based on modality classifica-
     tion results of the query images.
 A6. latefusion accuracy merge (visual): Like Run A4 but the feature weights
     are based on their normalized accuracy in classifying images in the 2012
     ImageCLEF medical modality classification test set.
 A7. Txt Img Wighted Merge (mixed): A score-based combination of Runs A1
     and A5 (score- and rank-based merging are sketched after this list).
 A8. Merge RankToScore weighted (mixed): A rank-based combination of Runs
     A1 and A5.
 A9. Txt Img Wighted Merge A (mixed): A score-based combination of Runs
     A1 and A6.
A10. Merge RankToScore weighted A (mixed): A rank-based combination of
     Runs A1 and A6.
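    For illustration, the score-based and rank-based merging used in Runs A7–A10
might be realized as follows; the normalization, the reciprocal-rank transform, and
the weights are assumptions rather than our exact fusion parameters.

# Sketch: score-based vs. rank-based merging of two retrieval runs.
# run_a and run_b map document IDs to retrieval scores; weights are placeholders.

def score_based_merge(run_a, run_b, w_a=0.7, w_b=0.3):
    """Combine max-normalized retrieval scores directly."""
    def normalize(run):
        top = max(run.values()) or 1.0
        return {doc: score / top for doc, score in run.items()}
    a, b = normalize(run_a), normalize(run_b)
    merged = {d: w_a * a.get(d, 0.0) + w_b * b.get(d, 0.0) for d in set(a) | set(b)}
    return sorted(merged, key=merged.get, reverse=True)

def rank_based_merge(run_a, run_b, w_a=0.7, w_b=0.3):
    """Convert ranks to scores (1 / rank) before combining."""
    def rank_scores(run):
        ordered = sorted(run, key=run.get, reverse=True)
        return {doc: 1.0 / (i + 1) for i, doc in enumerate(ordered)}
    a, b = rank_scores(run_a), rank_scores(run_b)
    merged = {d: w_a * a.get(d, 0.0) + w_b * b.get(d, 0.0) for d in set(a) | set(b)}
    return sorted(merged, key=merged.get, reverse=True)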


3.4   Case-based Article Retrieval Runs

We submitted the following three runs for the case-based article retrieval task:
 C1. nlm-case-based-textual (textual): A combination of three queries for each
     topic sentence using Essie. (C1.Q1) A disjunction of modality terms ex-
     tracted from the sentence must occur within the caption or mention fields
     of an article’s textual features; a disjunction of the remaining terms is
     allowed to occur in any field. (C1.Q2) A lossy expansion of the verbatim
     sentence is allowed to occur in any field. (C1.Q3) A disjunction of all
     extracted words in the sentence is allowed to occur in any field. Articles
     are scored according to the sentence resulting in the maximum score.
 C2. nlm-case-based-visual (visual): A disjunction of the query images’ clustered
     visual descriptors must occur within the global image feature field.
 C3. nlm-case-based-mixed (mixed): A combination of Queries C1.Q1–Q3 with
     Run C2.


4     Results

Tables 2–5 summarize the results of our modality classification, compound figure
separation, ad-hoc image retrieval, and case-based article retrieval runs.
           Table 2: Accuracy results for the modality classification task.
    ID                                                   Mode      Accuracy (%)
    nlm mixed using 2013 visual classification 2         Mixed                69.28
    nlm mixed using 2013 visual classification 1         Mixed                68.74
    nlm mixed hierarchy                                  Mixed                67.31
    nlm mixed using 2012 visual classification           Mixed                67.07
    nlm visual only hierarchy                            Visual               61.50
    nlm textual only flat                                Textual              51.23



         Table 3: Accuracy results for the compound figure separation task.
    ID                                                   Mode      Accuracy (%)
    nlm multipanel separation                             Mixed               69.27



In Table 2 and Table 3, we give the accuracy of our figure classification and
separation methods. In Table 4 and Table 5, we give the mean average precision
(MAP), binary preference (bpref), and precision-at-ten (P@10) of our retrieval
methods.


5        Conclusion

This article describes the methods and results of the Image and Text Integration
(ITI) group in the ImageCLEF 2013 medical classification, segmentation, and
retrieval tasks. Our methods are similar to those we have developed for previous
ImageCLEF evaluations, and they include a variety of textual, visual, and mixed
approaches. For the modality classification task, our best submission was a
mixed approach that achieved an accuracy of 69.28% and was ranked among
the submissions from the top five participating groups.



            Table 4: Retrieval results for the ad-hoc image retrieval task
    ID                                         Mode      MAP       bpref     P@10
    nlm-se-image-based-mixed                   Mixed     0.3196    0.2983    0.3886
    nlm-se-image-based-textual                 Textual   0.3196    0.2982    0.3886
    Txt Img Wighted Merge A                    Mixed     0.3124    0.3014    0.3886
    Merge RankToScore weighted A               Mixed     0.3120    0.2950    0.3771
    Txt Img Wighted Merge                      Mixed     0.3086    0.2938    0.3857
    Merge RankToScore weighted                 Mixed     0.3032    0.2872    0.3943
    image latefusion merge                     Visual    0.0110    0.0207    0.0257
    image latefusion merge filter              Visual    0.0101    0.0244    0.0343
    latefusion accuracy merge                  Visual    0.0092    0.0179    0.0314
    nlm-se-image-based-visual                  Visual    0.0002    0.0021    0.0029
       Table 5: Retrieval results for the case-based article retrieval task
 ID                                             Mode        MAP        bpref      P@10
 nlm-se-case-based-mixed                        Mixed       0.0886     0.0926     0.1457
 nlm-se-case-based-textual                      Textual     0.0885     0.0926     0.1457
 nlm-se-case-based-visual                       Visual      0.0008     0.0044     0.0057



For the compound figure separation task, our mixed approach resulted in an
accuracy of 69.27% and was ranked second among the four submissions from the
three groups participating in this task. As in previous years, our best submission
for the ad-hoc image retrieval task was also a mixed approach, achieving a mean
average precision of 0.3196 and ranking first overall. Finally, for the case-based
article retrieval task, our best submission obtained a mean average precision of
0.0886. This result is much lower than what we achieved in previous ImageCLEF
evaluations. Despite our performance on the case-based task, the effectiveness of
our mixed approaches is encouraging and provides evidence that our ongoing
efforts to integrate textual and visual information will be successful.


Acknowledgments. We would like to thank Suchet Chandra for preparing our
collection and extracting the textual and visual features used by our methods.


References

 1. Chang, S.F., Sikora, T., Puri, A.: Overview of the MPEG-7 standard. IEEE
    Transactions on Circuits and Systems for Video Technology 11(6), 688–695 (2001)
 2. Chatzichristofis, S.A., Boutalis, Y.S.: CEDD: Color and edge directivity descriptor:
    A compact descriptor for image indexing and retrieval. In: Gasteratos, A., Vincze, M.,
    Tsotsos, J.K. (eds.) Proceedings of the 6th International Conference on Computer
    Vision Systems. Lecture Notes in Computer Science, vol. 5008, pp. 312–322. Springer
    (2008)
 3. Chatzichristofis, S.A., Boutalis, Y.S.: FCTH: Fuzzy color and texture histogram: A
    low level feature for accurate image retrieval. In: Proceedings of the 9th International
    Workshop on Image Analysis for Multimedia Interactive Services. pp. 191–196 (2008)
 4. García Seco de Herrera, A., Kalpathy-Cramer, J., Demner-Fushman, D., Antani, S., Müller,
    H.: Overview of the ImageCLEF 2013 medical tasks. In: Working notes of CLEF
    2013 (2013)
 5. Ide, N.C., Loane, R.F., Demner-Fushman, D.: Essie: A concept-based search en-
    gine for structured biomedical text. Journal of the American Medical Informatics
    Association 14(3), 253–263 (2007)
 6. Lindberg, D., Humphreys, B., McCray, A.: The Unified Medical Language System.
    Methods of Information in Medicine 32(4), 281–291 (1993)
 7. Lowe, D.: Object recognition from local scale-invariant features. In: Proceedings
    of the Seventh IEEE International Conference on Computer Vision. vol. 2, pp.
    1150–1157 (1999)
 8. Lux, M., Chatzichristofis, S.A.: LIRe: Lucene Image Retrieval—an extensible Java
    CBIR library. In: Proceedings of the 16th ACM International Conference on Multi-
    media. pp. 1085–1088 (2008)
 9. Mäenpää, T.: The Local Binary Pattern Approach to Texture Analysis—Extensions
    and Applications. Ph.D. thesis, University of Oulu (2003)
10. Rahman, M.M., Antani, S., Thoma, G.: A medical image retrieval framework in
    correlation enhanced visual concept feature space. In: Proceedings of the 22nd
    IEEE International Symposium on Computer-Based Medical Systems (2009)
11. Simpson, M.S., You, D., Rahman, M.M., Demner-Fushman, D., Antani, S., Thoma,
    G.: ITI’s participation in the ImageCLEF 2012 medical retrieval and classification
    tasks. In: Working Notes for the Conference on Multilingual and Multimodal
    Information Access Evaluation (CLEF), September 17–20, Rome, Italy (2012)
12. Srinivasan, G.N., Shobha, G.: Statistical texture analysis. In: Proceedings of World
    Academy of Science, Engineering and Technology. vol. 36, pp. 1264–1269 (2008)
13. Tamura, H., Mori, S., Yamawaki, T.: Textural features corresponding to visual
    perception. IEEE Transactions on Systems, Man, and Cybernetics 8(6), 460–473
    (1978)