    Text- and Content-based Approaches to Image
     Modality Classification and Retrieval for the
      ImageCLEF 2011 Medical Retrieval Track

     Matthew Simpson, Md Mahmudur Rahman, Srinivas Phadnis, Emilia
    Apostolova, Dina Demner-Fushman, Sameer Antani, and George Thoma

             Lister Hill National Center for Biomedical Communications
            U.S. National Library of Medicine, NIH, Bethesda, MD, USA



      Abstract. This article describes the participation of the Communica-
      tions Engineering Branch (CEB), a division of the Lister Hill National
      Center for Biomedical Communications, in the ImageCLEF 2011 medical
      retrieval track. Our methods encompass a variety of techniques relating
      to text- and content-based image retrieval. Our textual approaches pri-
      marily utilize the Unified Medical Language System (UMLS) synonymy
      to identify concepts in topic descriptions and image-related text, and our
      visual approaches utilize similarity metrics based on computed “visual
      concepts” and low-level image features. We also explore mixed approaches
      that utilize a combination of textual and visual features. In this article
      we present an overview of the application of our methods to the modality
      classification, ad-hoc image retrieval, and case-based image retrieval tasks,
      and we describe our submitted runs and results.

       Keywords: Image Retrieval, Case-based Retrieval, Image Modality


1    Introduction

This article describes the participation of the Communications Engineering
Branch (CEB), a division of the Lister Hill National Center for Biomedical
Communications, in the ImageCLEF 2011 medical retrieval track.
    The medical retrieval track [9] of ImageCLEF 2011 consists of an image
modality classification task and two retrieval tasks. For the modality classification
task, the goal is to classify a given set of medical images according to eighteen
modalities (e.g., CT or Histopathology) taken from five classes (e.g., Radiology
or Microscopy). In the first retrieval task, a set of ad-hoc information requests is
given, and the goal is to retrieve the most relevant images for each topic. Finally,
in the second retrieval task, a set of case-based information requests is given, and
the goal is to retrieve the most relevant articles describing similar cases.
    In the following sections, we describe the textual and visual features that
comprise our image and case representations (Sections 2–3) and our methods
for the modality classification (Section 4) and medical retrieval tasks (Sections
5–6). Our textual approaches primarily utilize the Unified Medical Language
System (UMLS) [11] synonymy to identify concepts in topic descriptions and
image-related text, and our visual approaches rely on similarity metrics based on
computed “visual concepts” and other low-level visual features. We also explore
mixed approaches for the modality classification and retrieval tasks that utilize a
combination of textual and visual features.
    In Section 7 we describe our submitted runs, and in Section 8 we present
our results. For the modality classification task, our best submission achieved a
classification accuracy of 74% and ranked among the submissions from the
top three groups. For the retrieval tasks, our results were lower than expected
yet reveal new insights that we anticipate will improve future work. For the
modality classification and image retrieval tasks, our best results were obtained
using mixed approaches, indicating the importance of both textual and visual
features for these tasks.

2     Image Representation
Images contained in biomedical articles can be represented using both textual and
visual features. Textual features can include text from an article that pertains
to an image, such as image captions and “mentions” (snippets of text within
the body of an article that discuss an image), and visual features can include
information derived from the content of an image, such as shape, color and
texture. We describe the features we use in representing images below.

2.1    Textual Features
We represent each image in the ImageCLEF 2011 medical collection as a structured
document of image-related text. Our representation includes the title, abstract,
and MeSH terms1 of the article in which the image appears as well as the
image’s caption and mentions. Additionally, we identify within image captions
textual Regions of Interest (ROIs). A textual ROI is a noun phrase describing the
content of an interesting region of an image and is identified within a caption by a
pointer. For example, in the caption “MR image reveals hypointense indeterminate
nodule (arrow),” the word arrow points to the ROI containing a hypointense
indeterminate nodule.
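    As a rough illustration of this ROI identification step (not the exact
pipeline used in our system), the following Python sketch scans a caption for a
small set of assumed pointer words appearing in parentheses and takes the words
immediately preceding the pointer as a candidate textual ROI; both the pointer
list and the noun-phrase heuristic are simplifications made for this example.

    import re

    # Illustrative pointer words that commonly mark a textual ROI in a caption
    # (this list is an assumption made for the example, not our actual lexicon).
    POINTERS = {"arrow", "arrows", "arrowhead", "arrowheads", "asterisk"}

    def candidate_rois(caption):
        """Return word spans that immediately precede a parenthesized pointer."""
        rois = []
        for match in re.finditer(r"\(([^)]+)\)", caption):
            if match.group(1).strip().lower() in POINTERS:
                preceding = caption[:match.start()].strip()
                # Crude stand-in for noun-phrase chunking: last three words.
                rois.append(" ".join(preceding.split()[-3:]))
        return rois

    print(candidate_rois("MR image reveals hypointense indeterminate nodule (arrow)."))
    # -> ['hypointense indeterminate nodule']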
    The above structured documents may be indexed and searched with a tradi-
tional search engine, or the underlying term vectors may be exposed and added
to a mixed image representation that includes the visual features described in
Section 2.2. For the latter approach, the terms in a structured document field
D_j (e.g., caption) are commonly represented as an N-dimensional vector

                  f_j^term = [w_{j1}, w_{j2}, ..., w_{jN}]^T                 (1)

where w_{jk} denotes the tf-idf weight of term t_k in document field D_j, and N
is the size of the vocabulary.
1 MeSH is a controlled vocabulary created by the U.S. National Library of Medicine
  to index biomedical articles.
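    As a minimal sketch of how such a term vector can be computed (using
scikit-learn's TfidfVectorizer rather than our own indexing code, and with
made-up caption text), each row of the matrix below corresponds to the vector
of Eq. (1) for one image's caption field:

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Made-up caption "field" text for three images.
    captions = [
        "MR image reveals hypointense indeterminate nodule",
        "CT scan of the chest showing a pulmonary nodule",
        "Histopathology of the resected lesion",
    ]

    # Row j of the resulting matrix is the N-dimensional tf-idf vector
    # f_j^term for the caption field D_j, as in Eq. (1).
    vectorizer = TfidfVectorizer()
    term_vectors = vectorizer.fit_transform(captions)

    print(term_vectors.shape)  # (3, N), where N is the vocabulary size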
2.2     Visual Features
In addition to the above textual features, we also represent the visual content
of images using various low-level global image features and a derived feature
intended to capture the high-level semantic content of images.

Low-level Global Features We represent the spatial structure and global
shape and edge features of images with the Color Layout Descriptor (CLD) and
Edge Histogram Descriptor (EHD) of MPEG-7 [2]. We extract the CLD feature
as a vector f^cld and the EHD feature as f^ehd. Additionally, we extract the Color
and Edge Directivity Descriptor (CEDD) [3] as f^cedd and the Fuzzy Color and
Texture Histogram (FCTH) [4] as f^fcth using the Lucene image retrieval (LIRE)
library.2 Both CEDD and FCTH incorporate color and texture information into
single histograms that are suitable for image indexing and retrieval.
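    These descriptors are computed with existing MPEG-7 and LIRE implementations;
purely to illustrate the kind of low-level global feature involved, the sketch
below computes a greatly simplified color-layout-style vector (an 8x8 grid of
mean colors, a 2-D DCT per channel, and a few low-frequency coefficients). It is
not the MPEG-7 CLD algorithm itself, and the grid size and coefficient count are
arbitrary choices made for the example.

    import numpy as np
    from scipy.fft import dctn

    def simple_color_layout(image, grid=8, n_coeffs=6):
        """Rough CLD-style feature for an (H, W, 3) uint8 RGB image."""
        h, w, _ = image.shape
        blocks = np.zeros((grid, grid, 3))
        for i in range(grid):
            for j in range(grid):
                patch = image[i * h // grid:(i + 1) * h // grid,
                              j * w // grid:(j + 1) * w // grid]
                blocks[i, j] = patch.reshape(-1, 3).mean(axis=0)
        # 2-D DCT of each channel's grid of mean colors; keep low frequencies.
        coeffs = [dctn(blocks[:, :, c], norm="ortho") for c in range(3)]
        return np.concatenate([c[:2, :3].ravel()[:n_coeffs] for c in coeffs])

    # Example with a random image standing in for a figure from the collection.
    feature = simple_color_layout(np.random.randint(0, 256, (512, 512, 3), np.uint8))
    print(feature.shape)  # (18,)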

 Concept Feature In a heterogeneous medical image collection, it is possible
 to identify specific local patches in images that are perceptually or semantically
 distinguishable, such as homogeneous texture patterns in gray-level radiological
 images or differential color and texture structures in microscopic pathology
 images. The variation in the local patches can be effectively modeled as “visual
 concepts” [12] using supervised machine learning-based classification techniques.
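    A minimal sketch of this patch-level concept modeling (the SVM setup itself
is detailed in the next paragraph) might look as follows, assuming scikit-learn
together with synthetic patch features and concept labels; the final histogram
step is one plausible way to summarize per-patch predictions into an image-level
feature and is shown only for illustration.

    import numpy as np
    from sklearn.svm import SVC

    # Synthetic training data: one row of color/texture features per labeled
    # patch and an integer id in {0, ..., L-1} for its assigned visual concept.
    rng = np.random.default_rng(0)
    patch_features = rng.normal(size=(500, 64))     # 500 patches, 64-D features
    concept_labels = rng.integers(0, 10, size=500)  # L = 10 visual concepts

    # SVC decomposes the multi-class problem into one-against-one binary SVMs.
    clf = SVC(kernel="rbf", decision_function_shape="ovo")
    clf.fit(patch_features, concept_labels)

    # For a new image, classify each of its l patches and summarize the
    # predicted concepts, e.g., as a normalized concept histogram.
    new_patches = rng.normal(size=(36, 64))         # l = 36 patches of one image
    concept_hist = np.bincount(clf.predict(new_patches), minlength=10) / 36
    print(concept_hist)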
     For the generation of these concepts, we utilize a multi-class Support Vector
 Machine (SVM) composed of several binary classifiers organized using the one-
 against-one strategy [7]. To train the SVMs, we manually assign a set of L
visual concepts C = {c_1, ..., c_i, ..., c_L} to the color and texture features of each
 fixed-size patch contained in an image. For a single image, the input to the
 training process is a set of color and texture feature vectors for all fixed-size
 patches along with their manually assigned concept labels. We generate the
concept feature for each image I_j in the collection by first partitioning I_j into l
patches as {x_{1j}, ..., x_{kj}, ..., x_{lj}}, where each x_{kj} ∈