=Paper=
{{Paper
|id=Vol-1177/CLEF2011wn-ImageCLEF-SimpsonEt2011
|storemode=property
|title=Text and Content-based Approaches to Image Modality Classification and Retrieval for the ImageCLEF 2011 Medical Retrieval Track
|pdfUrl=https://ceur-ws.org/Vol-1177/CLEF2011wn-ImageCLEF-SimpsonEt2011.pdf
|volume=Vol-1177
}}
==Text and Content-based Approaches to Image Modality Classification and Retrieval for the ImageCLEF 2011 Medical Retrieval Track==
Text- and Content-based Approaches to Image Modality Classification and Retrieval for the ImageCLEF 2011 Medical Retrieval Track

Matthew Simpson, Md Mahmudur Rahman, Srinivas Phadnis, Emilia Apostolova, Dina Demner-Fushman, Sameer Antani, and George Thoma

Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, NIH, Bethesda, MD, USA

Abstract. This article describes the participation of the Communications Engineering Branch (CEB), a division of the Lister Hill National Center for Biomedical Communications, in the ImageCLEF 2011 medical retrieval track. Our methods encompass a variety of techniques relating to text- and content-based image retrieval. Our textual approaches primarily utilize the Unified Medical Language System (UMLS) synonymy to identify concepts in topic descriptions and image-related text, and our visual approaches utilize similarity metrics based on computed “visual concepts” and low-level image features. We also explore mixed approaches that utilize a combination of textual and visual features. In this article we present an overview of the application of our methods to the modality classification, ad-hoc image retrieval, and case-based image retrieval tasks, and we describe our submitted runs and results.

Keywords: Image Retrieval, Case-based Retrieval, Image Modality

1 Introduction

This article describes the participation of the Communications Engineering Branch (CEB), a division of the Lister Hill National Center for Biomedical Communications, in the ImageCLEF 2011 medical retrieval track. The medical retrieval track [9] of ImageCLEF 2011 consists of an image modality classification task and two retrieval tasks. For the modality classification task, the goal is to classify a given set of medical images according to eighteen modalities (e.g., CT or Histopathology) taken from five classes (e.g., Radiology or Microscopy). In the first retrieval task, a set of ad-hoc information requests is given, and the goal is to retrieve the most relevant images for each topic. Finally, in the second retrieval task, a set of case-based information requests is given, and the goal is to retrieve the most relevant articles describing similar cases.

In the following sections, we describe the textual and visual features that comprise our image and case representations (Sections 2–3) and our methods for the modality classification (Section 4) and medical retrieval tasks (Sections 5–6). Our textual approaches primarily utilize the Unified Medical Language System (UMLS) [11] synonymy to identify concepts in topic descriptions and image-related text, and our visual approaches rely on similarity metrics based on computed “visual concepts” and other low-level visual features. We also explore mixed approaches for the modality classification and retrieval tasks that utilize a combination of textual and visual features.

In Section 7 we describe our submitted runs, and in Section 8 we present our results. For the modality classification task, our best submission achieved a classification accuracy of 74% and was ranked among the submissions from the top three groups. For the retrieval tasks, our results were lower than expected, yet they reveal new insights that we anticipate will improve future work. For the modality classification and image retrieval tasks, our best results were obtained using mixed approaches, indicating the importance of both textual and visual features for these tasks.
2 Image Representation

Images contained in biomedical articles can be represented using both textual and visual features. Textual features can include text from an article that pertains to an image, such as image captions and “mentions” (snippets of text within the body of an article that discuss an image), and visual features can include information derived from the content of an image, such as shape, color, and texture. We describe the features we use in representing images below.

2.1 Textual Features

We represent each image in the ImageCLEF 2011 medical collection as a structured document of image-related text. Our representation includes the title, abstract, and MeSH terms¹ of the article in which the image appears as well as the image’s caption and mentions. Additionally, we identify within image captions textual Regions of Interest (ROIs). A textual ROI is a noun phrase describing the content of an interesting region of an image and is identified within a caption by a pointer. For example, in the caption “MR image reveals hypointense indeterminate nodule (arrow),” the word arrow points to the ROI containing a hypointense indeterminate nodule.

The above structured documents may be indexed and searched with a traditional search engine, or the underlying term vectors may be exposed and added to a mixed image representation that includes the visual features described in Section 2.2. For the latter approach, the terms in a structured document field D_j (e.g., caption) are commonly represented as an N-dimensional vector

f_j^{term} = [w_{j1}, w_{j2}, ..., w_{jN}]^T    (1)

where w_{jk} denotes the tf-idf weight of term t_k in document field D_j, and N is the size of the vocabulary.

¹ MeSH is a controlled vocabulary created by the U.S. National Library of Medicine to index biomedical articles.

2.2 Visual Features

In addition to the above textual features, we also represent the visual content of images using various low-level global image features and a derived feature intended to capture the high-level semantic content of images.

Low-level Global Features. We represent the spatial structure and global shape and edge features of images with the Color Layout Descriptor (CLD) and Edge Histogram Descriptor (EHD) of MPEG-7 [2]. We extract the CLD feature as a vector f^cld and the EHD feature as f^ehd. Additionally, we extract the Color and Edge Directivity Descriptor (CEDD) [3] as f^cedd and the Fuzzy Color and Texture Histogram (FCTH) [4] as f^fcth using the Lucene image retrieval (LIRE) library.² Both CEDD and FCTH incorporate color and texture information into single histograms that are suitable for image indexing and retrieval.

Concept Feature. In a heterogeneous medical image collection, it is possible to identify specific local patches in images that are perceptually or semantically distinguishable, such as homogeneous texture patterns in gray-level radiological images or differential color and texture structures in microscopic pathology images. The variation in the local patches can be effectively modeled as “visual concepts” [12] using supervised machine learning-based classification techniques. For the generation of these concepts, we utilize a multi-class Support Vector Machine (SVM) composed of several binary classifiers organized using the one-against-one strategy [7]. To train the SVMs, we manually assign a set of L visual concepts C = {c_1, ..., c_i, ..., c_L} to the color and texture features of each fixed-size patch contained in an image.
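To make the patch-level concept classification concrete, the sketch below illustrates one possible implementation; it is not the system used in this work. It relies on scikit-learn's SVC (which applies the one-against-one decomposition internally), and the patch descriptor and concept labels are simplified, hypothetical placeholders for the color and texture features and manually assigned labels discussed above.

<pre>
# Minimal sketch (not the authors' implementation): training a multi-class SVM
# on per-patch color/texture features with scikit-learn. The patch descriptor
# below is a toy stand-in for descriptors such as CLD/EHD/CEDD/FCTH, and the
# training data are hypothetical placeholders.
import numpy as np
from sklearn.svm import SVC

def patch_features(image, patch_size=32):
    """Split an image (H x W x 3 array) into fixed-size patches and compute a
    toy color/texture descriptor per patch (per-channel mean and std)."""
    h, w, _ = image.shape
    feats = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            feats.append(np.concatenate([patch.mean(axis=(0, 1)),
                                         patch.std(axis=(0, 1))]))
    return np.asarray(feats)

# Hypothetical training data: descriptors of annotated patches, each labelled
# with one of the L visual concepts c_1, ..., c_L (here L = 5).
X_train = np.random.rand(200, 6)
y_train = np.random.randint(0, 5, 200)

# SVC decomposes the multi-class problem into binary classifiers using the
# one-against-one strategy described above.
clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X_train, y_train)

# Predict a concept label for every fixed-size patch of a new image.
new_image = np.random.rand(128, 128, 3)
patch_labels = clf.predict(patch_features(new_image))
</pre>

In an actual system, the per-patch descriptors and manually assigned concept labels described in this section would replace the random placeholders used here for illustration.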
For a single image, the input to the training process is a set of color and texture feature vectors for all fixed-size patches along with their manually assigned concept labels. We generate the concept feature for each image I_j in the collection by first partitioning I_j into l patches as {x_{1j}, ..., x_{kj}, ..., x_{lj}}, where each x_{kj} ∈