=Paper=
{{Paper
|id=Vol-1176/CLEF2010wn-ImageCLEF-WuEt2010
|storemode=property
|title=UESTC at ImageCLEF 2010 Medical Retrieval Task
|pdfUrl=https://ceur-ws.org/Vol-1176/CLEF2010wn-ImageCLEF-WuEt2010.pdf
|volume=Vol-1176
}}
==UESTC at ImageCLEF 2010 Medical Retrieval Task==
UESTC at ImageCLEF 2010 Medical Retrieval Task

Hong Wu, Changjun Hu, Sikun Chen
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, P. R. China
hwu@uestc.edu.cn

Abstract. This paper presents the UESTC contribution to the ImageCLEF 2010 medical retrieval task. For ad-hoc retrieval and case-based retrieval we use only text information and propose a phrase-based approach. Phrases, subphrases and individual words are used as indexing terms with the vector space model (VSM) for ranking. Phrases and subphrases are extracted with the help of MetaMap, so that all extracted phrasal terms correspond to concepts in UMLS. Two term weighting methods are proposed: one weights terms by their idfs, and the other additionally assigns lower weights to phrasal terms. We also propose a query expansion method that extracts more phrases for the query by relaxing the restrictions on phrase extraction. For modality classification, we use three global texture features with SVM and AdaBoost.MH respectively.

Keywords: text retrieval, image retrieval, medical retrieval, modality classification, phrase extraction, MetaMap, UMLS

1 Introduction

This paper describes the first participation of the School of Computer Science and Engineering at the University of Electronic Science and Technology of China (UESTC) in the ImageCLEF 2010 medical retrieval task. ImageCLEFmed'10 [1] includes three types of tasks: ad-hoc retrieval, case-based retrieval and modality classification. For the retrieval tasks, a dataset similar to those of 2008 and 2009 is used, but with a larger number of images. The dataset contains all images (more than 77,000) from articles published in Radiology and Radiographics, including the text of the captions and a link to the HTML of the full-text articles. In the ad-hoc retrieval task, a set of textual queries, each with several sample images, is given, and the goal is to retrieve the images most relevant to each topic. In the case-based task, a set of case-based information requests is given, and the goal is to retrieve the articles most relevant to the topic case. In the modality classification task, training and testing medical images are given for classification according to their modality, such as CT, MR, XR, etc.

In this paper, we describe our phrase-based approach to the two retrieval tasks and our classification algorithm for modality classification. For the retrieval tasks, only the text of the title and caption is used. Phrases, subphrases and individual words are used as indexing terms with the vector space model (VSM). Phrases and subphrases are extracted with the help of MetaMap (http://mmtx.nlm.nih.gov/), so that all phrasal terms correspond to concepts in UMLS (http://www.nlm.nih.gov/research/umls). Since the text information for ad-hoc retrieval is very short, traditional term weighting methods need to be adapted. We propose to weight terms by their idfs and to measure similarity with the dot product; but in this way phrasal terms are over-rewarded, so we give a second weighting method that assigns lower weights to phrasal terms. We also propose a query expansion method that extracts more phrases for the query by relaxing the restrictions on phrase extraction. For modality classification, we use three global texture features with SVM [2] and AdaBoost.MH [3] respectively.

The remainder of this paper is organized as follows. The phrase-based retrieval approach and the modality classification algorithm are described in Sections 2 and 3 respectively, our submitted runs and results are presented in Section 4, and conclusions and future work follow in Section 5.
2 Phrase-based Medical Retrieval

2.1 Using Phrases as Indexing Terms

The selection of appropriate indexing terms is critical to information retrieval. Traditional retrieval systems use words or word stems as indexing terms, and such representations of content are often inadequate, since single words are rarely specific enough for accurate discrimination. A better method is to identify groups of words that form meaningful phrases, especially if these phrases denote important concepts in the related domain. This corresponds to using phrases or concepts as indexing terms. In past years, concept-based approaches have been investigated in ImageCLEFmed [4, 5, 6], but to the best of our knowledge there has been no work using phrases as indexing terms in this campaign. This year, we investigate phrase-based medical retrieval.

In the past, various types of phrases, such as sequential n-grams [7], head-modifier pairs extracted from syntactic structures [8, 9, 10, 11] and proximity-based phrases [12], were examined with conventional retrieval models (e.g. the vector space model). In our approach, we consider phrases that correspond to medical concepts. The phrases are extracted with the help of MetaMap, a highly configurable program that maps biomedical text to the UMLS Metathesaurus. MetaMap maps the longest possible phrase to a concept, so that it discovers the most specific concept possible. This may cause many mismatches between query terms and document terms if the detected concepts (CUIs) are used directly as indexing terms, because a general concept and a specific concept may both be relevant to a user's need, and the meaning of a concept can also be expressed by a phrase, by several words, or by phrases that co-occur in the context. [13] gives an example of this: the 24th query of ImageCLEF 2005 is "Show me images of right middle lobe pneumonia", and the best MetaMap mapping gives the concepts "C0150627" (Images) and "C0578577" (Right middle lobe pneumonia). But relevant documents containing the concepts "C0032285" (pneumonia) or "C0796494" (lobe) will not match the query concepts and will therefore receive an unfavorable ranking.

One way to tackle this problem is to expand the query or document with concepts related to the mapped concepts, e.g. hypernyms or hyponyms [13]. Following this idea, we derive our approach as follows. When mapping a phrase to a concept, MetaMap also generates candidates. Candidates consist of one or more constituent words of the phrase, or their variants, and correspond to concepts in UMLS. Some of these concepts are related to the mapped most specific concept and could be used to expand the query or document. But when generating candidates, a phrase can be mapped to several concepts (more frequently for subphrases), and much noise would be introduced if all corresponding concepts were added. We therefore use the phrase (or subphrase) itself instead of the concept (CUI) to represent the document, and phrases, subphrases and individual words are all used as index terms. The subphrases of a noun phrase capture part of its meaning and can be regarded as a weak representation of that meaning, and the use of both a phrase and its subphrases increases the chance of a match between a query and a document that express similar meaning in different linguistic forms.

2.2 Phrase Extraction

In our experiments, we do not develop a phrase extraction algorithm; all phrases and their subphrases are extracted by MetaMap. MetaMap [14] performs the following steps to map each textual utterance to concepts:

1. Parse the text into noun phrases and perform the remaining steps for each phrase.
2. Generate the variants for the noun phrase, where a variant essentially consists of one or more noun phrase words together with all of their spelling variants, abbreviations, acronyms, synonyms, inflectional and derivational variants, and meaningful combinations of these.
3. Form the candidate set of all Metathesaurus strings containing one of the variants.
4. For each candidate, compute the mapping from the noun phrase and calculate the strength of the mapping using an evaluation function; order the candidates by mapping strength.
5. Combine candidates involving disjoint parts of the noun phrase, recompute the match strength based on the combined candidates, and select those having the highest score to form a set of best Metathesaurus mappings for the original noun phrase.

The best candidate corresponds to the longest phrase, and the other candidates correspond to its subphrases or constituent words. The phrases and multi-word subphrases are added to the query and document before indexing and retrieval.

MetaMap is designed for mapping the longest possible phrase to a concept, not for phrase and subphrase extraction, and we find that it is not easy to control it for phrase extraction. For example, when processing the 8th query, "microscopic images of streptococcus pneumonia", MetaMap generates a candidate "streptococcus pneumoniae" with LexVariation = 0.5 due to an inflectional variation. For "chest x-ray" in the 17th query, MetaMap generates an unwanted candidate "breast x-ray" with LexVariation = 2. There is no simple rule for selecting candidates whose meaning is closely related to the best candidate, so we use a strict rule. In the experiments, we use the 0910 Strict Model Dataset for MetaMap. When calling MetaMap for phrase extraction, we disallow derivational variants by setting the parameter '-d', because derivational variants often involve a significant change in meaning. From the output files of MetaMap, only candidates with LexVariation = 0 and MatchedWords Count > 1 are selected to form the phrasal terms: LexVariation = 0 implies that no lexical variation is permitted in phrase extraction, and MatchedWords Count > 1 means that only multi-word phrases are selected.
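To make the candidate selection concrete, the following Python sketch applies the strict rule above to parsed MetaMap candidates; the query_expansion flag anticipates the relaxed rule introduced later in Section 2.4. The dictionary fields (text, lex_variation, matched_words) are an assumed representation of the parsed MetaMap output, not MetaMap's actual output schema.

```python
def select_phrasal_terms(candidates, query_expansion=False):
    """Select multi-word phrasal terms from parsed MetaMap candidates.

    candidates      -- list of dicts, one per candidate, e.g.
                       {"text": "streptococcus pneumoniae",
                        "lex_variation": 0.5, "matched_words": 2}
                       (an assumed representation of the parsed output,
                       not MetaMap's actual schema)
    query_expansion -- if True, apply the relaxed rule of Section 2.4:
                       keep every multi-word candidate regardless of its
                       lexical variation.
    """
    selected = []
    for cand in candidates:
        multi_word = cand["matched_words"] > 1
        no_variation = cand["lex_variation"] == 0
        if query_expansion and multi_word:
            selected.append(cand["text"])      # relaxed rule (Section 2.4)
        elif multi_word and no_variation:
            selected.append(cand["text"])      # strict rule (Section 2.2)
        # single-word candidates are never added as phrasal terms
    return selected


# Example based on the 8th query discussed above: the inflectional variant
# "streptococcus pneumoniae" (LexVariation = 0.5) is kept only under the
# relaxed, query-expansion setting.
cands = [
    {"text": "microscopic images", "lex_variation": 0, "matched_words": 2},
    {"text": "streptococcus pneumoniae", "lex_variation": 0.5, "matched_words": 2},
    {"text": "pneumonia", "lex_variation": 0, "matched_words": 1},
]
print(select_phrasal_terms(cands))                        # ['microscopic images']
print(select_phrasal_terms(cands, query_expansion=True))  # adds the variant
```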
2.3 Term Weighting

We use phrasal terms and single-word terms with the VSM and propose two term weighting methods. For the ad-hoc retrieval task, the context information (title and caption) is much shorter than a document in traditional IR, so we assume that term frequency (tf) is of little importance in this case and use a simple term weighting method in which only the idfs of the indexing terms are used. This weighting is also used for case-based retrieval. The similarity between query and document is measured by the dot product of the query vector and the document vector.

When the VSM is used to combine the weights of phrases, subphrases and single-word terms in this way, phrasal terms are over-rewarded, since the occurrence of a phrase in a document also implies the occurrence of its subphrases and constituent words. To solve this problem, we propose a second term weighting method that assigns lower weights to phrasal terms. For convenience of description, we introduce some notions describing the relationship between phrases. We say a phrase or single word A is an offspring component of a phrase B if and only if it is a subphrase or constituent word of B. We say a phrase or single word A is a son component of a phrase B if and only if A is an offspring component of B and there is no offspring component of B that has A as its own offspring component. In the second term weighting method, the weight of a phrasal term is changed to its idf minus the maximum idf of its son components.
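As an illustration of the two weighting schemes and the dot-product similarity, the sketch below builds idf-only weight vectors and optionally discounts a phrasal term by the maximum idf of its son components. The idf formula log(N/df) is a common convention assumed here; the paper does not state the exact variant used.

```python
import math
from collections import Counter


def idf_table(doc_term_lists):
    """Collection-level idf: idf(t) = log(N / df(t)).

    doc_term_lists -- one list of indexing terms (words, subphrases,
    phrases) per document.
    """
    n_docs = len(doc_term_lists)
    df = Counter()
    for terms in doc_term_lists:
        df.update(set(terms))
    return {t: math.log(n_docs / df[t]) for t in df}


def term_weights(terms, idf, son_components=None):
    """Build a term-weight vector for a query or document.

    First method (Section 2.3): weight(t) = idf(t) for every term present.
    Second method: for a phrasal term p, weight(p) = idf(p) minus the
    maximum idf of its son components; son_components maps each phrasal
    term to the list of its son components.
    """
    weights = {}
    for t in set(terms):
        w = idf.get(t, 0.0)
        if son_components and son_components.get(t):
            w -= max(idf.get(s, 0.0) for s in son_components[t])
        weights[t] = w
    return weights


def dot_product(query_vec, doc_vec):
    """Similarity between query and document vectors (Section 2.3)."""
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())


# Sketch of ranking (terms are hypothetical):
# idf = idf_table(all_doc_term_lists)
# q = term_weights(query_terms, idf, son_components)
# scores = [dot_product(q, term_weights(d, idf, son_components))
#           for d in all_doc_term_lists]
```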
2.4 Query Expansion

Our query expansion method simply relaxes the restrictions on phrase extraction: candidates with MatchedWords Count > 1 (multi-word candidates) are selected to form the phrasal terms for the query, so that more phrasal terms are extracted than with the previous setting (this corresponds to the relaxed selection rule in the sketch following Section 2.2).

3 Modality Classification

For modality classification, we use three global texture features: the LBP texture feature, the Gabor texture feature and the Tamura texture feature.

LBP: Local Binary Pattern (LBP) [15] features have performed very well in various applications, including texture classification and segmentation, image retrieval and surface inspection. In our experiments, an LBP operator with 8 neighbors on a circle of radius 4 is applied to each pixel, and the resulting codes are accumulated into a 256-dimensional LBP histogram.

Tamura texture feature: Based on research into textural features corresponding to human visual perception, Tamura et al. [16] proposed six basic textural features, namely coarseness, contrast, directionality, line-likeness, regularity and roughness. In our experiments, the coarseness, contrast and directionality features are computed on a per-pixel basis, and the values are quantized into a three-dimensional histogram (8 × 8 × 8 = 512 bins) to form one 512-dimensional vector.

Gabor texture feature: Gabor filter based approaches are popular for texture feature extraction. Following the work of Manjunath et al. [17], Gabor filters with 3 scales and 4 orientations are used to filter the image, and the values in the filtered images are quantized into 10 bins to form a 120-dimensional histogram feature.

For feature combination, the three features are simply concatenated to form an 888-dimensional feature vector. We use two algorithms for classification. One is an SVM [2] with RBF kernel, with the one-vs-one strategy for multi-class classification; LibSVM [18] is used in our experiments, and the parameters are tuned by cross-validation on the training data. The other is AdaBoost.MH [3], a multi-class boosting algorithm; an implementation named MultiBoost [19] is used in our experiments.
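A rough sketch of this visual pipeline is given below, using scikit-image and scikit-learn in place of the original LibSVM / MultiBoost implementations. Only the LBP and Gabor histograms are shown (the Tamura features and AdaBoost.MH are omitted), and the Gabor frequencies, quantization details and SVM parameters are illustrative assumptions, not the values used in our runs.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.filters import gabor
from sklearn.svm import SVC


def lbp_histogram(gray):
    # LBP with 8 neighbours on a circle of radius 4 -> 256-bin histogram
    codes = local_binary_pattern(gray, P=8, R=4, method="default")
    hist, _ = np.histogram(codes, bins=256, range=(0, 256), density=True)
    return hist


def gabor_histogram(gray, frequencies=(0.1, 0.2, 0.4), n_orient=4, bins=10):
    # 3 scales x 4 orientations; each filtered image quantized to 10 bins -> 120-dim
    parts = []
    for f in frequencies:
        for k in range(n_orient):
            real, _ = gabor(gray, frequency=f, theta=k * np.pi / n_orient)
            hist, _ = np.histogram(real, bins=bins, density=True)
            parts.append(hist)
    return np.concatenate(parts)


def image_feature(gray):
    # Concatenate the per-image descriptors (the paper also appends the
    # Tamura histogram, giving 256 + 512 + 120 = 888 dimensions in total).
    return np.concatenate([lbp_histogram(gray), gabor_histogram(gray)])


# RBF-kernel SVM; scikit-learn uses a one-vs-one scheme for multi-class SVC,
# matching the paper. C and gamma would be tuned by cross-validation.
clf = SVC(kernel="rbf")
# X = np.stack([image_feature(img) for img in train_images]); clf.fit(X, y)
```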
4 Submitted Runs and Results

For ad-hoc retrieval, only the title and caption fields of the collection are used, since this was shown to be effective and to obtain the best results in ImageCLEFmed 2008 [20]. After the phrasal terms are added, the collection is indexed with the Lemur IR toolkit (http://www.lemurproject.org/). We also update the stop word list to add common terms found in the queries that are not relevant to the medical domain, such as 'image', 'photo' and 'figure'. For convenience, this procedure is also applied to case-based retrieval without modification, although using only the title and caption may lose important information for case-based retrieval and result in poor performance.

4.1 Ad-hoc Retrieval

We submitted the following 3 textual runs for the 16 ad-hoc topics [1]:

(1) UESTC_image_pBasic: Phrasal terms are extracted by the approach described in Section 2.2, and terms are weighted by their idfs as in the first method of Section 2.3. Similarity is measured by the dot product of the query vector and the document vector.
(2) UESTC_image_pNw: Similar to the basic run (UESTC_image_pBasic), but the term weighting is changed to the second method of Section 2.3.
(3) UESTC_image_pQE: Similar to the basic run (UESTC_image_pBasic), but the query expansion method of Section 2.4 is used to obtain more phrasal terms for the query.

To evaluate the effectiveness of the phrasal terms, we conducted an additional run, Image_word_idf, when preparing this report. Image_word_idf uses word stems as indexing terms, and its term weighting is the same as in UESTC_image_pBasic.

Table 1. Results of the submitted and additional runs for the ad-hoc retrieval

Run                   MAP     bPref   P10
UESTC_image_pQE       0.2789  0.297   0.3125
UESTC_image_pNw       0.2751  0.3028  0.3438
UESTC_image_pBasic    0.2713  0.2963  0.3438
Image_word_idf        0.2216  0.2499  0.3375

Table 1 gives the results of our three submitted runs and the additional run for the ad-hoc retrieval. The performances of the three submitted runs are very similar. UESTC_image_pNw performs better than UESTC_image_pBasic, but the improvement is small. UESTC_image_pQE achieves the best MAP (0.2789) of our submitted runs and is ranked 3rd among the best official runs of each group for automatic textual retrieval, although its P10 is lower than that of the other two runs. The MAPs and bPrefs of the three phrase-based runs are clearly superior to those of the word stem based run Image_word_idf, which suggests the benefit of using phrases in medical retrieval. Table 2 presents the performance of the best official run of each group for automatic textual retrieval; the third run, UESTC_image_pQE, is our best official textual run.

Table 2. Performance of the best official run of each group for automatic textual retrieval

Run                                           MAP
ipl_aueb_AdHoc_default_TC                     0.3235
OHSU_pm_all_all_mod                           0.3029
UESTC_image_pQE                               0.2789
issr_CT                                       0.2583
runfile_hes-so-vs_image-based_captions        0.2568
ad_hoc_QE_0.1_Citations_and_All_Image_Text    0.188
NMFText_k2_11                                 0.1005

With our original method, the 16th ad-hoc topic "images of dermatofibroma" matches no document, so we modified this query to "images of dermato fibroma" by inserting a space character when conducting the submitted runs. From the raw results, we computed corrected results corresponding to runs without this modification of the 16th topic. The corrected MAPs of UESTC_image_pQE, UESTC_image_pNw and UESTC_image_pBasic are 0.2777, 0.2739 and 0.2701 respectively, and their corrected bPrefs are 0.2969, 0.3027 and 0.2962 respectively; the P10 values remain unchanged. The differences between the corrected and original results are small and do not affect the conclusions drawn from the submitted runs.
4.2 Case-based Retrieval

The methods used in ad-hoc retrieval are applied directly to case-based retrieval, and 3 textual runs were submitted for the 14 case-based topics [1]:

(1) UESTC_case_pBasic: This run uses the same method as UESTC_image_pBasic.
(2) UESTC_case_pNw: This run uses the same method as UESTC_image_pNw.
(3) UESTC_case_pQE: This run uses the same method as UESTC_image_pQE.

For evaluation, we also conducted an additional run, Case_word_idf, which uses the same method as Image_word_idf. All four runs are automatic textual runs.

Table 3. Results of the submitted and additional runs for the case-based retrieval

Run                 MAP     bPref   P10
UESTC_case_pBasic   0.1692  0.184   0.2643
UESTC_case_pQE      0.1677  0.1852  0.2786
UESTC_case_pNw      0.1522  0.1725  0.2714
Case_word_idf       0.1501  0.1688  0.2786

Table 3 gives the results of our three submitted runs and the additional run for the case-based retrieval. UESTC_case_pBasic achieves the best MAP (0.1692) of our submitted runs and is ranked 6th among the 7 groups that submitted automatic textual runs; the best MAP, 0.2902, was achieved by UIUCIBM. The low performance of our approach in case-based retrieval may have two reasons: first, only the article title and image captions are used for indexing, so important information may be lost; second, our methods, originally proposed for ad-hoc retrieval, may not be suitable for case-based retrieval.

4.3 Modality Classification

We use the LBP, Gabor and Tamura texture features for modality classification and submitted two visual runs with different classifiers.

(1) UESTC_modality_boosting: This run uses AdaBoost.MH with the three global texture features.
(2) UESTC_modality_svm: This run uses the SVM with the three global texture features.

Table 4. Results of the submitted runs for modality classification

Run                        Accuracy
UESTC_modality_boosting    0.82
UESTC_modality_svm         0.80

Table 4 presents the classification accuracies of the two submitted runs. UESTC_modality_boosting achieves the better accuracy of 0.82 and is ranked 2nd among the 5 groups that submitted visual runs.

5 Conclusions and Future Work

This paper describes our contribution to the ImageCLEF 2010 medical retrieval task. For ad-hoc retrieval, we submitted 3 runs with our phrase-based approaches, and with the same methods 3 runs were submitted for case-based retrieval. For modality classification, 2 runs were submitted, using global texture features with two different classifiers. The runs submitted to ad-hoc retrieval and modality classification are successful, achieving 3rd rank in automatic textual retrieval and 2nd in modality classification. Our research on medical retrieval is still preliminary with respect to both phrase extraction and term weighting, and we have not yet made an extensive comparison of different methods. In the future, we will develop and compare different phrase extraction algorithms and term weighting schemes, and use more text features for case-based retrieval. For modality classification, we plan to test other visual features and more advanced classification algorithms.

Acknowledgments. This research is partly supported by the National Science Foundation of China under grant 60873185 and by the Key Program of the Youth Science Foundation of UESTC under grant JX0745.
References

1. Müller, H., Kalpathy-Cramer, J., Eggel, I., Bedrick, S., Kahn Jr., C.E., Hersh, W.: Overview of the CLEF 2010 medical image retrieval track. In: Working Notes of CLEF 2010, Padova, Italy (2010)
2. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)
3. Schapire, R., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Machine Learning 37(3), 297-336 (1999)
4. Lacoste, C., Chevallet, J.P., Lim, J.H., Wei, X., Racoceanu, D., Le, T.H.D., Teodorescu, R., Vuillenemot, N.: IPAL knowledge-based medical image retrieval in ImageCLEFmed 2006. In: Working Notes for the CLEF 2006 Workshop, Alicante, Spain, September 20-22 (2006)
5. Chevallet, J.P., Lim, J.H., Le, T.H.D.: Domain knowledge conceptual inter-media indexing, application to multilingual multimedia medical reports. In: ACM Sixteenth Conference on Information and Knowledge Management (CIKM 2007), November 6-9 (2007)
6. Maisonnasse, L., Gaussier, E., Chevallet, J.P.: Multiplying concept sources for graph modeling. In: Peters, C., Jijkoun, V., Mandl, T., Müller, H., Oard, D.W., Peñas, A., Petras, V., Santos, D. (eds.) CLEF 2007. LNCS, vol. 5152. Springer, Heidelberg (2008)
7. Mitra, M., Buckley, C., Singhal, A., Cardie, C.: An analysis of statistical and syntactic phrases. In: Proceedings of RIAO '97, pp. 200-214 (1997)
8. Lewis, D.D., Croft, W.B.: Term clustering of syntactic phrases. In: Proceedings of SIGIR '90, pp. 385-404 (1990)
9. Zhai, C.: Fast statistical parsing of noun phrases for document indexing. In: Proceedings of ANLP '97, pp. 312-319 (1997)
10. Dillon, M., Gray, A.S.: FASIT: A fully automatic syntactically based indexing system. Journal of the American Society for Information Science 34(2), 99-108 (1983)
11. Strzalkowski, T., Perez-Carballo, J., Marinescu, M.: Natural language information retrieval: TREC-3 report. In: Proceedings of TREC-3, pp. 39-54 (1994)
12. Turpin, A., Moffat, A.: Statistical phrases for vector-space information retrieval. In: Proceedings of SIGIR '99, pp. 309-310 (1999)
13. Le, T.H.D., Chevallet, J.P., Dong, T.B.T.: Thesaurus-based query and document expansion in conceptual indexing with UMLS: application in medical information retrieval. In: IEEE International Conference on Research, Innovation and Vision for the Future, pp. 242-246 (2007)
14. Aronson, A.R.: MetaMap: Mapping text to the UMLS Metathesaurus. http://skr.nlm.nih.gov/papers/references/metamap06.pdf (2006)
15. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 971-987 (2002)
16. Tamura, H., Mori, S., Yamawaki, T.: Textural features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics 8(6) (1978)
17. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(8), 837-842 (1996)
18. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001)
19. Busa-Fekete, R., Casagrande, N., Kégl, B.: MultiBoost. http://mloss.org/software/view/246/
20. García-Cumbreras, M.A., Díaz-Galiano, M.C., Martín-Valdivia, M.T., Ureña López, L.A.: SINAI at ImageCLEFphoto 2008. In: On-line Working Notes, CLEF 2008 (2008)