=Paper=
{{Paper
|id=Vol-1179/CLEF2013wn-ImageCLEF-SecoDeHerreraEt2013b
|storemode=property
|title=Overview of the ImageCLEF 2013 Medical Tasks
|pdfUrl=https://ceur-ws.org/Vol-1179/CLEF2013wn-ImageCLEF-SecoDeHerreraEt2013b.pdf
|volume=Vol-1179
|dblpUrl=https://dblp.org/rec/conf/clef/HerreraKDAM13
}}
==Overview of the ImageCLEF 2013 Medical Tasks==
Overview of the ImageCLEF 2013 medical tasks

Alba G. Seco de Herrera (1), Jayashree Kalpathy-Cramer (2), Dina Demner-Fushman (3), Sameer Antani (3), Henning Müller (1,4)

(1) University of Applied Sciences Western Switzerland, Sierre, Switzerland
(2) Harvard University, Cambridge, MA, USA
(3) National Library of Medicine (NLM), USA
(4) Medical Informatics, University Hospitals and University of Geneva, Switzerland
alba.garcia@hevs.ch

Abstract. In 2013, the tenth edition of the medical task of the ImageCLEF benchmark was organized. For the first time, the ImageCLEFmed workshop takes place in the United States of America at the annual AMIA (American Medical Informatics Association) meeting, even though the task was organized, as in previous years, in connection with the other ImageCLEF tasks. As in 2012, a subset of the open access collection of PubMed Central was distributed. This year, there were four subtasks: modality classification, compound figure separation, image–based and case–based retrieval. The compound figure separation task was included due to the large number of multipanel images available in the literature and the importance of separating them for targeted retrieval. More compound figures were also included in the modality classification task to make it correspond to the distribution in the full database. The retrieval tasks remained in the same format as in previous years but a larger number of topics were available for the image–based and case–based tasks. This paper presents an analysis of the techniques applied by the ten groups participating in ImageCLEFmed 2013.

Keywords: ImageCLEFmed, modality classification, compound figure separation, image–based retrieval, case–based retrieval

1 Introduction

ImageCLEF (http://www.imageclef.org/) [1] is the image retrieval track of the Cross Language Evaluation Forum (CLEF). ImageCLEFmed is the part of ImageCLEF focusing on medical images [2–7]. In the 10th edition of the medical task, the workshop is for the first time organized outside of Europe, at the annual AMIA (American Medical Informatics Association) meeting (http://www.amia.org/amia2013/). The same format as in 2012 was followed and a new task was added, compound figure separation. Characterisation of compound figures is often difficult, as they can contain features of various image types. Focusing search on the subfigures can lead to better results. In 2013, the modality classification task also included a larger number of compound figures to make the task more realistic and correspond to the distribution in the database.

The four tasks of 2013 are:

– modality classification;
– compound figure separation;
– image–based retrieval;
– case–based retrieval.

The paper is organized as follows. Section 2 describes the ImageCLEFmed tasks in more detail as well as the participation in each of the tasks. Section 3 presents the main results of the tasks and compares results within the participating groups and the techniques employed. Section 4 concludes the paper.

2 Participation, Data Sets, Tasks, Ground Truth

This section describes the four tasks organized in ImageCLEFmed 2013. The datasets and the ground truth provided for the evaluation campaign are explained in detail.

2.1 Participation

As in 2012, over sixty groups registered for the medical tasks and obtained access to the data sets. Ten of the registered groups submitted results to the medical tasks, compared to 17 in 2012, with a total of 166 valid runs submitted, slightly fewer runs than in 2012. The smaller number of participants and submitted runs may be due to a change in the evaluation schedule of CLEF 2013 and to the fact that the event is organized outside of Europe for the first time. 51 runs were submitted to the modality classification task, 4 runs to the compound figure separation task, 9 runs to the image retrieval task and 45 runs to the case–based retrieval task. As in previous years, the number of runs per group was limited to ten per subtask. The following groups submitted at least one run:

– AAUITEC (Institute of Information Technology, Alpen–Adria University of Klagenfurt, Austria)*;
– CITI (Center of Informatics and Information Technology, Portugal)*;
– DEMIR (Dokuz Eylul University, Turkey);
– FCSE (Faculty of Computer Sciences and Engineering, University of Ss Cyril and Methodius, Macedonia);
– IBM Multimedia Analytics (United States);
– IPL (Athens University of Economics and Business, Greece);
– ITI (Image and Text Integration Project, NLM, United States);
– medGIFT (University of Applied Sciences Western Switzerland, Switzerland);
– MiiLab (Medical Image Information Laboratory, Shanghai Advanced Research Institute, China)*;
– SNUMedInfo (Medical Informatics Laboratory, Seoul National University, Republic of Korea)*.

Participants marked with a star had not participated in the medical task in 2012.

2.2 Datasets

In ImageCLEFmed 2013, the same database as in 2012 was supplied to the participants. The database contains over 300,000 images from 75,000 articles of the biomedical open access literature that allow free redistribution of the data. The ImageCLEFmed database is a subset of PubMed Central (http://www.ncbi.nlm.nih.gov/pmc/), which contains in total over 1.5 million images from over 600,000 articles.

2.3 Modality Classification

The modality classification task was first introduced in 2010. The goal of this task is to classify the images into medical modalities and other image types, such as computed tomography, x–ray or general graphs. A modality hierarchy of 38 classes, of which 31 appear in the data, was used [8]. Using the modality information, the retrieval results could often be improved in the past by filtering out non–relevant image types [9]; a small illustrative sketch of such filtering is given after the class list below. The same hierarchy as in ImageCLEFmed 2012 was used (see Figure 1). In 2013 a larger number of compound figures than in ImageCLEFmed 2012 were provided in the training and test data sets. The current distribution corresponds to that in the PubMed Central data set, much closer to reality than in previous years.

Fig. 1. The image class hierarchy that was developed for document images occurring in the biomedical open access literature.

The class codes with descriptions are the following ([Class code] Description):

– [COMP] Compound or multipane images (1 category)
– [Dxxx] Diagnostic images:
  • [DRxx] Radiology (7 categories):
    • [DRUS] Ultrasound
    • [DRMR] Magnetic Resonance
    • [DRCT] Computerized Tomography
    • [DRXR] X–Ray, 2D Radiography
    • [DRAN] Angiography
    • [DRPE] PET
    • [DRCO] Combined modalities in one image
  • [DVxx] Visible light photography (3 categories):
    • [DVDM] Dermatology, skin
    • [DVEN] Endoscopy
    • [DVOR] Other organs
  • [DSxx] Printed signals, waves (3 categories):
    • [DSEE] Electroencephalography
    • [DSEC] Electrocardiography
    • [DSEM] Electromyography
  • [DMxx] Microscopy (4 categories):
    • [DMLI] Light microscopy
    • [DMEL] Electron microscopy
    • [DMTR] Transmission microscopy
    • [DMFL] Fluorescence microscopy
– [D3DR] 3D reconstructions (1 category)
– [Gxxx] Generic biomedical illustrations (12 categories):
  • [GTAB] Tables and forms
  • [GPLI] Program listing
  • [GFIG] Statistical figures, graphs, charts
  • [GSCR] Screenshots
  • [GFLO] Flowcharts
  • [GSYS] System overviews
  • [GGEN] Gene sequence
  • [GGEL] Chromatography, Gel
  • [GCHE] Chemical structure
  • [GMAT] Mathematics, formulae
  • [GNCP] Non–clinical photos
  • [GHDR] Hand–drawn sketches
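Such modality predictions are mostly used downstream to restrict or re-rank retrieval results by image type, as mentioned above. The following is a minimal sketch of that filtering step; the class codes follow the hierarchy listed above, but the function name and the prediction/ranking data structures are illustrative assumptions and not part of the ImageCLEFmed distribution.

```python
# Hypothetical sketch: filter a ranked retrieval list so that only images
# whose predicted modality code is relevant to the query remain.
# The dictionaries below are placeholders, not ImageCLEFmed data.

RADIOLOGY = {"DRUS", "DRMR", "DRCT", "DRXR", "DRAN", "DRPE", "DRCO"}

def filter_by_modality(ranked_ids, predicted_modality, allowed_codes):
    """Keep only images whose predicted class code is in allowed_codes.

    ranked_ids         -- list of image identifiers, best match first
    predicted_modality -- dict mapping image id -> class code (e.g. "DRCT")
    allowed_codes      -- set of class codes considered relevant
    """
    return [img_id for img_id in ranked_ids
            if predicted_modality.get(img_id) in allowed_codes]

# Example: a query asking for CT or MRI images keeps only radiology
# modalities near the top of the ranking.
ranking = ["img_12", "img_07", "img_33", "img_02"]
predictions = {"img_12": "GFIG", "img_07": "DRCT",
               "img_33": "DMLI", "img_02": "DRMR"}
print(filter_by_modality(ranking, predictions, RADIOLOGY))
# -> ['img_07', 'img_02']
```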
2.4 Compound Figure Separation

In the ImageCLEFmed 2012 data set [7], between 40% and 60% of the figures are compound or multipanel figures. Making the content of the compound figures accessible for targeted search can improve retrieval accuracy. For this reason the detection of compound figures and their separation into subfigures is considered an important task. Examples of compound figures can be seen in Figure 2.

Fig. 2. Examples of compound figures found in the biomedical literature: (a) mixed modalities in a single figure; (b) graphs and microscopy images in a single figure.

The data set used in the ImageCLEF 2013 compound figure separation task consists of figures from the same collection of the biomedical literature. 2,967 compound figures were selected from the complete data set after a manual classification of images into compound and other figures. This subset was randomly split into two parts: a training set containing 1,538 images and a testing set with 1,429 images.

The ground truth for the dataset was generated in a semi–automatic way, using a two–step approach: first, an automated separation process (using the technique described in [10]) was run on both image sets in order to obtain a general overview of the subfigures. The automatic results were then manually corrected: missing lines were added and incorrect lines removed, whereas often the lines were only slightly changed. Separating lines rather than bounding boxes were used to separate subfigures. The evaluation then required a minimum overlap between the ground truth and the data supplied by the groups in their runs. The terminology used in the evaluation is:

– The term figure, F, refers to a compound figure as a whole.
– A subfigure, f_i, represents a part (or panel) of a figure. The ground truth for the figure F consists of a set of K^F_GT subfigures f_1, ..., f_{K^F_GT}.
– The word candidate, c_j, refers to the data being evaluated against the ground truth. The separation of figure F consists of a set of K^F_C candidates c_1, ..., c_{K^F_C}.

A brief summary of the evaluation algorithm for a given figure F is as follows:

– The score S_F is computed based on the number of correct candidates, C^F_correct.
– For each subfigure f_i defined in the ground truth the best matching candidate subfigure is determined. Only one candidate is used in case there are several matches.
– The main metric used to compare subfigures is the overlap between a candidate subfigure and the ground truth. To be considered a valid match, the overlap between a candidate subfigure and a subfigure from the ground truth must correspond to at least 66% of the candidate's size. If the best candidate is an acceptable match, the number of correctly matched subfigures C^F_correct is incremented. Since only one candidate subfigure can be assigned to each of the subfigures from the ground truth, C^F_correct ≤ K^F_GT.
– The maximum score for a figure is 1 and the normalisation factor used to compute the score is the maximum of the number of subfigures in the ground truth, K^F_GT, and the number of candidate subfigures, K^F_C:

  S_F = C^F_correct / max(K^F_GT, K^F_C)

Therefore the maximum score is obtained only when the number of candidates K^F_C is equal to the number of subfigures in the ground truth K^F_GT and all of them are correctly matched: C^F_correct = K^F_C = K^F_GT.

Fig. 3. Examples for the separation of a compound figure with three ground-truth subfigures: (a) perfect score, 3/3 valid candidates, score = 1.0; (b) not enough candidates, 1/3 valid, score = 0.3; (c) too many candidates, 3/5 valid, score = 0.6. Dashed blue lines represent the ground truth, while solid lines represent the candidates; valid candidates are shown in green and invalid candidates in red.

Figure 3 contains examples showing different candidates being validated against a reference figure (which contains 3 subfigures), along with their scores.
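For clarity, a small sketch of this scoring procedure is given below. It assumes that ground-truth subfigures and candidates are available as axis-aligned bounding boxes and uses a simple greedy assignment; the official ground truth was built from manually corrected separating lines, so this is only an approximation of the procedure described above, with illustrative function names.

```python
# Illustrative sketch of the compound figure separation score
# S_F = C_correct / max(K_GT, K_C), assuming axis-aligned boxes
# (x1, y1, x2, y2); not the official evaluation implementation.

def area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def intersection(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return area((x1, y1, x2, y2))

def figure_score(ground_truth, candidates, min_overlap=0.66):
    """Score one figure: each ground-truth subfigure may claim at most one
    candidate, and a candidate is valid if the overlap covers at least
    min_overlap of the candidate's own area."""
    unused = list(range(len(candidates)))
    correct = 0
    for gt_box in ground_truth:
        best_j, best_ratio = None, 0.0
        for j in unused:
            cand = candidates[j]
            ratio = intersection(gt_box, cand) / area(cand) if area(cand) else 0.0
            if ratio > best_ratio:
                best_j, best_ratio = j, ratio
        if best_j is not None and best_ratio >= min_overlap:
            correct += 1
            unused.remove(best_j)
    return correct / max(len(ground_truth), len(candidates))

# Example: two ground-truth panels, three proposed candidates, two of
# which match well -> score 2/3.
gt = [(0, 0, 100, 100), (100, 0, 200, 100)]
cand = [(0, 0, 98, 100), (105, 0, 200, 100), (0, 100, 200, 150)]
print(round(figure_score(gt, cand), 2))  # 0.67
```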
2.5 Image–Based Retrieval

The image–based retrieval task is the classical medical retrieval task, similar to those organized each year since 2004, with the target unit being the image. In 2013, 35 queries were given to the participants, more than in previous years. The 22 queries used in 2012 [7] were part of the 35 queries, which all contain text (in English, Spanish, French and German) with 1–7 sample images for each query. As in previous years, the queries were classified into textual, mixed and semantic, based on the methods that are expected to yield the best results.

2.6 Case–Based Retrieval

The case–based retrieval task has been running since 2009. In this task, a case description, with patient demographics, limited symptoms and test results including imaging studies, is provided (but not the final diagnosis). As in previous years, the goal is to retrieve cases including images that might best suit the provided case description. This year the 26 topics distributed in 2012 were also part of the 35 final topics. Each of the topics was accompanied by one or two images.

3 Results

This section describes the results of ImageCLEF 2013. Runs are ordered based on the tasks (modality classification, compound figure separation, image–based and case–based retrieval) and the techniques used (visual, textual, mixed). In 2013, several groups used the ImageCLEF 2012 [7] database to optimize their parameters [11–13].
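The retrieval tables in the following sections report mean average precision (MAP), geometric mean average precision (GM-MAP), bpref and early precision (P10, P30). As a reference for how the ranked runs are scored, here is a minimal sketch of MAP and P@k for a single run; it uses the standard definitions with illustrative data structures and is not the official evaluation code.

```python
# Minimal sketch of MAP and P@k over a set of topics, using the standard
# definitions (illustrative, not the official evaluation tooling).

def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic: mean of the precision values at
    the ranks where a relevant document is retrieved."""
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents among the first k retrieved."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k

def mean_average_precision(run, qrels):
    """run: topic id -> ranked list of doc ids; qrels: topic id -> set of
    relevant doc ids. Returns the mean over all topics in the qrels."""
    return sum(average_precision(run.get(t, []), rel)
               for t, rel in qrels.items()) / len(qrels)

# Tiny example with two topics.
run = {"t1": ["d3", "d1", "d7"], "t2": ["d2", "d9"]}
qrels = {"t1": {"d1", "d7"}, "t2": {"d9", "d4"}}
print(round(mean_average_precision(run, qrels), 3))
print(precision_at_k(run["t1"], qrels["t1"], 10))
```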
3.1 Modality Classification Results

Table 1 shows the classification accuracy obtained by the various runs submitted in the modality classification task. In 2013, IBM Multimedia Analytics and FCSE [12] obtained the best results across the three types of runs (visual, textual, mixed). The best results were obtained using multimodal techniques (81.68%), followed by visual techniques (80.79%). The best run using textual methods alone obtained a lower accuracy (64.17%). Only ITI [14] explored hierarchical approaches based on the distributed hierarchy, and some groups investigated a separation between compound and non–compound images before classifying the remaining categories [11, 15].

Techniques Used for Visual Classification

The IBM team achieved the best results in the visual classification. FCSE [12] was the second best group (77.14%), using a spatial pyramid in combination with dense sampling and an opponentSIFT descriptor for each image patch. Finally, Support Vector Machines (SVM) with a χ2 kernel were used as the classifier.
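As an illustration of that last step, the following is a minimal sketch of training an SVM with a χ2 kernel on precomputed visual descriptors (for example bag-of-visual-words or CEDD-like histograms); the descriptor extraction and the spatial pyramid are not shown, and the data arrays are random placeholders rather than real ImageCLEF features.

```python
# Minimal sketch: chi-squared kernel SVM over precomputed histogram
# descriptors, a common setup for modality classification.
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((40, 128))   # e.g. 128-bin visual-word histograms
y_train = rng.integers(0, 4, 40)  # e.g. 4 modality classes (placeholder labels)
X_test = rng.random((10, 128))

# Precompute the chi-squared Gram matrix and train an SVM on it.
gram_train = chi2_kernel(X_train, X_train, gamma=0.5)
clf = SVC(kernel="precomputed", C=10.0)
clf.fit(gram_train, y_train)

# For prediction, the kernel is computed between test and training samples.
gram_test = chi2_kernel(X_test, X_train, gamma=0.5)
print(clf.predict(gram_test))
```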
As in 2012, multiple features were extracted from the images, most frequently color and edge directivity descrip- tors (CEDD) [11, 13, 14, 16, 17], fuzzy color and texture histogram (FCTH) [11, 13, 14, 16, 17] and scale–invariant feature transform (SIFT) variants [11, 12, 15]. Several classifiers were explored by the participants such as SVM [12, 14, 15, 17], k–nearest neighbour (k–nn) [11, 15] or class–centroid–based techniques [17]. Table 1. Results of the runs of the modality classification task Run Group Run Type Accuracy IBM modality run8 IBM Mixed 81.68 results mixed finki run3 FCSE Mixed 78.04 All CITI Mixed 72.92 IBM modality run9 IBM Mixed 69.82 medgift2013 mc mixed k8 medGIFT Mixed 69.63 medgift2013 mc mixed sem k8 medGIFT Mixed 69.63 nlm mixed using 2013 visual classification 2 ITI Mixed 69.28 nlm mixed using 2013 visual classification 1 ITI Mixed 68.74 nlm mixed hierarchy ITI Mixed 67.31 nlm mixed using 2012 visual classification ITI Mixed 67.07 DEMIR MC 5 DEMIR Mixed 64.60 DEMIR MC 3 DEMIR Mixed 64.48 DEMIR MC 6 DEMIR Mixed 64.09 DEMIR MC 4 DEMIR Mixed 63.67 medgift2013 mc mixed exp sep sem k21 medGIFT Mixed 62.27 IPL13 mod cl mixed r2 IPL Mixed 61.03 IBM modality run10 IBM Mixed 60.34 IPL13 mod cl mixed r3 IPL Mixed 58.98 medgift2013 mc mixed exp k21 medGIFT Mixed 47.83 medgift2013 mc mixed exp sem k21 medGIFT Mixed 47.83 All NoComb CITI Mixed 44.61 IPL13 mod cl mixed r1 IPL Mixed 09.56 IBM modality run1 IBM Textual 64.17 results text finki run2 FCSE Textual 63.71 DEMIR MC 1 DEMIR Textual 62.70 DEMIR MC 2 DEMIR Textual 62.70 words CITI Textual 62.35 medgift2013 mc text k8.csv medGIFT Textual 62.04 nlm textual only flat ITI Textual 51.23 IBM modality run2 IBM Textual 39.07 words noComb CITI Textual 32.80 IPL13 mod cl textual r1 IPL Textual 09.02 IBM modality run4 IBM Visual 80.79 IBM modality run5 IBM Visual 80.01 IBM modality run6 IBM Visual 79.82 IBM modality run7 IBM Visual 78.89 results visual finki run1 FCSE Visual 77.14 results visual compound finki run4 FCSE Visual 76.29 IBM modality run3 IBM Visual 75.94 sari modality baseline MiiLab Visual 66.46 sari modality CCTBB DRxxDict MiiLab Visual 65.60 medgift2013 mc 5f medGIFT Visual 63.78 nlm visual only hierarchy ITI Visual 61.50 medgift2013 mc 5f exp separate k21 medGIFT Visual 61.03 medgift2013 mc 5f separate medGIFT Visual 59.25 CEDD FCTH CITI Visual 57.62 IPL13 mod cl visual r2 IPL Visual 52.05 medgift2013 mc 5f exp k8 medGIFT Visual 45.42 IPL13 mod cl visual r3 IPL Visual 43.33 CEDD FCTH NoComb CITI Visual 32.49 IPL13 mod cl visual r1 IPL Visual 06.19 Techniques Used for Classification Based on Text In 2012, only the ITI team [18] submitted runs for the textual modality classification task. In 2013, seven groups submitted textual results. A variety of techniques was employed using systems as Terrier IR4 [12, 13], Lucene5 [11, 16] or Essie [14]. Techniques Used for Multimodal Classification Eight groups submitted multimodal runs, five more than in 2012. The groups fused the techniques de- scribed above for visual and textual classification with a variety of fusion tech- niques, leading to the best results overall with multimodal techniques. 3.2 Compound Figure Separation Results Three groups participated in the first year of the compound figure separation task (see Table 2). MedGIFT [11] achieved the best result in one of its runs but it simply serves as a point of reference, since it was also used when the separating lines were drawn [10] and thus has an advantage over other techniques. 
ITI [14] achieved 69.27% using a combination of figure caption analysis, panel border detection and panel label recognition. FCSE [12] obtained 68.59% using an unsupervised algorithm based on a breadth–first search strategy that relies only on visual information. Finally, medGIFT [11] submitted a second run which was not strictly designed for figure separation but provided a point of comparison. This run used a region detection algorithm mainly focused on volumetric medical image retrieval [19] and reached 46.82% accuracy, showing that such techniques can also be applied to this task.

Table 2. Results of the runs of the compound figure separation task

Run                                        Group     Run Type  Accuracy
HESSO CFS                                  medGIFT   Visual    84.64
nlm multipanel separation                  ITI       Mixed     69.27
fcse-final-noempty                         FCSE      Visual    68.59
HESSO REGIONDETECTOR SCALE50 STANDARD      medGIFT   Visual    46.82

3.3 Image–Based Retrieval Results

Nine groups submitted image–based runs in 2013. The best results in terms of mean average precision (MAP) were obtained by ITI [14] using multimodal methods. The same group also obtained the best results in 2012. The best textual run achieved the same MAP as the best multimodal run (0.3196). As in previous years, visual approaches achieved much lower results than the textual and multimodal techniques. Most of the techniques used in the retrieval task were also used for the modality classification task and are described in Section 3.1.

4 http://terrier.org/
5 http://lucene.apache.org/

Visual Retrieval

Eight groups submitted 28 visual runs (see Table 3). DEMIR [13] achieved the best position in terms of MAP by applying a classification algorithm. In addition to the techniques used in the modality classification task, some participants split and rescaled the images [17, 16]. Borda–fuse methods were also used [20]. Table 3.
Results of the visual runs for the medical image retrieval task Run Name Group MAP GM-MAP bpref P10 P30 DEMIR4 DEMIR 0.0185 0.0005 0.0361 0.0629 0.0581 medgift visual nofilter medGIFT 0.0133 0.0004 0.0256 0.0571 0.0448 medgift visual close medGIFT 0.0132 0.0004 0.0256 0.0543 0.0438 medgift visual prefix medGIFT 0.0129 0.0004 0.0253 0.0600 0.0467 IPL13 visual r6 IPL 0.0119 0.0003 0.0229 0.0371 0.0286 image latefusion merge ITI 0.0110 0.0003 0.0207 0.0257 0.0314 DEMIR5 DEMIR 0.0110 0.0004 0.0257 0.0400 0.0448 image latefusion merge filter ITI 0.0101 0.0003 0.0244 0.0343 0.0324 latefusuon accuracy merge ITI 0.0092 0.0003 0.0179 0.0314 0.0286 IPL13 visual r3 IPL 0.0087 0.0003 0.0173 0.0286 0.0257 sari SURFContext HI baseline MiiLab 0.0086 0.0003 0.0181 0.0429 0.0352 IPL13 visual r8 IPL 0.0086 0.0003 0.0173 0.0286 0.0257 IPL13 visual r5 IPL 0.0085 0.0003 0.0178 0.0314 0.0257 IPL13 visual r1 IPL 0.0083 0.0002 0.0176 0.0314 0.0257 IPL13 visual r4 IPL 0.0081 0.0002 0.0182 0.0400 0.0305 IPL13 visual r7 IPL 0.0079 0.0003 0.0175 0.0257 0.0267 FCT SEGHIST 6x6 LBP CITI 0.0072 0.0001 0.0151 0.0343 0.0267 IPL13 visual r2 IPL 0.0071 0.0001 0.0162 0.0257 0.0257 IBM image run min min IBM 0.0062 0.0002 0.0160 0.0286 0.0267 DEMIR2 DEMIR 0.0044 0.0002 0.0152 0.0229 0.0229 SNUMedinfo13 SNUMedInfo 0.0043 0.0002 0.0126 0.0229 0.0181 SNUMedinfo12 SNUMedInfo 0.0033 0.0001 0.0153 0.0257 0.0219 IBM image run Mnozero17 IBM 0.0030 0.0001 0.0089 0.0200 0.0105 SNUMedinfo14 SNUMedInfo 0.0023 0.0002 0.0090 0.0171 0.0124 SNUMedinfo15 SNUMedInfo 0.0019 0.0002 0.0074 0.0086 0.0114 IBM image run Mavg7 IBM 0.0015 0.0001 0.0082 0.0171 0.0114 IBM image run Mnozero11 IBM 0.0008 0 0.0045 0.0057 0.0095 nlm-se-image-based-visual ITI 0.0002 0 0.0021 0.0029 0.0010 Textual Retrieval As for visual retrieval, eight groups submitted runs in the textual retrieval task (see Table 4). ITI [14] achieves the best results with a combination of two queries using Essie. The participants explored a variety of retrieval techniques mostly described in Section 3.1. FCSE [12] proposed a concept–scape approach matching the text data to medical concepts. Multimodal Retrieval Only three groups submitted runs in the multimodal task (see Table 5). As in 2012, ITI [14] submitted the run with the highest MAP. For this run the group used the same method as the best textual run achieving exactly the same results. Mixed approaches combined the above textual and visual approaches using early [11, 14, 17] and late [11, 13, 14, 16] fusion strategies. Table 4. 
Results of the textual runs for the medical image retrieval task Run Name Group MAP GM-MAP bpref P10 P30 nlm-se-image-based-textual ITI 0.3196 0.1018 0.2982 0.3886 0.2686 IPL13 textual r6 IPL 0.2542 0.0422 0.2479 0.3314 0.2333 BM25b1.1 FCSE 0.2507 0.0443 0.2497 0.3200 0.2238 finki FCSE 0.2479 0.0515 0.2336 0.3057 0.2181 medgift text close medGIFT 0.2478 0.0587 0.2513 0.3114 0.2410 finki FCSE 0.2464 0.0508 0.2338 0.3114 0.2200 BM25b1.1 FCSE 0.2435 0.0430 0.2424 0.3314 0.2248 BM25b1.1 FCSE 0.2435 0.0430 0.2424 0.3314 0.2248 IPL13 textual r4 IPL 0.2400 0.0607 0.2373 0.2857 0.2143 IPL13 textual r1 IPL 0.2355 0.0583 0.2307 0.2771 0.2095 IPL13 textual r8 IPL 0.2355 0.0579 0.2358 0.2800 0.2171 IPL13 textual r8b IPL 0.2355 0.0579 0.2358 0.2800 0.2171 IPL13 textual r3 IPL 0.2354 0.0604 0.2294 0.2771 0.2124 IPL13 textual r2 IPL 0.2350 0.0583 0.229 0.2771 0.2105 FCT SOLR BM25L MSH CITI 0.2305 0.0482 0.2316 0.2971 0.2181 medgift text nofilter medGIFT 0.2281 0.0530 0.2269 0.2857 0.2133 IPL13 textual r5 IPL 0.2266 0.0431 0.2285 0.2743 0.2086 medgift text prefix medGIFT 0.2226 0.0470 0.2235 0.2943 0.2305 FCT SOLR BM25L CITI 0.2200 0.0476 0.2280 0.2657 0.2114 DEMIR9 DEMIR 0.2003 0.0352 0.2158 0.2943 0.1952 DEMIR1 DEMIR 0.1951 0.0289 0.2036 0.2714 0.1895 DEMIR6 DEMIR 0.1951 0.0289 0.2036 0.2714 0.1895 SNUMedinfo11 SNUMedInfo 0.1800 0.0266 0.1866 0.2657 0.1895 DEMIR8 DEMIR 0.1578 0.0267 0.1712 0.2714 0.1733 finki FCSE 0.1456 0.0244 0.1480 0.2000 0.1286 IBM image run 1 IBM 0.0848 0.0072 0.0876 0.1514 0.1038 Table 5. Results of the multimodal runs for the medical image retrieval task Run Name Group MAP GM-MAP bpref P10 P30 nlm-se-image-based-mixed ITI 0.3196 0.1018 0.2983 0.3886 0.2686 Txt Img Wighted Merge ITI 0.3124 0.0971 0.3014 0.3886 0.2790 Merge RankToScore weighted ITI 0.3120 0.1001 0.2950 0.3771 0.2686 Txt Img Wighted Merge ITI 0.3086 0.0942 0.2938 0.3857 0.2590 Merge RankToScore weighted ITI 0.3032 0.0989 0.2872 0.3943 0.2705 medgift mixed rerank close medGIFT 0.2465 0.0567 0.2497 0.3229 0.2524 medgift mixed rerank nofilter medGIFT 0.2375 0.0539 0.2307 0.2886 0.2238 medgift mixed weighted nofilter medGIFT 0.2309 0.0567 0.2197 0.2800 0.2181 medgift mixed rerank prefix medGIFT 0.2271 0.0470 0.2289 0.2886 0.2362 DEMIR3 DEMIR 0.2168 0.0345 0.2255 0.3143 0.1914 DEMIR10 DEMIR 0.1583 0.0292 0.1775 0.2771 0.1867 DEMIR7 DEMIR 0.0225 0.0003 0.0355 0.0543 0.0543 3.4 Case–Based Retrieval Results In 2013, the case–based retrieval task became more popular with seven groups submitting 42 runs. More groups than in previous years used visual and multi- modal techniques. Textual runs achived the best results and visual runs obtained lower results than the textual and multimodal runs. Visual Retrieval The results using visual retrieval on the case–based task are shown in Table 6. CITI [16] achived the best result outperforming the second best result by a factor of ten in terms of MAP. This group extracted a set of descriptors for 6 × 6 image grid. Table 6. 
Results of the visual runs for the medical case–based retrieval task Run Name Group MAP GM-MAP bpref P10 P30 FCT SEGHIST 6x6 LBP CITI 0.0281 0.0009 0.0335 0.0429 0.0238 medgift visual nofilter casebased medGIFT 0.0029 0.0001 0.0035 0.0086 0.0067 medgift visual close casebased medGIFT 0.0029 0.0001 0.0036 0.0086 0.0076 medgift visual prefix casebased medGIFT 0.0029 0.0001 0.0036 0.0086 0.0067 nlm-se-case-based-visual ITI 0.0008 0.0001 0.0044 0.0057 0.0057 Textual Retrieval Table 7 shows that SNUMedInfo [20] team achieved the best MAP (0.2429) in its first participation. SNUMedInfo used an external cor- pus (MEDLINE6 ) for robust and effective expansion term inference. CITI [16] achieved close results using MeSH expansion. ITI [14] and FCSE [12] incorporate UMLS (Unified Medical Language System) concepts. In general, the groups used the same techniques or very similar techniques compared to the ad–hoc image retrieval task. Multimodal Retrieval Three groups submitted multimodal runs, combining of visual and textual techniques. As in the visual case–based task, the CITI [16] team achieved the best results in terms of MAP (see Table 8). A rank–based fusion was applied in their approach improving existing algorithms by a small margin. 4 Conclusions After one decade of running the ImageCLEF medical task, in 2013 Image- CLEFmed is organized at the annual AMIA meeting in the form of a workshop. The task had 10 groups submitting 166 valid runs to the four subtasks. The main novelty in 2013 was the inclusion of a new task, the compound figure separation 6 http://www.nlm.nih.gov/bsd/pmresources.html Table 7. Results of the textual runs for the medical case–based retrieval task Run Name Group MAP GM-MAP bpref P10 P30 SNUMedinfo9 SNUMedInfo 0.2429 0.1163 0.2417 0.2657 0.1981 SNUMedinfo8 SNUMedInfo 0.2389 0.1279 0.2323 0.2686 0.1933 SNUMedinfo5 SNUMedInfo 0.2388 0.1266 0.2259 0.2543 0.1857 SNUMedinfo6 SNUMedInfo 0.2374 0.1112 0.2304 0.2486 0.1933 FCT LUCENE BM25L MSH PRF CITI 0.2233 0.1177 0.2044 0.2600 0.1800 SNUMedinfo4 SNUMedInfo 0.2228 0.1281 0.2175 0.2343 0.1743 SNUMedinfo1 SNUMedInfo 0.2210 0.1208 0.1952 0.2343 0.1619 SNUMedinfo2 SNUMedInfo 0.2197 0.0996 0.1861 0.2257 0.1486 SNUMedinfo7 SNUMedInfo 0.2172 0.1266 0.2116 0.2486 0.1771 FCT LUCENE BM25L PRF CITI 0.1992 0.0964 0.1874 0.2343 0.1781 SNUMedinfo10 SNUMedInfo 0.1827 0.1146 0.1749 0.2143 0.1581 HES-SO-VS FULLTEXT LUCENE medGIFT 0.1791 0.1107 0.1630 0.2143 0.1581 SNUMedinfo3 SNUMedInfo 0.1751 0.0606 0.1572 0.2114 0.1286 ITEC FULLTEXT AAUITEC 0.1689 0.0734 0.1731 0.2229 0.1552 ITEC FULLPLUS AAUITEC 0.1688 0.0740 0.1720 0.2171 0.1552 ITEC FULLPLUSMESH AAUITEC 0.1663 0.0747 0.1634 0.22 0.1667 ITEC MESHEXPAND AAUITEC 0.1581 0.0710 0.1635 0.2229 0.1686 IBM run 1 IBM 0.1573 0.0296 0.1596 0.1571 0.1057 IBM run 3 IBM 0.1573 0.0371 0.1390 0.1943 0.1276 IBM run 3 IBM 0.1482 0.0254 0.1469 0.2000 0.1410 IBM run 2 IBM 0.1476 0.0308 0.1363 0.2086 0.1295 IBM run 1 IBM 0.1403 0.0216 0.1380 0.1829 0.1238 IBM run 2 IBM 0.1306 0.0153 0.1340 0.2000 0.1276 nlm-se-case-based-textual ITI 0.0885 0.0303 0.0926 0.1457 0.0962 DirichletLM mu2500.0 Bo1bfree d 3 t 10 FCSE 0.0632 0.0130 0.0648 0.0857 0.0676 DirichletLM mu2500.0 Bo1bfree d 3 t 10 FCSE 0.0632 0.0130 0.0648 0.0857 0.0676 finki FCSE 0.0448 0.0115 0.0478 0.0714 0.0629 finki FCSE 0.0448 0.0115 0.0478 0.0714 0.0629 DirichletLM mu2500.0 FCSE 0.0438 0.0112 0.056 0.0829 0.0581 DirichletLM mu2500.0 FCSE 0.0438 0.0112 0.056 0.0829 0.0581 finki FCSE 0.0376 0.0105 0.0504 0.0771 0.0562 BM25b25.0 FCSE 0.0049 0.0005 0.0076 
0.0143 0.0105 BM25b25.0 Bo1bfree d 3 t 10 FCSE 0.0048 0.0005 0.0071 0.0143 0.0105 Table 8. Results of the multimodal runs for the medical case retrieval task Run Name Group MAP GM-MAP bpref P10 P30 FCT CB MM rComb CITI 0.1608 0.0779 0.1426 0.1800 0.1257 medgift mixed nofilter casebased medGIFT 0.1467 0.0883 0.1318 0.1971 0.1457 nlm-se-case-based-mixed ITI 0.0886 0.0303 0.0926 0.1457 0.0962 FCT CB MM MNZ CITI 0.0794 0.0035 0.0850 0.1371 0.0810 task. In its first year three groups joined this complex task. More compound figures were included into the modality classification, so the training and test set are more difficult and correspond to the reality of the database, now. The other two tasks, image and case based retrieval, remained in the same format as in previous years but had a larger number of retrieval topics. As in previous years, visual, textual or multimodal techniques can all perform best depending on the situation. For the modality classification, a mixed run achieved the best accuracy. For the image–based retrieval task, the highest MAP was achieved by a multimodal run. In the case–based retrieval task, textual techniques obtained the best results. Finally, for the compound figure separation task only visual and mixed techniques were explored, with visual techniques leading to best results. In 2013, many groups used ImageCLEFmed 2012 database to optimize the parameters. Many of the techniques used had already been employed in previous years. This shows the utility of past campaigns, which provide databases as well as information regarding tools used by other participants. ImageCLEF conducts participative research and experimentation among free and reusable collections and has shown an important impact in visual medical information retrieval. 5 Acknowledgements We would like to thank the EU FP7 projects Khresmoi (257528) and PROMISE (258191) for their support. References 1. Caputo, B., Müller, H., Thomee, B., Villegas, M., Paredes, R., Zellhofer, D., Goeau, H., Joly, A., Bonnet, P., Martinez Gomez, J., Garcia Varea, I., Cazorla, C.: Im- ageclef 2013: the vision, the data and the open challenges. In: Working Notes of CLEF 2013 (Cross Language Evaluation Forum). (September 2013) 2. Hersh, W., Müller, H., Kalpathy-Cramer, J., Kim, E., Zhou, X.: The consolidated ImageCLEFmed medical image retrieval task test collection. Journal of Digital Imaging 22(6) (2009) 648–655 3. Müller, H., Clough, P., Deselaers, T., Caputo, B., eds.: ImageCLEF – Experi- mental Evaluation in Visual Information Retrieval. Volume 32 of The Springer International Series On Information Retrieval. Springer, Berlin Heidelberg (2010) 4. Müller, H., Kalpathy-Cramer, J., Jr., C.E.K., Hatt, W., Bedrick, S., Hersh, W.: Overview of the ImageCLEFmed 2008 medical image retrieval task. In Peters, C., Giampiccolo, D., Ferro, N., Petras, V., Gonzalo, J., Peñas, A., Deselaers, T., Mandl, T., Jones, G., Kurimo, M., eds.: Evaluating Systems for Multilingual and Multimodal Information Access – 9th Workshop of the Cross-Language Evaluation Forum. Volume 5706 of Lecture Notes in Computer Science (LNCS)., Aarhus, Denmark (September 2009) 500–510 5. Müller, H., Kalpathy-Cramer, J., Eggel, I., Bedrick, S., Radhouani, S., Bakke, B., Kahn, J.C.E., Hersh, W.: Overview of the clef 2009 medical image retrieval track. In: Proceedings of the 10th international conference on Cross-language evaluation forum: multimedia experiments. CLEF’09, Berlin, Heidelberg, Springer– Verlag (2010) 72–84 6. 
Kalpathy-Cramer, J., Müller, H., Bedrick, S., Eggel, I., Garcı́a Seco de Herrera, A., Tsikrika, T.: The CLEF 2011 medical image retrieval and classification tasks. In: Working Notes of CLEF 2011 (Cross Language Evaluation Forum). (September 2011) 7. Müller, H., Garcı́a Seco de Herrera, A., Kalpathy-Cramer, J., Demner Fushman, D., Antani, S., Eggel, I.: Overview of the ImageCLEF 2012 medical image retrieval and classication tasks. In: Working Notes of CLEF 2012 (Cross Language Evaluation Forum). (September 2012) 8. Müller, H., Kalpathy-Cramer, J., Demner-Fushman, D., Antani, S.: Creating a classification of image types in the medical literature for visual categorization. In: SPIE medical imaging. (2012) 9. Tirilly, P., Lu, K., Mu, X., Zhao, T., Cao, Y.: On modality classification and its use in text–based image retrieval in medical databases. In: Proceedings of the 9th International Workshop on Content–Based Multimedia Indenxing. CBMi2011 (2011) 10. Chhatkuli, A., Markonis, D., Foncubierta-Rodrı́guez, A., Meriaudeau, F., Müller, H.: Separating compound figures in journal articles to allow for subfigure classifi- cation. In: SPIE, Medical Imaging. (2013) 11. Garcı́a Seco de Herrera, A., Markonis, D., Schaer, R., Eggel, I., Müller, H.: The medGIFT group in ImageCLEFmed 2013. In: Working Notes of CLEF 2013 (Cross Language Evaluation Forum). (September 2013) 12. Kitanovski, I., Dimitrovski, I., Loskovska, S.: FCSE at medical tasks of ImageCLEF 2013. In: Working Notes of CLEF 2013 (Cross Language Evaluation Forum). (September 2013) 13. Ozturkmenoglu, O., Ceylan, N.M., Alpkocak, A.: DEMIR at ImageCLEFmed 2013: The effects of modality classification to information retrieval. In: Working Notes of CLEF 2013 (Cross Language Evaluation Forum). (September 2013) 14. Simpson, M.S., You, D., Rahman, M.M., Demner-Fushman, D., Antani, S., Thoma, G.: ITI’s participation in the 2013 medical track of ImageCLEF. In: Working Notes of CLEF 2013 (Cross Language Evaluation Forum). (September 2013) 15. Zhou, X., Han, M., Song, Y., Li, Q.: Fast filtering techniques in medical image classification and retrieval. In: Working Notes of CLEF 2013 (Cross Language Evaluation Forum). (September 2013) 16. Mourão, A., Martins, F., Magalhães, J.a.: NovaSearch on medical ImageCLEF 2013. In: Working Notes of CLEF 2013 (Cross Language Evaluation Forum). (September 2013) 17. Stathopoulos, S., Lourentzou, I., Kyriakopoulou, A., Kalamboukis, T.: IPL at CLEF 2013 medical retrieval task. In: Working Notes of CLEF 2013 (Cross Lan- guage Evaluation Forum). (September 2013) 18. Simpson, M.S., You, D., Rahman, M.M., Demmer-Fushman, D., Antani, S., Thoma, G.: ITI’s participation in the ImageCLEF 2012 medical retrieval and classification tasks. In: Working Notes of CLEF 2012. (2012) 19. Foncubierta-Rodrı́guez, A., Müller, H., Depeursinge, A.: Region–based volumet- ric medical image retrieval. In: SPIE Medical Imaging: Advanced PACS–based Imaging Informatics and Therapeutic Applications. (2013) 20. Sungbin, C., Lee, J., Cho, J.: SNUMedinfo at ImageCLEF 2013: Medical re- trieval task. In: Working Notes of CLEF 2013 (Cross Language Evaluation Forum). (September 2013)