=Paper=
{{Paper
|id=Vol-1173/CLEF2007wn-ImageCLEF-DeselaersEt2007b
|storemode=property
|title=FIRE in ImageCLEF 2007
|pdfUrl=https://ceur-ws.org/Vol-1173/CLEF2007wn-ImageCLEF-DeselaersEt2007b.pdf
|volume=Vol-1173
|dblpUrl=https://dblp.org/rec/conf/clef/DeselaersGWN07
}}
==FIRE in ImageCLEF 2007==
FIRE in ImageCLEF 2007

Thomas Deselaers, Tobias Gass, Tobias Weyand, and Hermann Ney
Human Language Technology and Pattern Recognition Group
RWTH Aachen University, Aachen, Germany
deselaers@cs.rwth-aachen.de

Abstract

We present the methods we applied in the four different tasks of the ImageCLEF 2007 content-based image retrieval evaluation. We participated in all four tasks using a variety of methods. Global and local image descriptors are applied using nearest neighbour search for the medical and photo retrieval tasks and discriminative models for the object retrieval and the medical automatic annotation task. For the photo and medical retrieval tasks, we apply a maximum entropy training method to learn an optimal feature weighting from the queries and qrels from last year. This method works particularly well if the new queries are very similar to the training queries, as they were in the medical retrieval task.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Languages—Query Languages

General Terms

content-based image retrieval, image annotation

Keywords

bag-of-visual-words, maximum entropy

1 Introduction

In this work we present our efforts in the four tasks of ImageCLEF 2007. For all of the experiments, the CBIR system FIRE (http://www-i6.informatik.rwth-aachen.de/~deselaers/fire.html), developed in our group, was used.

In the following sections we present our efforts in the Medical Retrieval Task [10], the Photographic Retrieval Task [8], and the Medical Automatic Image Annotation Task [10]. Our efforts for the Object Retrieval Task are not described here but in the corresponding overview paper [1].

2 ImageCLEF 2007 Photographic Retrieval Task

The ImageCLEF 2007 Photographic Retrieval Task is described in [8] and the database used is described in [9]; here we describe the methods that we applied in the runs we submitted.

We submitted a total of nine runs to the photographic retrieval task, five using textual and visual information jointly and four using only visual information. Furthermore, we provided a visual baseline run to all participants of ImageCLEF shortly after the queries were released. For these experiments we used the following image descriptors:

• sparse patch histograms [2]
• clustered patch histograms [4]
• local and global colour descriptors from GIFT [14]
• global texture features [16]
• monomial invariant feature histograms [12]
• relational invariant feature histograms [11]
• Tamura texture histograms [15]
• image thumbnails of 32×32 pixels
• RGB colour histograms with 512 bins

Furthermore, the textual information was available to the retriever in the same manner as described in [6] and also with a pure cosine-matching similarity measure. These features were extracted for all images and then the feature weights were trained according to [7].
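To illustrate how such a trained feature weighting enters the retrieval step, the following is a minimal sketch, not the actual FIRE implementation: the descriptor names, the L1 distance, and the weight values are illustrative assumptions; in the ME runs the weights would come from the maximum entropy training of [7], in the emp runs they are set empirically.

```python
import numpy as np

def l1(a, b):
    """L1 distance between two histograms or feature vectors (illustrative choice)."""
    return float(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).sum())

def retrieval_score(query, image, weights, distances):
    """Combine per-descriptor distances into one retrieval score.

    query, image: dicts mapping descriptor name -> feature vector / histogram
    weights:      dict mapping descriptor name -> weight (trained or empirical)
    distances:    dict mapping descriptor name -> distance function
    """
    d = sum(w * distances[name](query[name], image[name])
            for name, w in weights.items())
    return np.exp(-d)  # map the weighted distance to a similarity in (0, 1]

def rank_database(query, database, weights, distances):
    """Nearest neighbour search: database image ids sorted by decreasing score."""
    scores = {img_id: retrieval_score(query, feats, weights, distances)
              for img_id, feats in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

# illustrative weights for three of the descriptors listed above
weights   = {"rgb_hist": 0.5, "tamura": 0.3, "thumbnail32": 0.2}
distances = {"rgb_hist": l1, "tamura": l1, "thumbnail32": l1}
```

Ranking the database by decreasing score corresponds to the nearest neighbour search mentioned in the abstract; setting a weight to zero simply switches the corresponding descriptor off.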
Table 1: Overview of the submissions to the photographic retrieval task.

run id                 w/ text. inf.  trained on  MAP     comment
FIRE                   no             no          0.1172  baseline run
RWTH-FIRE-NT-emp       no             -           0.0834
RWTH-FIRE-NT-emp2      no             -           0.0824
RWTH-FIRE-ME-NT-20000  no             2006        0.1122
RWTH-FIRE-ME-NT-1000   no             2006        0.1102
RWTH-FIRE-emp          yes            -           0.1969
RWTH-FIRE-emp2         yes            -           0.1913
RWTH-FIRE-ME-500       yes            2006        0.1974
RWTH-FIRE-ME-1000      yes            2006        0.1904
RWTH-FIRE-ME-30000     yes            2006        0.1938

As can be seen in Table 1, textual information greatly helps to achieve a much more precise retrieval result, which was to be expected. In the visual-only runs, maximum entropy training also clearly helps to improve the precision. Nevertheless, none of the tuned visual-only runs achieves the precision of our baseline run, which is probably due to overfitting.

3 ImageCLEF 2007 Medical Retrieval Task

The ImageCLEF 2007 Medical Retrieval Task is described in [10]; here we describe the methods we applied.

We submitted a total of ten runs to the medical retrieval task, five using textual and visual information jointly and five using only visual information. In each group, three of the five runs use feature weights that were trained using the maximum entropy method [7] and the other two runs use an empirically determined set of parameters. The trained runs use the topics of 2005, 2006, and 2005 & 2006 jointly, respectively, to determine the optimal feature weighting. Table 2 gives an overview of our submissions to the ImageCLEF 2007 medical retrieval task.

Table 2: Overview of the submissions to the medical retrieval task.

run id             w/ text. inf.  trained on   MAP
FIRE-NT-emp        no             -            0.0284
FIRE-NT-emp2       no             -            0.0280
FIRE-ME-nt-tr05    no             2005         0.1473
FIRE-ME-nt-tr06    no             2006         0.2227
FIRE-ME-nt-tr0506  no             2005 & 2006  0.2328
FIRE-emp           yes            -            0.2457
FIRE-emp2          yes            -            0.2537
FIRE-ME-tr05       yes            2005         0.2922
FIRE-ME-tr06       yes            2006         0.3022
FIRE-ME-tr0506     yes            2005 & 2006  0.3044

For all of these experiments the following image descriptors were used [3]:

• image thumbnails of 32×32 pixels
• image thumbnails of 16×16 pixels reduced to 16 colours (which is very similar to the MPEG-7 colour layout descriptor [13])
• colour histograms in RGB space with 512 bins
• global texture features [16]
• monomial invariant feature histograms [12]
• relational invariant feature histograms [11]
• Tamura texture histograms [15]

The textual information was included in the experiments as described in [6]; we used one textual information retrieval system using only the English texts. These features were extracted for all images and then the feature weights were trained according to [7].

Again, it can be seen that the incorporation of textual information increases the retrieval precision dramatically. Maximum entropy training with the 2006 queries is generally better than with the 2005 queries, which is probably due to the greater similarity with the queries of this year. Combining both yields an even higher precision.

3.1 Combined runs with the medGIFT and the OHSU groups

Furthermore, we combined our results with those from the medGIFT group from Geneva and with the OHSU group from Portland, OR. The combinations were done on a submission-file basis: the two groups sent us submission files which they considered to be good runs, and a new score for an image was then created as a weighted sum of the scores for that particular image from all runs to be combined.
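The following is a minimal sketch of this weighted-sum fusion, assuming each submission file has already been parsed into a mapping from (topic, image) pairs to scores; the parsing itself and any score normalisation are omitted, and the function names are illustrative.

```python
from collections import defaultdict

def combine_runs(runs, weights):
    """Weighted-sum fusion of retrieval runs on the submission-file level.

    runs:    list of dicts, each mapping (topic_id, image_id) -> score
    weights: one weight per run, e.g. (3, 7) for the 3fire-7ohsu combination
    An image that is missing from a run implicitly contributes a score of
    0.0 for that run, which is exactly the effect discussed in the text below.
    """
    combined = defaultdict(float)
    for run, weight in zip(runs, weights):
        for key, score in run.items():
            combined[key] += weight * score
    return combined

def ranked_list_for_topic(combined, topic_id, cutoff=1000):
    """Turn the combined scores of one topic into a new ranked result list."""
    hits = [(image_id, s) for (topic, image_id), s in combined.items()
            if topic == topic_id]
    return sorted(hits, key=lambda item: item[1], reverse=True)[:cutoff]
```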
Unfortunately, none of these combined runs outperforms any of the individual runs, which might be due to the combination on the submission-file level: if an image is not included in a submission, it has a score of 0.0 for that particular run, which might have a negative influence on the combination. An overview of the results for the combined runs with the used weighting is given in Table 3.

Table 3: Results from the combined runs.

                          weight for
run id                    FIRE  OHSU  medGIFT  easyIR  MAP
3fire-7ohsu.clef          3     7     0        0       0.0344
3gift-3fire-4ohsu.clef    3     4     3        0       0.0334
5fire-5ohsu.clef          5     3     0        0       0.0327
7fire-3ohsu.clef          7     3     0        0       0.0325
4gift-4fire-2ohsu.clef    4     2     4        0       0.0322
5fire-5easyir.clef        5     0     0        5       0.0256
7fire-3easyir.clef        7     0     0        3       0.0251
3fire-7easyir.clef        3     0     0        7       0.0244
gift-fire-ohsu-easy.clef  1     1     1        1       0.0220
1gift-1fire-8ohsu.clef    1     8     1        0       0.0201

4 ImageCLEF 2007 Medical Image Annotation Task

For the medical image annotation task, we applied the same method as last year, which is based on the widely adopted assumption that objects in images can be represented as a set of loosely coupled parts. In contrast to former models [4, 5], this method can cope with an arbitrary number of object parts. Here, the object parts are modelled by image patches that are extracted at each position and then efficiently stored in a histogram. In addition to the patch appearance, the positions of the extracted patches are considered and provide a significant increase in the recognition performance. Using this method, we create sparse histograms of 65536 (= 2^16) or 4096 (= 8^4) bins, which can be classified either using the nearest neighbour rule with a suitable histogram comparison measure or using a discriminatively trained model. Here, we used a support vector machine with a histogram intersection kernel and a discriminatively trained log-linear maximum entropy model. A detailed description of the method is given in [2].

We submitted six runs to the medical automatic annotation task [10]. Four of the runs use the method described above with slightly different parameters. The run RWTHi6-4RUN-MV is a combination of these runs, where the wildcard character for a position (and all succeeding positions on the same axis) is set if not at least three of the base runs agree on the position. The run RWTHi6-SH4096-SC025-AXISWISE uses the same method as the other runs, but the code is predicted axis-wise. An overview of our runs together with their ranking in the official results is given in Table 4.

Table 4: Results from the medical automatic annotation task.

rank  run tag                       score  error rate [%]
6     RWTHi6-4RUN-MV3               30.93  13.2
8     RWTHi6-SH65536-SC025-ME       32.98  11.9
10    RWTHi6-SH65536-SC05-ME        33.21  12.3
11    RWTHi6-SH4096-SC025-ME        34.56  12.7
12    RWTHi6-SH4096-SC05-ME         34.70  12.4
13    RWTHi6-SH4096-SC025-AXISWISE  44.56  17.8

From the results it can be seen that the last run, which tries to use the hierarchy in the first step, cannot compete with the methods that use all data for classification at once. However, a slight accuracy improvement is possible if different well-performing runs are combined in a suitable way.

5 Conclusion

From the results of the medical image retrieval task it can be seen that the maximum entropy method for finding feature weights in image retrieval works extremely well if sufficient training data is available and the queries to be processed are similar to those which occur in the training data. On the other hand, for the photographic retrieval task, the visual baseline run outperforms all tuned settings, which is an indicator that the trained runs overfit to the training data. This can be due to the training data not being similar enough to this year's topics.

The results of the medical annotation task show that using the class hierarchy can lead to a slight accuracy improvement in a second stage, but using it in the first stage could not lead to an improved classification performance.
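As an illustration of the voting scheme behind the RWTHi6-4RUN-MV combination described in Section 4, here is a minimal sketch. It assumes hyphen-separated IRMA code axes of equal length across the base runs; the function name and the example codes are purely illustrative and not taken from the paper.

```python
from collections import Counter

def combine_code_predictions(codes, min_agree=3, wildcard="*"):
    """Majority-vote combination of IRMA-code predictions from several base runs
    (a sketch of the wildcard scheme described for RWTHi6-4RUN-MV in Section 4).

    codes: one predicted code per base run, e.g. "1121-127-700-500", with
           hyphen-separated axes of equal length across runs (assumed here).
    For every position on every axis the majority character is kept only if at
    least `min_agree` runs agree; otherwise that position and all succeeding
    positions on the same axis are set to the wildcard character.
    """
    axes_per_run = [code.split("-") for code in codes]
    combined_axes = []
    for axis_across_runs in zip(*axes_per_run):      # same axis from every run
        out, agreed = [], True
        for chars in zip(*axis_across_runs):         # same position from every run
            char, votes = Counter(chars).most_common(1)[0]
            if agreed and votes >= min_agree:
                out.append(char)
            else:
                agreed = False
                out.append(wildcard)
        combined_axes.append("".join(out))
    return "-".join(combined_axes)

# four hypothetical base-run predictions; at least three runs agree at every position
print(combine_code_predictions(["1121-127-700-500",
                                "1121-127-700-400",
                                "1121-127-710-400",
                                "1121-127-700-400"]))  # prints 1121-127-700-400
```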
Acknowledgement

This work was partially funded by the DFG (Deutsche Forschungsgemeinschaft) under contract NE-572/6.

References

[1] Thomas Deselaers, Allan Hanbury, et al. Overview of the ImageCLEF 2007 object retrieval task. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[2] Thomas Deselaers, Andre Hegerath, Daniel Keysers, and Hermann Ney. Sparse patch-histograms for object classification in cluttered images. In DAGM 2006, Pattern Recognition, 28th DAGM Symposium, volume 4174 of Lecture Notes in Computer Science, pages 202–211, Berlin, Germany, September 2006.

[3] Thomas Deselaers, Daniel Keysers, and Hermann Ney. Features for image retrieval – a quantitative comparison. In DAGM 2004, Pattern Recognition, 26th DAGM Symposium, number 3175 in Lecture Notes in Computer Science, pages 228–236, Tübingen, Germany, September 2004.

[4] Thomas Deselaers, Daniel Keysers, and Hermann Ney. Discriminative training for object recognition using image patches. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 157–162, San Diego, CA, June 2005.

[5] Thomas Deselaers, Daniel Keysers, and Hermann Ney. Improving a discriminative approach to object recognition using image patches. In DAGM 2005, Pattern Recognition, 27th DAGM Symposium, number 3663 in Lecture Notes in Computer Science, pages 326–333, Vienna, Austria, August 2005.

[6] Thomas Deselaers, Tobias Weyand, Daniel Keysers, Wolfgang Macherey, and Hermann Ney. FIRE in ImageCLEF 2005: Combining content-based image retrieval with textual information retrieval. In Workshop of the Cross-Language Evaluation Forum (CLEF 2005), volume 4022 of Lecture Notes in Computer Science, pages 652–661, Vienna, Austria, September 2005.

[7] Thomas Deselaers, Tobias Weyand, and Hermann Ney. Image retrieval and annotation using maximum entropy. In Evaluation of Multilingual and Multi-modal Information Retrieval – Seventh Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Lecture Notes in Computer Science, to appear, Alicante, Spain, September 2007.

[8] Michael Grubinger, Paul Clough, Allan Hanbury, and Henning Müller. Overview of the ImageCLEF 2007 photographic retrieval task. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[9] Michael Grubinger, Paul Clough, Henning Müller, and Thomas Deselaers. The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In LREC 06 OntoImage 2006: Language Resources for Content-Based Image Retrieval, in press, Genoa, Italy, May 2006.

[10] Henning Müller, Thomas Deselaers, Eugene Kim, Jayashree Kalpathy-Cramer, Thomas M. Deserno, Paul Clough, and William Hersh. Overview of the ImageCLEFmed 2007 medical retrieval and annotation tasks. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[11] Marc Schael. Texture defect detection using invariant textural features. In DAGM 2001, Pattern Recognition, 23rd DAGM Symposium, volume 2191 of Lecture Notes in Computer Science, pages 17–24, Munich, Germany, September 2001. Springer Verlag.

[12] Sven Siggelkow, Marc Schael, and Hans Burkhardt. SIMBA — Search IMages By Appearance. In DAGM 2001, Pattern Recognition, 23rd DAGM Symposium, volume 2191 of Lecture Notes in Computer Science, pages 9–17, Munich, Germany, September 2001. Springer Verlag.

[13] Thomas Sikora. The MPEG-7 visual standard for content description – an overview. IEEE Transactions on Circuits and Systems for Video Technology, 11(6):696–702, 2001.

[14] David McG. Squire, Wolfgang Müller, Henning Müller, and Jilali Raki. Content-based query of image databases, inspirations from text retrieval: Inverted files, frequency-based weights and relevance feedback. In Scandinavian Conference on Image Analysis, pages 143–149, Kangerlussuaq, Greenland, June 1999.

[15] Hideyuki Tamura, Shunji Mori, and Takashi Yamawaki. Textural features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics, 8(6):460–472, June 1978.

[16] Boris Terhorst. Texturanalyse zur globalen Bildinhaltsbeschreibung radiologischer Aufnahmen. Research project, RWTH Aachen, Institut für Medizinische Informatik, Aachen, Germany, June 2003.