=Paper=
{{Paper
|id=Vol-1177/CLEF2011wn-ImageCLEF-AlpkocakEt2011
|storemode=property
|title=DEMIR at ImageCLEFMed 2011: Evaluation of Fusion Techniques for Multimodal Content-based Medical Image Retrieval
|pdfUrl=https://ceur-ws.org/Vol-1177/CLEF2011wn-ImageCLEF-AlpkocakEt2011.pdf
|volume=Vol-1177
}}
==DEMIR at ImageCLEFMed 2011: Evaluation of Fusion Techniques for Multimodal Content-based Medical Image Retrieval==
Adil Alpkocak, Okan Ozturkmenoglu, Tolga Berber,
Ali Hosseinzadeh Vahid and Roghaiyeh Gachpaz Hamed
Dokuz Eylul University
Department of Computer Engineering
DEMIR Dokuz Eylul Multimedia Information Retrieval Research Group
Tinaztepe, 35160 Izmir, Turkey
alpkocak@cs.deu.edu.tr, okan.ozturkmenoglu@deu.edu.tr, tberber@cs.deu.edu.tr,
{ali_h_vahid, ramisa_84}@yahoo.com
Abstract. This paper presents the details of the participation of the DEMIR (Dokuz
Eylul University Multimedia Information Retrieval) research group in the
ImageCLEF 2011 Medical Retrieval task. This year, we evaluated a fusion and
re-ranking method that combines the best low-level image feature with the best
text retrieval result. We improved results by examining different weighting
models for the retrieved text data and the low-level features. We tested
multimodal image retrieval in the ImageCLEF 2011 medical retrieval task and
obtained the best seven ranks in mixed retrieval, which includes textual and
visual modalities. The results clearly show that proper fusion of different
modalities improves the overall retrieval performance.
Keywords: Information Retrieval, Weighting-schemes, Re-ranking, Medical
Imaging, Content-based Image Retrieval, Medical Image Retrieval.
1 Introduction
In this paper we present the experiments performed by the Dokuz Eylul University
Multimedia Information Retrieval (DEMIR) Group, Turkey, in the context of our
participation in the ImageCLEF 2011 Medical Image Retrieval task [1]. The main
focus of this work is to improve results by evaluating different weighting models in
text retrieval and then choosing the best low-level image feature to fuse with the
text-only results. When combining text and low-level features, we varied the
combination methods to obtain the best result. In addition, we performed
experiments on narrowing down the data collection by identifying and filtering out
irrelevant documents. We also evaluated the performance of weighted querying by
assigning weights to special words in the queries.
Fig. 1. Basic block diagram of the retrieval system.
After analyzing the visual and textual feature sets we used (Section 2), we describe
the fusion techniques for multimodal information (Section 3). We then present our
experiments (Section 4), and Section 5 concludes the paper by pointing out open
issues and possible avenues of further research in multimodal re-ranking and
fusion techniques for content-based image retrieval.
2 The Feature Set
The data collection of the ImageCLEF 2011 Medical Retrieval task contains textual
and visual information. Participants were given a set of 30 textual queries with 2-3
sample images per query. The queries were classified into textual, visual and
mixed, based on the methods expected to yield the best results [1].
We performed our experiments using the text and image data of the ImageCLEF
2010 Medical and Wikipedia Retrieval tracks, varying the retrieval methods on
textual and visual information to find the best result.
2.1 Textual Features
Since the choice of the weighting model may crucially affect the performance of any
information retrieval system, we first evaluated the relative merits and drawbacks
of different weighting models using the Terrier IR Platform [2], an open-source
search engine written in Java and developed at the School of Computing Science,
University of Glasgow.
We performed our experiments on textual features using the ImageCLEF 2010
Medical track collection. We started from a traditional bag-of-words representation
of pre-processed texts, where pre-processing includes stemming (Porter stemmer
[3] for English) and stop-word removal. The DFR_BM25 model's MAP score is not
the best one, but the numbers of relevant retrieved documents are close to each
other across all weighting models; considering the achievements of this model [11],
we submitted our textual baseline run using it on the ImageCLEF 2011 Medical
Retrieval task collection as RUN_1.
Fig. 2. MAP scores of weighting models for textual features.
Fig. 3. Number of relevant retrieved documents for different weighting models on textual
features.
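As a hedged illustration of the pre-processing described above (our runs used Terrier's built-in term pipeline; this NLTK-based sketch only mirrors the two steps, and the function name is ours):

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    nltk.download("stopwords", quiet=True)  # fetch the English stop-word list

    def preprocess(text: str) -> list:
        """Lowercase, drop English stop words, and Porter-stem the rest."""
        stemmer = PorterStemmer()
        stops = set(stopwords.words("english"))
        tokens = [t for t in text.lower().split() if t.isalnum()]  # crude tokenizer
        return [stemmer.stem(t) for t in tokens if t not in stops]

The resulting stemmed bag of words is what the weighting models compared in Fig. 2 and Fig. 3 score against the query terms.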
2.2 Visual Features
Selection of low-level features is one of the major aspects of a typical content-based
image retrieval (CBIR) system. We call these features low-level because most of
them are extracted directly from digital representations of the objects in the
database and have little or nothing to do with human perception. Using the
Img(Rummager) application [4], developed in the Automatic Control Systems &
Robotics Laboratory at the Democritus University of Thrace, Greece, we extracted
the features explained below for all images in the ImageCLEF 2011 test collection
and the query examples:
EHD: The Edge Histogram Descriptor, proposed for MPEG-7, expresses only the
local edge distribution in the image and is designed to contain only 80 bins for this
purpose. The EHD basically represents the distribution of 5 types of edges in each
local area, called a sub-image, defined by dividing the image space into 4x4
non-overlapping blocks. Thus, the image partition always yields 16 equal-sized
sub-images regardless of the size of the original image. Edges in the sub-images
are categorized into 5 types: vertical, horizontal, 45-degree diagonal, 135-degree
diagonal and non-directional edges. The histogram for each sub-image therefore
represents the relative frequency of occurrence of the 5 edge types in the
corresponding sub-image and contains 5 bins [7] (a simplified sketch follows this
list).
CEDD: This feature, named "Color and Edge Directivity Descriptor", combines
EHD with color histogram information. CEDD's size is limited to 54 bytes per
image, rendering this descriptor suitable for use in large image databases. An
important attribute of the CEDD is the low computational power needed for its
extraction, in comparison to the needs of most MPEG-7 descriptors [4].
FCTH: This feature, named "Fuzzy Color and Texture Histogram", is a fuzzy
version of CEDD containing fuzzy sets of color and texture histograms. It combines
the results of 3 fuzzy systems covering histogram, color and texture information.
FCTH's size is limited to 72 bytes per image, which also makes it suitable for use
in large image databases [5].
BTDH: This feature is very similar to FCTH. The main difference is that it uses
brightness instead of a color histogram. It was originally developed for radiology
images, which do not contain color data [6].
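As a simplified sketch of the EHD idea described above (an illustration only, not the MPEG-7 reference implementation; the threshold value is an assumption):

    import numpy as np

    # 2x2 filter coefficients for the 5 MPEG-7 edge types.
    FILTERS = [
        np.array([[1., -1.], [1., -1.]]),                 # vertical
        np.array([[1., 1.], [-1., -1.]]),                 # horizontal
        np.array([[np.sqrt(2), 0.], [0., -np.sqrt(2)]]),  # 45-degree diagonal
        np.array([[0., np.sqrt(2)], [-np.sqrt(2), 0.]]),  # 135-degree diagonal
        np.array([[2., -2.], [-2., 2.]]),                 # non-directional
    ]

    def edge_histogram(image: np.ndarray, threshold: float = 11.0) -> np.ndarray:
        """80-bin edge histogram of a grayscale image (4x4 sub-images x 5 edge types)."""
        h, w = image.shape
        bins = np.zeros((4, 4, 5))
        sub_h, sub_w = h // 4, w // 4
        for i in range(4):
            for j in range(4):
                sub = image[i*sub_h:(i+1)*sub_h, j*sub_w:(j+1)*sub_w]
                # Classify each non-overlapping 2x2 block by its strongest edge filter.
                for y in range(0, sub_h - 1, 2):
                    for x in range(0, sub_w - 1, 2):
                        block = sub[y:y+2, x:x+2]
                        strengths = [abs((block * f).sum()) for f in FILTERS]
                        k = int(np.argmax(strengths))
                        if strengths[k] > threshold:  # skip near-flat blocks
                            bins[i, j, k] += 1
        totals = bins.sum(axis=2, keepdims=True)
        totals[totals == 0] = 1.0                     # avoid division by zero
        return (bins / totals).reshape(80)            # relative frequencies per sub-image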
After extracting the features, we obtain an n-dimensional feature space per feature.
For query processing, we map all of the objects in the database and the query onto
this space and then evaluate the difference between the vector corresponding to the
query and the vectors representing the data. As the distance/similarity function we
selected the Euclidean distance, one of the most commonly used distance functions.
Based on the obtained similarity scores, we found that CEDD and FCTH are the
best descriptors for image retrieval based on low-level features only (see Figure 4).
We therefore submitted our visual-only baseline run using the CEDD feature, and
we use these features for multimodal fusion in the following experiments (a
ranking sketch follows below).
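A minimal sketch of this ranking step, assuming the CEDD (or FCTH) vectors have already been extracted, e.g. with Img(Rummager); the distance-to-similarity mapping is our illustrative choice, not prescribed by the descriptor:

    import numpy as np

    def rank_by_euclidean(q: np.ndarray, db: np.ndarray, top_k: int = 1000):
        """Rank database descriptors (N x d) against a query descriptor (d,)."""
        dists = np.linalg.norm(db - q, axis=1)  # Euclidean distance per image
        sims = 1.0 / (1.0 + dists)              # smaller distance -> higher similarity
        order = np.argsort(-sims)[:top_k]
        return order, sims[order]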
Fig. 4. Comparison of low-level feature performance on the ImageCLEF 2010 Wikipedia
Retrieval task.
3 Fusion Techniques in Multimodal Information Retrieval
Multimedia fusion, the integration of multiple media, their associated features, or
intermediate decisions in order to perform an analysis task, has gained much
attention from researchers in recent times. The fusion of multiple modalities can
provide complementary information and increase the accuracy of the overall
decision-making process [8].
The fusion of different modalities is generally performed at two levels: the feature
level (early fusion) and the decision level (late fusion). Some researchers have also
followed a hybrid approach by performing fusion at the feature as well as the
decision level. In the feature-level or early fusion approach, the features
(distinguishable properties of a media stream) extracted from the input data are
first combined and then sent as input to a single analysis unit that performs the
analysis task. In the decision-level or late fusion approach, the analysis units first
provide local decisions D1 to Dn based on the individual features F1 to Fn; a
decision fusion unit then combines the local decisions into a fused decision vector
that is analyzed further to obtain a final decision D about the task or hypothesis.
To achieve the advantages of both the feature-level and the decision-level fusion
strategies, several researchers have opted for a hybrid fusion strategy, which
combines both.
The decision-level fusion strategy has many advantages over feature fusion. For
instance, the decisions (at the semantic level) usually have the same
representation, so fusing them is easier. Moreover, decision-level fusion offers
scalability (i.e. graceful upgrading or degradation) in terms of the modalities used
in the fusion process, which is difficult to achieve with feature-level fusion. Another
advantage of the late fusion strategy is that it allows us to use the most suitable
method for analyzing each single modality, which provides much more flexibility
than early fusion.
Because of these benefits, we applied Linear Weighted Fusion, one of the simplest
and most widely used methods, to our extracted CEDD and FCTH similarity scores
and to the similarity scores obtained from text retrieval as explained in the
previous sections. We applied Fagin's combination algorithms [9] for ranked input
sets with two score aggregation functions, "Average" and "Weighted Average". The
average function takes the mean of the individual similarity scores of an object.
The weighted average function is applied in the same manner, but multiplies each
individual similarity by a weight value. The weights assigned to the individual
scores express the importance level of each feature in the whole query [10]. After
comparing several settings, we decided to multiply the textual feature by 3 and the
CEDD feature by 2, which gave the best fusion result with the weighted average
combination method (a sketch follows below).
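A hedged sketch of this weighted-average late fusion (assuming both score lists are already min-max normalized as described next; the function name and dictionary interface are ours):

    def weighted_average_fusion(text_scores: dict, visual_scores: dict,
                                w_text: float = 3.0, w_visual: float = 2.0) -> dict:
        """Fuse two {doc_id: score} lists as (w_t*S_text + w_v*S_vis) / (w_t + w_v)."""
        ids = set(text_scores) | set(visual_scores)
        total = w_text + w_visual
        return {i: (w_text * text_scores.get(i, 0.0)
                    + w_visual * visual_scores.get(i, 0.0)) / total
                for i in ids}

With w_text = 3 and w_visual = 2 this reproduces the (3, 2, 5) scoring used in our weighted runs below; equal weights reduce it to the plain average.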
Before the fusion operation takes place, normalization should be applied to obtain
accurate and correct results, since different modalities produce different ranges of
similarity values [12]. Here, we applied min-max normalization to the similarity
values to ensure that their range is between 0 and 1.
Suppose the range of a feature's scores is from $s_{min}$ to $s_{max}$. Then the
normalized score $\hat{s}$ of a raw score $s$ is defined as follows:

$\hat{s} = (s - s_{min}) / (s_{max} - s_{min})$   (1)
Min-max normalization is the process of taking data measured in its own units and
transforming it to a value between 0.0 and 1.0. The lowest (min) value maps to 0.0
and the highest (max) value to 1.0. This provides an easy way to compare values
measured on different scales (e.g., textual, shape, visual, density) or with different
units of measure (e.g., Euclidean or non-metric space values). After normalizing
the similarity values, we combined the different modalities in the ranked results.
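A minimal sketch of Eq. (1), applied per result list before fusion (the dictionary interface is ours):

    def min_max_normalize(scores: dict) -> dict:
        """Rescale a {doc_id: score} mapping linearly onto [0, 1]."""
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:                   # degenerate list: all scores equal
            return {i: 0.0 for i in scores}
        return {i: (s - lo) / (hi - lo) for i, s in scores.items()}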
4 Experiments
We submitted 10 runs to the ImageCLEF Medical Retrieval task in three different
categories. The first category includes the baseline runs in a single modality: runs
1 and 3 are baseline retrievals for textual-only and visual-only retrieval,
respectively. The second group evaluates how re-ranking affects the baseline: run 8
re-indexes the textual baseline retrieval result and re-ranks it. The last group
includes mixed retrieval experiments with fusion of different modalities: runs 2, 4,
5, 6, 7, 9 and 10. As illustrated in Table 1, the results of the mixed runs are clearly
better than the textual-only or visual-only ones. Moreover, the results of the
weighted average combination method are better than those of the plain average
method in all approaches. Below, we provide a short description of each run.
RUN_1: This run is our baseline retrieval result for the textual modality. In this
run, we removed stop words, applied the Porter stemming algorithm and used the
DFR_BM25 weighting model in the Terrier text retrieval engine. Letting the
subscript indicate the run ID and $S_{text}$ denote the DFR_BM25 similarity
score, the similarity of the first run, $S_1$, is defined as follows:

$S_1 = S_{text}$   (2)
RUN_3: This run is our baseline retrieval result for the visual modality. We used
the CEDD feature in the visual modality because its performance is better than
that of the other features, as can be seen in Figure 4. With $S_{CEDD}$ denoting
the normalized CEDD similarity score:

$S_3 = S_{CEDD}$   (3)
RUN_8: This run on the textual feature is based on our proposed two-level
re-ranking approach for moving relevant documents upward. Re-ranking is a
method of reordering the initially retrieved documents with the aim of increasing
precision; basically, relevant documents with low similarity scores are re-weighted
and reordered. In this run, we propose a new re-ranking approach which includes a
narrowing-down phase of the search space. The result set of each query and the
corresponding base similarity scores are the inputs of the re-ranking operation.
First, we selected relevant documents using the initial similarity scores; in other
words, we filtered out non-relevant documents based on those scores, keeping the
first 1000 documents where that many existed. Then we constructed a new vector
space model (VSM) using this small document set. This operation drastically
reduces both the number of documents and the number of terms; in short, this
level shrinks the initial VSM data to a more manageable size. We then calculated
the similarity scores in the new VSM and submitted the results as RUN_8 (a
sketch follows below). As illustrated in Table 1, unlike the achievements of this
approach in the ImageCLEF 2010 Wikipedia retrieval task, all retrieval measures
decline with respect to our textual baseline run. With $S_{VSM'}$ denoting the
similarity computed in the narrowed-down VSM:

$S_8 = S_{VSM'}$   (4)
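A hedged sketch of this narrowing-down re-ranking (our runs were built on Terrier; scikit-learn stands in here for illustration, and the function name is ours):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def narrow_down_rerank(query: str, docs: dict) -> list:
        """Rebuild a small VSM over the top-1000 initial results and re-score it."""
        ids = list(docs)                                     # docs: {doc_id: raw text}
        vectorizer = TfidfVectorizer(stop_words="english")   # new, smaller term space
        matrix = vectorizer.fit_transform([docs[i] for i in ids])
        q_vec = vectorizer.transform([query])
        scores = cosine_similarity(q_vec, matrix).ravel()
        return sorted(zip(ids, scores), key=lambda p: -p[1])  # re-ranked result list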
RUN_2: Another narrowing-down approach that we examined this year is based on
medical image modality classification. The result set of each query, the
corresponding base similarity scores, and the class of each image according to a
classification algorithm are the inputs of this approach. We also expanded the
query structure by assigning a type to the example images of each query; a query
can have more than one type. In the narrowing-down phase we filtered out
non-relevant images whose class was not the same as a corresponding query type.
We applied this method using the modality classification of the GIFT system with
a 1-NN approach and submitted the results as RUN_2 (a sketch follows below). As
can be seen from Table 1, although MAP decreases with this method, there is a
considerable improvement in the P@10 and P@20 values with respect to the
textual baseline.

$S_2 = S_1$ if $class(d) \in types(q)$, and $S_2 = 0$ otherwise   (5)
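A hedged sketch of this modality filter (our runs obtained the classes from the GIFT system; the 1-NN stand-in below and all names are ours):

    import numpy as np

    def classify_1nn(x: np.ndarray, train_x: np.ndarray, train_y: list) -> str:
        """Label a descriptor with the class of its nearest labelled neighbour."""
        return train_y[int(np.argmin(np.linalg.norm(train_x - x, axis=1)))]

    def modality_filter(results: list, features: dict,
                        train_x: np.ndarray, train_y: list, query_types: set) -> list:
        """Keep (doc_id, score) pairs whose predicted modality matches the query (Eq. 5)."""
        return [(doc, score) for doc, score in results
                if classify_1nn(features[doc], train_x, train_y) in query_types]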
RUN_5: In this run, we combined the textual score, multiplied by 3, with the visual
(CEDD) retrieval score, multiplied by 2, and divided the total by 5:

$S_5 = (3S_1 + 2S_3) / 5$   (6)
RUN_4: We combined the baseline textual retrieval result with the visual (CEDD)
retrieval result and took the average score:

$S_4 = (S_1 + S_3) / 2$   (7)
RUN_7: Another approach that we experimented with in text retrieval this year is
the evaluation of the effect of weighting special words in queries. For this purpose
we selected the medical modality names in the queries (e.g., CT, PET, X-RAY,
MRI) and weighted them by 2.5 using the query language of Terrier (see the
example below). Although the results of this approach also decline compared to the
baseline, they are better than the results of the re-ranking method. Due to the
limit on the number of runs per participant, we did not submit the weighted text
retrieval results as a separate run, but fused them with the low-level image
features to obtain better performance. In this run, we combined the weighted
textual score $S_{wtext}$, multiplied by 3, with the visual (CEDD) score,
multiplied by 2, and divided the total by 5:

$S_7 = (3S_{wtext} + 2S_3) / 5$   (8)
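As a hedged illustration, in Terrier's query language a term's contribution can be boosted with the ^ operator; the topic wording below is invented for the example, not an actual 2011 topic:

    MRI^2.5 images of brain with tumor

Here the modality name MRI contributes 2.5 times its normal weight to the ranking score.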
RUN_10: Next, we combined the RUN_8 score, multiplied by 3, with the visual
(CEDD) retrieval score, multiplied by 2, and divided the total by 5:

$S_{10} = (3S_8 + 2S_3) / 5$   (9)
RUN_6: Next, we combined the weighted textual retrieval result with the visual
(CEDD) retrieval result and took the average score:

$S_6 = (S_{wtext} + S_3) / 2$   (10)
RUN_9: Finally, we combined the RUN_8 result with the visual (CEDD) retrieval
result and took the average score:

$S_9 = (S_8 + S_3) / 2$   (11)
Table 1. Runs of DEMIR group in ImageCLEFMed 2011.
RunID Rank Type MAP P@10 P@20 Rprec bpref rel_ret
5 1 Mixed 0.2372 0.3933 0.3550 0.2881 0.2738 1597
4 2 Mixed 0.2307 0.3967 0.3400 0.2706 0.2606 1595
7 3 Mixed 0.2014 0.3400 0.3233 0.2587 0.2481 1455
10 4 Mixed 0.1983 0.4067 0.3350 0.2397 0.2428 1349
6 5 Mixed 0.1972 0.3367 0.3083 0.2489 0.2383 1443
9 6 Mixed 0.1853 0.3667 0.3283 0.2309 0.2230 1338
2 7 Mixed 0.1645 0.3967 0.3350 0.2340 0.2198 890
1 15 Text 0.1942 0.3400 0.2933 0.2242 0.2215 1444
8 49 Text 0.1452 0.3033 0.2633 0.1683 0.1859 1288
3 12 Visual 0.0174 0.1067 0.0833 0.0434 0.0602 569
5 Conclusion
This year, we examined the effects of different weighting models on text retrieval
and found that proper selection of the weighting model improves the performance
of text retrieval systems. We also compared the MAP of the normalized similarity
scores of the different extracted low-level features and, based on this comparison,
selected the CEDD and FCTH descriptors as suitable features to fuse with the
textual results. Furthermore, in line with the comparison of combination methods
in our previous studies, we found that choosing a suitable combination method for
the fusion improved the results. The results clearly show that combining
text-based and content-based image retrieval results with a proper fusion
technique improves the performance.
References
1. Medical Image Retrieval Task 2011, http://www.imageclef.org/2011/medical
2. The Terrier IR Platform, http://terrier.org/docs/v2.2.1/
3. Porter, M.F.: An algorithm for suffix stripping, Program: electronic library and
information systems, vol. 14, iss. 3, pp. 130--137 (1980)
4. Chatzichristofis, S.A., Boutalis, Y.S., Lux, M.: Img(Rummager): An Interactive
Content Based Image Retrieval System. In: 2nd International Workshop on
Similarity Search and Applications, pp. 151--153. IEEE Computer Society,
Washington (2009)
5. Chatzichristofis, S.A., Boutalis, Y.S.: FCTH: Fuzzy Color and Texture Histogram -
A Low Level Feature for Accurate Image Retrieval. In: 9th International Workshop
on Image Analysis for Multimedia Interactive Services, pp. 191--196, Klagenfurt,
Austria (2008)
6. Chatzichristofis, S.A., Boutalis, Y.S.: Content based radiology image retrieval
using a fuzzy rule based scalable composite descriptor. Multimedia Tools and
Applications, vol. 46, iss. 2, pp. 493--519 (2009)
7. Won, C.S., Park, D.K., Park, S.J.: Efficient Use of MPEG-7 Edge Histogram
Descriptor. ETRI Journal, vol. 24, no. 1 (2002)
8. Atrey, P.K., Hossain, M.A.: Multimodal fusion for multimedia analysis.
Multimedia Systems, vol. 16, pp. 345--379 (2010)
9. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware.
Journal of Computer and System Sciences, vol. 66, pp. 614--656 (2003)
10. Croft, W.B.: Combining Approaches to Information Retrieval. In: Advances in
Information Retrieval, vol. 7, pp. 1--36 (2002)
11. He, B., Ounis, I.: Term Frequency Normalisation Tuning for BM25 and DFR
Models. In: Advances in Information Retrieval, vol. 3408, pp. 200--214 (2005)
12. Ulker, T.: Analysis and comparison of combination algorithms for joining ranked
inputs. MSc Thesis, Dokuz Eylül University Department of Computer
Engineering, Izmir, Turkey (2003)