=Paper=
{{Paper
|id=Vol-1276/MedIR-SIGIR2014-05
|storemode=property
|title=Multi-modal Relevance Feedback for Medical Image Retrieval
|pdfUrl=https://ceur-ws.org/Vol-1276/MedIR-SIGIR2014-05.pdf
|volume=Vol-1276
|dblpUrl=https://dblp.org/rec/conf/sigir/MarkonisSM14
}}
==Multi-modal Relevance Feedback for Medical Image Retrieval==
Dimitrios Markonis, Roger Schaer, Henning Müller
HES-SO, TechnoPole 3, Sierre, Switzerland
dimitrios.markonis@hevs.ch, roger.schaer@hevs.ch, henning.mueller@hevs.ch

ABSTRACT
Medical image retrieval can assist physicians in finding information supporting their diagnosis. Systems that allow searching for medical images need to provide tools for quick and easy navigation and query refinement, as the time for information search is often short.
Relevance feedback is a powerful tool in information retrieval. This study evaluates relevance feedback techniques with regard to the content they use. A novel relevance feedback technique that uses both the text and visual information of the results is proposed.
Results show the potential of relevance feedback techniques in medical image retrieval and the superiority of the proposed algorithm over commonly used approaches.
Future steps include integrating semantics into relevance feedback techniques to benefit from the structured knowledge of ontologies, and experimenting on the fusion of text and visual information.

Keywords
relevance feedback, content-based image retrieval, medical image retrieval

1. INTRODUCTION
Searching for images is a daily task for many medical professionals, especially in image-oriented fields such as radiology. However, the huge amount of visual data in hospitals and the medical literature is not always easily accessible, and physicians generally have little time for information search as they are charged with many tasks.
Therefore, medical image retrieval systems need to return information adjusted to the knowledge level and expertise of the user in a quick and precise fashion. A well-known technique for improving search results through user interaction is relevance feedback [13]. Relevance feedback allows the user to mark results returned in a previous search step as relevant or irrelevant to refine the initial query. The concept behind relevance feedback is that, though the user may have difficulties in formulating a precise query for a specific task, they generally see quickly whether a returned result is relevant to the information need or not. This technique found use in image retrieval particularly with the emergence of content-based image retrieval (CBIR) systems [18, 19, 20]. Following the CBIR mentality, the visual content of the marked results is used to refine the initial image query. With the result images represented as a grid of thumbnails, relevance feedback can be applied quickly to speed up the search iterations and refine results. Recent user tests with radiologists on a medical image search system also showed that this method is intuitive and straightforward to learn [7].
Depending on whether the user manually provides the feedback to the system (e.g. by marking results) or the system obtains this information automatically (e.g. by log analysis), relevance feedback can be categorized as explicit or implicit. Moreover, the information obtained by relevance feedback can be used to affect the general behaviour of the system (long-term learning). In [11] a market basket analysis algorithm is applied to image retrieval for long-term learning. A recent review of short-term and long-term learning relevance feedback techniques in CBIR can be found in [6]. An extensive survey of relevance feedback in text-based retrieval systems is presented in [15] and for CBIR in [14].
In the medical informatics field, [1] applies CBIR with relevance feedback to mammography retrieval. In [12], an image retrieval framework using relevance feedback is evaluated on a dataset of 5000 medical images; it uses support vector machines to compute the refined queries.
In this paper we evaluate different explicit, short-term relevance feedback techniques using visual content or text for medical image retrieval. We propose a technique that combines visual and text-based relevance feedback and show that it achieves performance competitive with state-of-the-art approaches.

2. METHODS

2.1 Rocchio algorithm
One of the most well known relevance feedback techniques
is Rocchio’s algorithm [13]. Its mathematical definition is
given below:
\[
q_m = \alpha q_o + \beta \frac{1}{|D_r|} \sum_{d_j \in D_r} d_j - \gamma \frac{1}{|D_{nr}|} \sum_{d_j \in D_{nr}} d_j \qquad (1)
\]
where q_m is the modified query, q_o is the original query, D_r is the set of relevant images, D_nr is the set of non-relevant images, and α, β and γ are weights.
Typical values for the weights are α = 1, β = 0.8 and γ = 0.2. Rocchio's algorithm is typically used in vector models and also for CBIR. Intuitively, the original query vector is moved towards the relevant vectors and away from the irrelevant ones. By weighting the positive and negative parts, a known problem of CBIR can be avoided: when more negative than positive feedback exists, many relevant images disappear from the result set.

(Copyright is held by the author/owner(s). MedIR 2014, July 11, 2014, Gold Coast, Australia.)
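As a minimal sketch, the Rocchio update of Equation (1) can be written over plain feature vectors as follows (pure Python; the function and variable names are illustrative, not from the paper's implementation):

```python
def rocchio_update(q0, relevant, non_relevant, alpha=1.0, beta=0.8, gamma=0.2):
    """Move the query vector toward the centroid of the relevant examples
    and away from the centroid of the non-relevant ones (Equation 1)."""
    dim = len(q0)

    def centroid(vectors):
        # Mean vector of the set; a zero vector if the set is empty.
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    r = centroid(relevant)
    nr = centroid(non_relevant)
    return [alpha * q0[i] + beta * r[i] - gamma * nr[i] for i in range(dim)]
```

With the default weights (α = 1, β = 0.8, γ = 0.2), the query drifts mainly toward the relevant centroid and is only mildly repelled by the non-relevant one, which limits the loss of relevant images when negative feedback dominates.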
2.2 Late fusion
Another technique that showed potential in image retrieval [5] is late fusion. Late fusion [2] is used in information retrieval to combine result lists. It can be applied for fusing multiple features, multiple queries and in multi-modal techniques. The concept behind this method is to merge the result lists into a single list while boosting common occurrences using a fusion rule.
For example, the fusion rule of the score-based late fusion method CombMNZ [17] is defined as:

\[
S_{combMNZ}(i) = F(i) \cdot S_{combSUM}(i) \qquad (2)
\]

where F(i) is the number of times an image i is present in retrieved lists with a non-zero score, and S(i) is the score assigned to image i. CombSUM is given by

\[
S_{combSUM}(i) = \sum_{j=1}^{N} S_j(i) \qquad (3)
\]

where S_j(i) is the score assigned to image i in retrieved list j.

2.3 Multi-modal relevance feedback
Most of the techniques use vectors either from the text or the visual models. However, it has been shown that approaches that use both text and visual information can outperform single-modal ones in image retrieval. We propose the use of multi-modal information for relevance feedback to enhance the retrieval performance. This is, to the extent of our knowledge, the first time that such a technique is proposed in image retrieval. As late fusion is applied on result lists, it is straightforward to use for combining results from visual and text queries.

2.4 Experimental setup
For evaluating the relevance feedback techniques the following experimental setup was followed: the n search iterations are initiated with a text query in iteration 0. The relevant results from the top k results of iteration i were used in the relevance feedback formula of iteration i + 1, for i = 0...n − 2.
The image dataset, topics and ground truth of the ImageCLEF 2012 medical image retrieval task [9] were used in this evaluation. The dataset contains more than 300,000 images from the medical open access literature.
The image captions were accessed by the text-based runs and indexed with the Lucene text search engine (http://lucene.apache.org/). The vector space model was used along with tokenization, stopword removal, stemming and term frequency-inverse document frequency weighting. The bag-of-visual-words model described in [3] and the bag-of-colors model appearing in [4] were used for the visual modelling of the images. In multi-modal runs, the fusion of the visual and text information is performed only for the top 1000 text results, as the ImageCLEF evaluation only takes the top 1000 documents into account in any case.
Five techniques were evaluated in this study:

1. text: text-based RF using the vector space model. Word stemming, tokenization and stopword removal is performed in both text and multi-modal runs.

2. visual rocchio: visual RF using Rocchio to fuse the relevant image vectors and CombMNZ fusion to fuse the original query's results with the visual ones.

3. visual lf: visual RF using late fusion (and the CombMNZ fusion rule) to fuse the relevant image results and the original query results with the visual ones.

4. mixed rocchio: multi-modal RF using Rocchio to fuse the relevant image vectors and CombMNZ fusion to fuse the original query results with the relevant caption results and relevant visual results.

5. mixed lf: multi-modal RF using late fusion (and the CombMNZ fusion rule) to fuse the relevant image results and the original query results with the captions' results and relevant visual results.

3. RESULTS
The evaluation of the five techniques was performed for k = 5, 20, 50, 100 and n = 5. Results of the mean average precision (mAP) of each technique per iteration are shown in Figures 1, 2, 3, 4.

[Figure 1: Mean average precision per search iteration for k = 5.]

Table 1 gives the best mAP scores of each run. The numbers in parentheses are the number of the iteration in which this score was achieved. For scores that were the same in multiple iterations of the same run, the iteration closest to the first is used.

Table 1: Best mAP scores
Run        | k = 5      | k = 20     | k = 50     | k = 100
text       | 0.197 (1)  | 0.2544 (4) | 0.3107 (3) | 0.3349 (4)
visual lf  | 0.2099 (2) | 0.2243 (3) | 0.2405 (4) | 0.2553 (3)
visual roc | 0.2096 (2) | 0.2187 (2) | 0.2249 (3) | 0.2268 (2)
mixed lf   | 0.1971 (3) | 0.2606 (4) | 0.3079 (4) | 0.3487 (3)
mixed roc  | 0.1947 (1) | 0.2635 (4) | 0.3207 (4) | 0.3466 (4)
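For concreteness, the CombSUM and CombMNZ fusion rules of Equations (2) and (3) used by the late-fusion runs can be sketched as follows (pure Python; representing each scored result list as an {image_id: score} dictionary is an illustrative assumption, not the paper's implementation):

```python
def comb_sum(result_lists, image_id):
    # Equation (3): sum of the scores an image receives across all lists
    # (a list that does not contain the image contributes 0).
    return sum(scores.get(image_id, 0.0) for scores in result_lists)

def comb_mnz(result_lists, image_id):
    # Equation (2): CombSUM boosted by F(i), the number of lists in which
    # the image appears with a non-zero score.
    f = sum(1 for scores in result_lists if scores.get(image_id, 0.0) != 0.0)
    return f * comb_sum(result_lists, image_id)

def fuse(result_lists):
    # Merge several {image_id: score} lists into one ranking by CombMNZ.
    ids = set().union(*(scores.keys() for scores in result_lists))
    fused = {i: comb_mnz(result_lists, i) for i in ids}
    return sorted(fused, key=fused.get, reverse=True)
```

Images that appear in several lists, e.g. in both the text-based and the visual result list, are boosted over images retrieved by only one modality, which is the intended effect when fusing multi-modal relevance feedback results.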
[Figure 2: Mean average precision per search iteration for k = 20.]
[Figure 3: Mean average precision per search iteration for k = 50.]
[Figure 4: Mean average precision per search iteration for k = 100.]

4. DISCUSSION
All of the evaluated techniques improve retrieval after the initial search iteration. This demonstrates the potential of relevance feedback for refining medical image search queries.
Relevance feedback using only visual appearance models, even though it improves the retrieval performance after the first iteration, performed worse than the text-based runs in most cases. Visual features still suffer from the semantic gap between the expressiveness of visual features and human interpretation. Still, this shows their usefulness in image datasets where little or no text metadata are available. Moreover, when combined with the text information in the proposed method, they improve the text-only baseline.
The proposed multi-modal runs provide the best results in all cases except for k = 5. Surprisingly, the visual runs perform slightly better than the text and multi-modal approaches for this case. However, assuming independent and normally distributed average precision values, the significance tests show that the difference is not statistically significant.
We consider the case k = 20 the most realistic scenario, since users do not often inspect more than 2 pages of results. Especially for grid-like result interface views, where each page can contain 20 to 50 results, we consider k = 20 more realistic than k = 5. In this case the proposed methods achieve the best performance, with 0.2606 and 0.2635 respectively. Again, the significance tests do not find any significant difference between the three best approaches. However, applying different fusion rules for combining visual and text information (such as linear weighting) could further improve the results of the mixed approaches.
It can be noted that as k increases, the performance improvement also increases, highlighting the added value of relevance feedback. Larger values of k were not explored, as these scenarios were judged unrealistic.
In the visual runs, using Rocchio to combine the visual queries performs worse than late fusion. This is in accordance with the findings in [3]. The reason could be that the large visual diversity of relevant images in medicine and the curse of dimensionality cause the modified vector to behave as an outlier in the high-dimensional visual feature space. In the mixed runs the difference between the two methods is not statistically significant, with Rocchio performing slightly better than late fusion.
Irrelevant results were ignored, as they often have little or no impact on the retrieval performance [10, 16]. More importantly, the ground truth of the dataset used contains a much larger portion of annotated irrelevant results than relevant ones. This was considered to potentially simulate an unrealistic scenario, as users do not usually mark many results as negative examples. Having too many negative examples could also cause the modified vector to behave as an outlier. Preliminary results confirmed this hypothesis, showing that the use of negative results for relevance feedback can decrease performance after the first iteration.
It should be noted that this is an automated relevance feedback experiment with positive-only feedback, and that in selective relevance feedback situations the retrieval performance is expected to be even better. A larger number of steps could be investigated, but this might be unrealistic given that physicians have little time and stop after a few minutes of search [8]. Often users will only test a few steps of relevance feedback at most.

5. CONCLUSIONS
This paper proposes the use of multi-modal information when applying relevance feedback to medical image retrieval. An experiment was set up to simulate the relevance feedback of a user on a number of medicine-related topics from ImageCLEF 2012.
In general, all the techniques evaluated in this study improve the performance, which shows the added value of relevance feedback. Text-based relevance feedback showed consistently good results. Visual-based techniques showed competitive performance for small shortlist sizes, underperforming in the rest of the cases. The proposed multi-modal approaches showed promising results, slightly outperforming the text-based one but without statistical significance.
More fusion techniques are going to be evaluated in the future. Comparison to manual query refinement by users is also planned, to assess relevance feedback as a concept in medical image retrieval. The addition of semantic search is also of interest, to take advantage of the structured knowledge of medical ontologies such as RadLex (Radiology Lexicon) and MeSH (Medical Subject Headings).

6. ACKNOWLEDGEMENTS
This work was supported by the EU 7th Framework Program in the context of the Khresmoi project (grant 257528).

7. REFERENCES
[1] C.-C. Chen, P.-J. Huang, C.-Y. Gwo, Y. Li, and C.-H. Wei. Mammogram retrieval: Image selection strategy of relevance feedback for locating similar lesions. International Journal of Digital Library Systems (IJDLS), 2(4):45–53, 2011.
[2] A. Depeursinge and H. Müller. Fusion techniques for combining textual and visual information retrieval. In H. Müller, P. Clough, T. Deselaers, and B. Caputo, editors, ImageCLEF, volume 32 of The Springer International Series On Information Retrieval, pages 95–114. Springer Berlin Heidelberg, 2010.
[3] A. García Seco de Herrera, D. Markonis, I. Eggel, and H. Müller. The medGIFT group in ImageCLEFmed 2012. In Working Notes of CLEF 2012, 2012.
[4] A. García Seco de Herrera, D. Markonis, and H. Müller. Bag of colors for biomedical document image classification. In H. Greenspan and H. Müller, editors, Medical Content-based Retrieval for Clinical Decision Support, MCBR-CDS 2012, pages 110–121. Lecture Notes in Computer Science (LNCS), Oct. 2013.
[5] A. García Seco de Herrera, D. Markonis, R. Schaer, I. Eggel, and H. Müller. The medGIFT group in ImageCLEFmed 2013. In Working Notes of CLEF 2013 (Cross Language Evaluation Forum), September 2013.
[6] J. Li and N. M. Allinson. Relevance feedback in content-based image retrieval: a survey. In Handbook on Neural Information Processing, pages 433–469. Springer, 2013.
[7] D. Markonis, F. Baroz, R. L. Ruiz de Castaneda, C. Boyer, and H. Müller. User tests for assessing a medical image retrieval system: A pilot study. In MEDINFO 2013, 2013.
[8] D. Markonis, M. Holzer, S. Dungs, A. Vargas, G. Langs, S. Kriewel, and H. Müller. A survey on visual information search behavior and requirements of radiologists. Methods of Information in Medicine, 51(6):539–548, 2012.
[9] H. Müller, A. García Seco de Herrera, J. Kalpathy-Cramer, D. Demner Fushman, S. Antani, and I. Eggel. Overview of the ImageCLEF 2012 medical image retrieval and classification tasks. In Working Notes of CLEF 2012 (Cross Language Evaluation Forum), September 2012.
[10] H. Müller, W. Müller, D. M. Squire, S. Marchand-Maillet, and T. Pun. Strategies for positive and negative relevance feedback in image retrieval. Technical Report 00.01, Computer Vision Group, Computing Centre, University of Geneva, rue Général Dufour 24, CH-1211 Genève, Switzerland, Jan. 2000.
[11] H. Müller, D. M. Squire, and T. Pun. Learning from user behavior in image retrieval: Application of the market basket analysis. International Journal of Computer Vision, 56(1–2):65–77, 2004. (Special Issue on Content-Based Image Retrieval).
[12] M. M. Rahman, P. Bhattacharya, and B. C. Desai. A framework for medical image retrieval using machine learning and statistical similarity matching techniques with relevance feedback. IEEE Transactions on Information Technology in Biomedicine, 11(1):58–69, 2007.
[13] J. J. Rocchio. Relevance feedback in information retrieval. In The SMART Retrieval System, Experiments in Automatic Document Processing, pages 313–323. Prentice Hall, Englewood Cliffs, New Jersey, USA, 1971.
[14] Y. Rui, T. S. Huang, and S. Mehrotra. Relevance feedback techniques in interactive content-based image retrieval. In I. K. Sethi and R. C. Jain, editors, Storage and Retrieval for Image and Video Databases VI, volume 3312 of SPIE Proceedings, pages 25–36, Dec. 1997.
[15] I. Ruthven and M. Lalmas. A survey on the use of relevance feedback for information access systems. The Knowledge Engineering Review, 18(2):95–145, 2003.
[16] G. Salton and C. Buckley. Improving retrieval performance by relevance feedback. Readings in information retrieval, 24:5, 1997.
[17] J. A. Shaw and E. A. Fox. Combination of multiple searches. In TREC-2: The Second Text REtrieval Conference, pages 243–252, 1994.
[18] D. M. Squire, W. Müller, H. Müller, and T. Pun. Content-based query of image databases: inspirations from text retrieval. Pattern Recognition Letters (Selected Papers from The 11th Scandinavian Conference on Image Analysis SCIA '99), 21(13–14):1193–1198, 2000. B. K. Ersboll, P. Johansen, Eds.
[19] L. Taycher, M. L. Cascia, and S. Sclaroff. Image digestion and relevance feedback in the ImageRover WWW search engine. pages 85–94, 1997.
[20] M. E. Wood, N. W. Campbell, and B. T. Thomas. Iterative refinement by relevance feedback in content-based digital image retrieval. pages 13–20, 1998.