CLaC at ImageCLEFPhoto 2008

Osama El Demerdash, Leila Kosseim and Sabine Bergler
Concordia University
{osama_el,kosseim,bergler}@cse.concordia.ca

Abstract

This paper presents our participation in the ImageCLEFPhoto 2008 task. We submitted six runs, experimenting with our own block-based visual retrieval as well as with query expansion. The results we obtained show that despite the poor performance of the individual visual and text retrieval components, good results can be obtained through pseudo-relevance feedback and the fusion of the results.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Languages—Query Languages

General Terms

Measurement, Performance, Experimentation

Keywords

Image Retrieval, Pseudo-relevance Feedback

1 Introduction

This paper presents our participation in the ImageCLEFPhoto 2008 task. We submitted six runs experimenting with our own block-based visual retrieval as well as with query expansion. While the 2008 task introduced a focused clustering theme for the first time, we did not attempt to use the cluster target information. Our resources comprise a text search engine and a content-based search system that we developed. The results we obtained show that despite the poor performance of the individual visual and text retrieval components, good results can be obtained through pseudo-relevance feedback and the inter-media fusion of the results.

2 Related Work

Block-based methods have been used extensively in image retrieval; examples can be found in [2] and [4]. Maillot et al. [3] investigated pseudo-relevance feedback and fusion methods on the same dataset used in our experiments, the IAPR TC-12 collection. They reported a precision gain with feedback but not with fusion.

Figure 1: Architecture of the System

3 Resources

For text retrieval, we used the Apache Lucene engine [1], which implements a TF-IDF paradigm. Stemming was done with the Snowball stemmer, which is based on the Porter algorithm and is available separately under the BSD license at http://snowball.tartarus.org/. For image analysis, we used the Java Advanced Imaging (JAI) API v. 1.1.3. JAI is available from http://www.java.net.

4 Text Retrieval

As mentioned earlier, for text retrieval we used the Apache Lucene engine, which implements a TF-IDF paradigm. Stop words were removed and the remaining terms were stemmed using the Porter algorithm. The documents were then indexed, retaining only the title, notes and location fields, all of which were concatenated into one field. All text query terms were joined using the OR operator. When searching the text, the query is also stemmed and its stop words are removed. We found that using the first sentence of the narrative field in addition to the title field improves the results. By contrast, the rest of the narrative needs semantic processing to avoid introducing noise.
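As an illustration of this pipeline, the sketch below builds such an index and runs an OR query against it with Lucene. It is a minimal sketch, not the paper's code: it uses a recent Lucene API, the field name "text" and the index path are our own placeholders, and EnglishAnalyzer (stop-word removal plus Porter-style stemming) stands in for the separate Snowball stemmer described above. ClassicSimilarity restores the TF-IDF scoring the paper refers to, since recent Lucene versions default to BM25.

import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.similarities.ClassicSimilarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TextRetrievalSketch {
    public static void main(String[] args) throws Exception {
        // Stop-word removal and Porter-style stemming, approximating the
        // Snowball/Porter pipeline described in Section 4.
        Analyzer analyzer = new EnglishAnalyzer();
        Directory dir = FSDirectory.open(Paths.get("photo-index")); // hypothetical path

        // Indexing: title, notes and location are concatenated into one field.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            String concatenated = "Church on a hill" + " " + "taken at noon" + " " + "Cusco, Peru";
            doc.add(new TextField("text", concatenated, Field.Store.YES));
            writer.addDocument(doc);
        }

        // Searching: the query goes through the same analyzer (stemming and
        // stop-word removal), and QueryParser joins its terms with OR by default.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            searcher.setSimilarity(new ClassicSimilarity()); // classic TF-IDF scoring
            Query query = new QueryParser("text", analyzer).parse("churches in Peru");
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("text") + " score=" + hit.score);
            }
        }
    }
}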
Figure 2: Partitioning the Image for Visual Retrieval

5 Visual Retrieval

For visual retrieval, we implemented a system based on unsupervised analysis of the image. We sought to capture basic global and local color, texture and shape information. To achieve this goal, we divided the image into 2×2, 3×3, 4×4 and 5×5 blocks, yielding 4, 9, 16 and 25 equal partitions respectively. We also used the image as a whole, as well as a center block occupying half the image dimensions. Figure 2 shows the different regional divisions used to analyze the image. The image was first converted to the Intensity/Hue/Saturation (IHS) color space, a perceptual color space which is more intuitive and reflective of human color perception than the RGB color space. The following features were then extracted:

• A three-band color histogram for each of the image divisions
• A histogram of the grey-level image
• A histogram of the gradient magnitude image for each of the divisions of the grey-level image
• A three-band color histogram of the thumbnail of the image

The first feature captures the color characteristics of the image, while the grey-level histogram conveys some texture information. The gradient magnitude adds the outline of the shapes in the image. For retrieval, the different partitions are compared to their counterparts in the query image. We did not experiment with assigning different weights to features. After experimenting with several measures, including the Euclidean distance, we chose the Manhattan distance as the distance measure. The images in the database were ranked according to their highest proximity to any of the three query images.
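The following sketch illustrates this block-based matching. It is a simplified illustration under our own assumptions, not the paper's implementation: it uses java.awt.image.BufferedImage rather than JAI, computes only a grey-level histogram per block (the IHS color histograms, gradient-magnitude histograms and thumbnail feature would be handled the same way), and all class and method names are hypothetical.

import java.awt.image.BufferedImage;

public class BlockMatchingSketch {

    // Normalized 256-bin grey-level histogram of one rectangular block.
    static double[] blockHistogram(BufferedImage img, int x0, int y0, int w, int h) {
        double[] hist = new double[256];
        for (int y = y0; y < y0 + h; y++) {
            for (int x = x0; x < x0 + w; x++) {
                int rgb = img.getRGB(x, y);
                int grey = ((rgb >> 16 & 0xFF) + (rgb >> 8 & 0xFF) + (rgb & 0xFF)) / 3;
                hist[grey]++;
            }
        }
        for (int i = 0; i < hist.length; i++) hist[i] /= (double) w * h; // normalize by block area
        return hist;
    }

    // One histogram per block of an n x n grid (n = 2..5 in the paper);
    // remainder pixels on the right and bottom edges are ignored for simplicity.
    static double[][] gridFeatures(BufferedImage img, int n) {
        int bw = img.getWidth() / n, bh = img.getHeight() / n;
        double[][] features = new double[n * n][];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                features[r * n + c] = blockHistogram(img, c * bw, r * bh, bw, bh);
        return features;
    }

    // Manhattan (L1) distance between corresponding blocks, summed over the grid.
    static double manhattan(double[][] a, double[][] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                d += Math.abs(a[i][j] - b[i][j]);
        return d;
    }

    // A database image is scored by its highest proximity to any of the
    // (typically three) query images, i.e. the minimum distance over them.
    static double topicDistance(double[][][] queryFeatures, double[][] dbFeatures) {
        double best = Double.POSITIVE_INFINITY;
        for (double[][] q : queryFeatures) best = Math.min(best, manhattan(q, dbFeatures));
        return best;
    }
}

In the full system, one such feature vector would be extracted per grid size and per feature type, and the per-feature distances accumulated before ranking.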
6 Query Expansion and Fusion of the Results

The next steps in processing the query are text query expansion, through a pseudo-relevance feedback mechanism, and the fusion of the text and visual search results.

6.1 Query Expansion

We attempted several ways of query expansion. The highest-ranked results from each of the text and visual search engines were passed to the other engine. We also attempted adding noun synonyms from WordNet to the query. Our best run uses pseudo-relevance feedback for query expansion. While the MAP of the visual-only run is only 0.055, its precision at five retrieved documents (0.328) is significantly higher than that of the text-only run (0.236). For this reason, we use the highest-ranked document for expansion of the text query. This is only done if the document meets a confidence level that we determined empirically. The confidence score is assigned based on the proximity score to the query image.

6.2 Fusion of Image and Text Search Results

To combine the results from the different media searches, we again took into consideration the confidence level in the visual results (i.e. the level of proximity to the query images). Depending on the confidence score, at most the three highest-ranked images are taken from the visual query results; the majority of the combined list then comes from the text results after query expansion.

7 Results

We submitted six runs at ImageCLEFPhoto 2008:

• clacTX: text search with the title field only
• clacTXNR: text search with the title field and the first sentence of the narrative field, with query expansion
• clacIR: visual search only
• clacIRTX: combines the results from clacTXNR and clacIR
• clacNoQE: text search on the title and narrative fields, without query expansion
• clacNoQEMX: same as clacNoQE, combined with clacIR

Table 1 shows the results we obtained in comparison to the mean, median and best runs of the track, computed over the best 4 runs from each participating group (25 groups and 100 runs in total). As expected, we obtained our highest Mean Average Precision (MAP) for the run that made the fullest use of our resources and methods.

Table 1: Results at ImageCLEFPhoto 2008.

Run ID       Modality   MAP      P10      P20      P30      GMAP     Rel
clacTX       Text       0.1201   0.1872   0.1487   0.1462   0.0190   1155
clacTXNR     Text       0.2577   0.4103   0.3449   0.3085   0.1081   1859
clacIR       Visual     0.0552   0.2282   0.1615   0.1214   0.0268    629
clacIRTX     Mixed      0.2622   0.4359   0.3744   0.3308   0.1551   1630
clacNoQE     Mixed      0.2034   0.3205   0.2705   0.2487   0.0780   1701
clacNoQEMX   Mixed      0.2180   0.4026   0.3269   0.2855   0.1290   1546
Average run  N/A        0.2187            0.3203
Median run   N/A        0.2096            0.3203
Best run     N/A        0.4288            0.6962

References

[1] Otis Gospodnetic and Erik Hatcher. Lucene in Action. Manning Publications, 2005.

[2] Jun-Hua Han and De-Shuang Huang. A novel BP-based image retrieval system. In International Symposium on Circuits and Systems (ISCAS 2005), 23-26 May 2005, Kobe, Japan, pages 1557–1560. IEEE, 2005.

[3] Nicolas Maillot, Jean-Pierre Chevallet, and Joo-Hwee Lim. Inter-media pseudo-relevance feedback application to ImageCLEF 2006 photo retrieval. In Carol Peters, Paul Clough, Fredric C. Gey, Jussi Karlgren, Bernardo Magnini, Douglas W. Oard, Maarten de Rijke, and Maximilian Stempfhuber, editors, CLEF, volume 4730 of Lecture Notes in Computer Science, pages 735–738. Springer, 2006.

[4] Valtteri Takala, Timo Ahonen, and Matti Pietikäinen. Block-based methods for image retrieval using local binary patterns. In Image Analysis: SCIA 2005 Proceedings, volume 3540 of Lecture Notes in Computer Science, pages 882–891. Springer, 2005.