<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>R. Granados</string-name>
          <email>rgranados@fi.upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>X. Benavent</string-name>
          <email>xaro.benavent@uv.es</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R. Agerri</string-name>
          <email>rodrigo.agerri@upm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. García-Serrano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J.M. Goñi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J. Gomar</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E. de Ves</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J. Domingo</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>G. Ayala</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Information Recognition</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Nacional de Educación a Distancia</institution>
          ,
          <addr-line>UNED</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>UPM</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2009</year>
      </pub-date>
      <abstract>
        <p>The main goal of the Miracle-FI participation in the ImageCLEF 2009 photo retrieval task was to improve the merging of content-based and text-based techniques in our experiments. The global system includes our own tool, IDRA (InDexing and Retrieving Automatically), and the Valencia University CBIR system. Analyzing both the “topics_part1.txt” and “topics_part2.txt” task topics files, we built different queries files with the text from title and clusterTitle or clusterDescription, eliminating the negative sentences: one query per cluster (or per topic) for topics 1 to 25, and one for each of the three images of each topic from 26 to 50. In the CBIR system the number of low-level features has been increased from the 68 components used at ImageCLEF 2008 to 114 components, and in this edition only the Mahalanobis distance has been used in our experiments. Three different merging algorithms were developed in order to fuse different results lists from visual or textual modules, different textual indexations, or cluster level results into a single topic level results list. Of the five runs submitted, MirFI1, MirFI2 and MirFI3 obtain precision values clearly above the average. Experiment MirFI1, our best run for precision metrics (very similar to MirFI2 and MirFI3), appears in 16th position in the R-Precision ranking and in 19th position in the MAP ranking (out of a total of 84 submitted experiments). MirFI4 and MirFI5 obtain our best diversity values, appearing in 11th position (out of 84) in the cluster recall ranking, which makes us the 5th best group of the 19 participating ones.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Analyzing both the “topics_part1.txt” and “topics_part2.txt” task topics files, we built different queries files:
[qf2] “BELGAtopics-tctcd-(q-cl)-fQ.txt”: one query for each cluster of each topic with the text from title,
clusterTitle and clusterDescription (eliminating the negative sentences and negative clusters); [qf3]
“BELGAtopics-tct-(q-cl)-fQ.txt”: the same as above but with the text from title and clusterTitle only; [qf4]
“BELGAtopics-topEnt(1..25)+capEnt(26..50)-(q-cl)-fQ.txt”: one query for each cluster (except negative
ones) of each topic from 1 to 25 and one for each of the three images of each topic from 26 to 50; [qf5]
“BELGAtopics-cap(title+desc)-(26..50)-(q-cl)-fQ.txt”: one query for each of the three images of each
topic from 26 to 50, obtained from the title and description fields of the XML captions.</p>
      <p>In the CBIR system the number of low-level features has been increased from the 68 components used at
ImageCLEF 2008 to 114 components, mainly due to the use of local color histogram descriptors, which were not
used last year. In this edition only the Mahalanobis distance has been used in our experiments.</p>
      <p>Three different merging algorithms were developed in order to fuse together different results lists from visual
or textual modules, different textual indexations, or cluster level results into a unique topic level results list:
MAXmerge (the algorithm selects the results from the N lists which have a higher relevance value),
EQUImerge (the algorithm selects the first result of each query (cluster), not selected yet), and ENRICH (this
merging uses two results lists, a main list and a support list, and when a concrete result appears in both lists, the
relevance will be increased).</p>
      <p>The five runs submitted were: [run1] “MirFI1_T-CT-I_TXT-IMG”: launching [qf3], reordering the textual
results list with the CBIR system, and merging both lists with the ENRICH algorithm; [run2]
“MirFI2_T-CT-CD-I_TXT-IMG”: the same as above, but launching [qf2]; [run3] “MirFI3_T-CT-CD_TXT”: textual experiment
launching [qf2]; [run4] “MirFI4_T-CT-CD-I_TXT”: topics 1 to 25 as in [run3], and topics 26 to 50 launching
[qf5]; [run5] “MirFI5_T-CT-CD-I_TXT”: textual experiment with NER, launching [qf4]. All the queries files in the
submitted experiments were built with different queries for the different clusters of a topic, so the EQUImerge
algorithm was applied to fuse the different cluster-based result lists into a single one. In the results, we
observe that experiment MirFI1, our best run for precision metrics (very similar to MirFI2 and MirFI3), appears
in 16th position in the R-Precision ranking and in 19th position in the MAP ranking (out of a total of 84 submitted
experiments). MirFI4 and MirFI5 obtain our best diversity values, appearing in 11th position (out of 84) in the cluster
recall ranking, which makes us the 5th best group of the 19 participating ones.</p>
      <p>Fig. 1. System overview. Components shown: queries files construction (BELGA topics.txt and captions.txt, parts 1 and 2; conversion of BELGA captions to XML; NER tagger and NE list); textual branch (Text Extractor, PreProcess of docs and queries, XML Fields Selection, IDRA Index, IDRA Search, IDRA result list); visual branch (BELGA images DB, Feature Extraction, Visual Search, visual result list); merging module and final result list.</p>
    </sec>
    <sec id="sec-2">
      <title>2 System Description</title>
      <p>The global system (shown in Fig. 1) includes our own tool, IDRA (InDexing and Retrieving
Automatically), and the Valencia University CBIR system. The main goal, using IDRA [12] with such a large
collection, was to analyze how the results obtained from the textual module could be improved using information
from the content-based module. This year, a global strategy for all experiments has been that the
Content-Based module always starts working with a selected textual results list as part of its input data (unlike
in our participation at ImageCLEF 2008 [11]).</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Collection preprocessing</title>
      <p>
        The ImageCLEFphoto09 task uses the so-called “BELGA Collection”, which contains 498,920 images from the Belga
News Agency. Each photograph is accompanied by a caption composed of English text up to a few sentences in
length [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Image captions are provided without a specific format. Because of this, we preprocess the captions
file to build a semi-structured XML description for each image, similar to the one used in the ImageCLEFphoto08
task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This format includes 8 tags (docno, title, description, notes, location, date, image and thumbnail), which
we try to fill by preprocessing the caption texts. This preprocessing consists of trying to identify, in each caption, the
appropriate part of the text to fill each XML tag. Fig. 2 shows an example of this transformation.
      </p>
      <p>Caption
1012315|UK OUT NO MAGS NO SALES NO ARCHIVES NO INTERNET MAN02 - 20020802
- MANCHESTER, UNITED KINGDOM : England's Sarah Price emerges out of the
water, during the Women's 50m backstroke heats at the Manchester Aquatic
Centre, as part of the Commonwealth Games in Manchester, Friday 02
August 2002. EPA PHOTO PA-RUI VIEIRA
      </p>
      <p>XML caption
&lt;doc&gt;
&lt;docno&gt;1012315&lt;/docno&gt;
&lt;title&gt;England's Sarah Price emerges out of the water&lt;/title&gt;
&lt;description&gt; during the Women's 50m backstroke heats at the
Manchester Aquatic Centre, as part of the Commonwealth Games in
Manchester, Friday 02 August 2002&lt;/description&gt;
&lt;notes&gt;UK OUT NO MAGS NO SALES NO ARCHIVES NO INTERNET MAN02
- 20020802 - MANCHESTER, UNITED KINGDOM&lt;/notes&gt;
&lt;location&gt;MANCHESTER, UNITED KINGDOM&lt;/location&gt;
&lt;date&gt;20020802&lt;/date&gt;
&lt;image&gt;1012315.jpg&lt;/image&gt;
&lt;thumbnail /&gt;
&lt;/doc&gt;</p>
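      <p>The transformation of Fig. 2 can be approximated with a few string heuristics. The following is only a minimal sketch: the function name caption_to_fields and every splitting rule in it (the " : " separator, the 8-digit date, the last " - " chunk as location, the first comma as title boundary) are assumptions inferred from the single example above, not the rules actually used in our preprocessing.</p>
      <preformat><![CDATA[
```python
import re


def caption_to_fields(line):
    """Split one raw BELGA caption line into the eight XML fields (heuristic sketch)."""
    docno, _, rest = line.partition("|")
    # Assumption: " : " separates the agency notes from the descriptive text.
    notes, _, text = rest.partition(" : ")
    # Assumption: the first 8-digit token in the notes is the date (YYYYMMDD).
    date_match = re.search(r"\b(\d{8})\b", notes)
    date = date_match.group(1) if date_match else ""
    # Assumption: the location is the last " - "-separated chunk of the notes.
    location = notes.rsplit(" - ", 1)[-1].strip() if " - " in notes else ""
    # Assumption: the title runs up to the first comma; the rest is the description.
    title, _, description = text.strip().partition(",")
    return {
        "docno": docno.strip(),
        "title": title.strip(),
        "description": description.strip(),
        "notes": notes.strip(),
        "location": location,
        "date": date,
        "image": docno.strip() + ".jpg",
        "thumbnail": "",
    }
```
]]></preformat>
      <p>Unlike the example in Fig. 2, this sketch keeps the trailing photo credit inside the description; removing it would need an additional rule.</p>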
      <p>The images of the database were pre-processed for the Content-Based Image module because some of
them carry extra information on the image itself: bands on the frame of
the image with color pixels in the RGB and MCY color systems. This kind of information is often used for color
calibration, so the first attempt was to use it to calibrate the color images of the
database. However, after a visual analysis of different images, we realized that they do not follow an established format.
Fig. 3 shows different image formats: two vertical color bands, two horizontal color bands, or only one
color band; some color bands have the two color systems (RGB and MCY), others only one of them, and
others have an extra white frame of different sizes. Therefore, the solution adopted was to reduce all the images to
90% of their original size in order to eliminate the different bands and the white pixel frames.</p>
      <p>Analyzing both the “topics_part1.txt” and “topics_part2.txt” task topics files, we built different queries files to be
launched against the IDRA indexation as the first step in the generation of our experiments. In the IDRA queries
format, each line holds a query identifier followed by the query text, separated by a blank. The different queries files constructed are explained below using the
example in Fig. 4, one of this year’s topics (from ‘Topics – part 1’) shown on the official website of the task [14].</p>
      <p>
&lt;top&gt;
&lt;num&gt; Number: 0 &lt;/num&gt;
&lt;title&gt; soccer &lt;/title&gt;
&lt;clusterTitle&gt; soccer belgium &lt;/clusterTitle&gt;
&lt;clusterDesc&gt; Relevant images contain photographs of the Belgium team in a soccer match. &lt;/clusterDesc&gt;
&lt;image&gt; belga38/00704995.jpg &lt;/image&gt;
&lt;clusterTitle&gt; spain soccer &lt;/clusterTitle&gt;
&lt;clusterDesc&gt; Relevant images contain photographs of the Spain team in a soccer match. &lt;/clusterDesc&gt;
&lt;image&gt; belga6/00110574.jpg &lt;/image&gt;
&lt;clusterTitle&gt; beach soccer &lt;/clusterTitle&gt;
&lt;clusterDesc&gt; Relevant images contain photographs of a soccer beach match. &lt;/clusterDesc&gt;
&lt;image&gt; belga33/06278068.jpg &lt;/image&gt;
&lt;clusterTitle&gt; italy soccer &lt;/clusterTitle&gt;
&lt;clusterDesc&gt; Relevant images contain photographs of the Italy team in a soccer match. &lt;/clusterDesc&gt;
&lt;image&gt; belga20/1027435.jpg &lt;/image&gt;
&lt;clusterTitle&gt; soccer netherlands &lt;/clusterTitle&gt;
&lt;clusterDesc&gt; Relevant images contain photographs of the Netherlands team in a soccer match or the teams
in Netherlands' league. &lt;/clusterDesc&gt;
&lt;image&gt; belga10/01214810.jpg &lt;/image&gt;
&lt;clusterTitle&gt; soccer -belgium -spain -beach -italy –Netherlands &lt;/clusterTitle&gt;
&lt;clusterDesc&gt; Relevant images contain photographs of any aspects or subtopics of soccer which are not
related to the above clusters. &lt;/clusterDesc&gt;
&lt;image&gt; belga20/01404831.jpg &lt;/image&gt;
      </p>
      <p>[qf1] “BELGAtopics-all-(q)-fQ.txt”: one query per topic containing one stream with all the text from
all the clusters (not used for runs). [qf1] would contain the query shown in Fig. 5.</p>
      <p>
0 soccer soccer belgium Relevant images contain photographs of the Belgium team in a soccer match. spain soccer Relevant
images contain photographs of the Spain team in a soccer match. beach soccer contain Relevant images contain photographs of
a soccer beach match. italy soccer Relevant images contain photographs of the Italy team in a soccer match. soccer
netherlands Relevant images contain photographs of the Netherlands team in a soccer match or the teams in Netherlands'
league. soccer -belgium -spain -beach -italy –netherlands Relevant images contain photographs of any aspects or subtopics
of soccer which are not related to the above clusters.</p>
      <p>Fig. 5. [qf1] example</p>
      <p>[qf2] “BELGAtopics-tctcd-(q-cl)-fQ.txt”: one query for each cluster of each topic with the text from
title, clusterTitle and clusterDescription. We eliminate the negative sentences (those containing the words
“not” or “irrelevant”). We do not include negative clusters such as “soccer -belgium -spain -beach -italy
-netherlands”. [qf2] would contain the queries shown in Fig. 6 for the example topic of Fig. 4:</p>
      <p>
0-1 soccer soccer belgium Relevant images contain photographs of the Belgium team in a soccer match.
0-2 soccer spain soccer Relevant images contain photographs of the Spain team in a soccer match.
0-3 soccer beach soccer contain Relevant images contain photographs of a soccer beach match.
0-4 soccer italy soccer Relevant images contain photographs of the Italy team in a soccer match.
0-5 soccer soccer netherlands Relevant images contain photographs of the Netherlands team in a soccer match or the teams in
Netherlands' league.
1-1 …</p>
      <p>Fig. 6. [qf2] example</p>
      <p>[qf3] “BELGAtopics-tct-(q-cl)-fQ.txt”: the same as above but with the text from title and
clusterTitle only. Fig. 7 shows the queries constructed from the example topic for [qf3].</p>
      <p>
0-1 soccer soccer Belgium
0-2 soccer spain soccer
0-3 soccer beach soccer
0-4 soccer italy soccer
0-5 soccer soccer Netherlands
1-1 …</p>
      <p>Fig. 7. [qf3] example</p>
      <p>[qf4] “BELGAtopics-topEnt(1..25)+capEnt(26..50)-(q-cl)-fQ.txt”: one query for each cluster (except
negative ones) of each topic from 1 to 25 and for each of the three images of each topic from 26 to
50. The text of each query is obtained by extracting the named entities (with the NER tagger
module) from the clusterTitle and clusterDescription fields of the corresponding topic, in the case of
topics 1 to 25, and from the associated XML files of each of the three images, in the case of topics 26 to
50. For the example topic in Fig. 4, the corresponding queries constructed in [qf4] would be:</p>
      <p>
0-1 soccer belgium belgium belgium soccer soccer match
0-2 spain spain soccer spain spain team soccer soccer match
0-3 beach soccer soccer soccer beach match beach match
0-4 italy soccer soccer italy italy team soccer soccer match.
0-5 soccer netherlands netherlands netherlands team soccer soccer match netherlands league
1-1 …</p>
      <p>Fig. 8. [qf4] example</p>
      <p>[qf5] “BELGAtopics-cap(title+desc)-(26..50)-(q-cl)-fQ.txt”: one query for each of the three
images of each topic from 26 to 50. The text of each query is obtained by concatenating the
TITLE and DESCRIPTION fields of the XML files of these captions.</p>
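      <p>The construction of the cluster-level queries files [qf2] and [qf3] can be sketched as follows. This is an illustrative approximation, not our actual tool: the function names, the sentence splitting on full stops, and the detection of negative clusters by a leading minus sign are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
NEGATIVE_WORDS = ("not", "irrelevant")


def split_sentences(text):
    # Assumption: sentences in clusterDesc are delimited by full stops.
    return [s.strip() for s in text.split(".") if s.strip()]


def is_negative_cluster(cluster_title):
    # Assumption: a negative cluster excludes terms with a leading minus (or en dash).
    return any(tok.startswith(("-", "\u2013")) for tok in cluster_title.split())


def build_queries(num, title, clusters, use_description=True):
    """Return one 'id text' line per non-negative cluster of a topic.

    clusters is a list of (clusterTitle, clusterDesc) pairs.
    use_description=True mimics [qf2]; False mimics [qf3].
    """
    lines = []
    k = 0
    for c_title, c_desc in clusters:
        if is_negative_cluster(c_title):
            continue
        k += 1
        parts = [title, c_title]
        if use_description:
            for sentence in split_sentences(c_desc):
                words = sentence.lower().split()
                # Drop sentences containing "not" or "irrelevant".
                if not any(w in words for w in NEGATIVE_WORDS):
                    parts.append(sentence + ".")
        lines.append(f"{num}-{k} " + " ".join(parts))
    return lines
```
]]></preformat>
      <p>Each returned line follows the identifier-blank-text layout of the examples above, with identifiers of the form topic-cluster.</p>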
    </sec>
    <sec id="sec-4">
      <title>2.3 IDRA text-based index and retrieval</title>
      <p>IDRA textual retrieval is based on the vector space model (VSM), using vectors weighted with the TF-IDF measure.
Under this approach, a representative vector is calculated for each of the image captions in the
collection. The components of the vectors are the weight values of the different words in the collection.
When a query is launched, a vector for that query is also calculated and compared with all the vectors stored
during the index process. This comparison generates the ranked results list for the launched query.</p>
      <p>The architecture of the textual retrieval task can be seen in Fig. 1. Each component takes care of a
specific task, and these tasks are executed sequentially:</p>
      <p>Text Extractor. It is in charge of extracting the text from the different files. It uses the JDOM Java API
to identify the content of each of the tags of the captions XML files. This API has problems with some
special characters, so the text must be pre-processed to remove them.</p>
      <p>Preprocess. This component processes the text in two ways: (a) special character deletion:
characters with no statistical meaning, such as punctuation marks, are eliminated; (b) stopword
detection: semantically empty words are excluded using a newly constructed list, different
from last year's.</p>
      <p>XML Fields Selection. With this component, it is possible to select the desired XML tags of the
captions files, which compose the associated text describing each image. In the captions XML
files there are eight different tags (DOCNO, TITLE, DESCRIPTION, NOTES, LOCATION, DATE,
IMAGE and THUMBNAIL). In the index process, three tags were selected from the captions XML files:
TITLE, DESCRIPTION, and LOCATION.</p>
      <p>IDRA Index. This module indexes the selected text associated with each image (its XML caption). The
approach consists of calculating the weight vector of the selected text of each image. Each vector
is composed of the TF-IDF weight values [16] of the different words in the collection. The TF-IDF
weight is a statistical measure used to evaluate how important a word is to a text in a given
collection.</p>
      <disp-formula><label>(1)</label><tex-math>\text{TF-IDF} = t_{i,j} \cdot \log_2\left(\frac{N}{n_j}\right)</tex-math></disp-formula>
      <p>where t<sub>i,j</sub> is the number of occurrences of the word t<sub>j</sub> in the caption text T<sub>i</sub>,
N is the total number of image captions in the collection, and n<sub>j</sub> is the number of captions in which
the word t<sub>j</sub> appears.</p>
      <p>All the weight values of each vector are then normalized using the Euclidean norm of the vector.
Therefore, the IDRA Index process updates the following values for each of the
words appearing in the XML captions collection: n<sub>j</sub>: number of captions in which the word t<sub>j</sub> appears;
T<sub>i</sub>: identifier of the image XML caption; t<sub>i,j</sub>: number of occurrences of the word t<sub>j</sub> in the caption text T<sub>i</sub>;
idf<sub>j</sub>: inverse document frequency, log<sub>2</sub>(N/n<sub>j</sub>); E<sub>i</sub>: Euclidean norm of the corresponding vector,
used to normalize; w<sub>j,i</sub>: weight of the word t<sub>j</sub> in T<sub>i</sub>.</p>
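      <p>The index step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the IDRA implementation (which is written in Java): the function name build_index and the in-memory dictionary layout are hypothetical, and captions are assumed to be already preprocessed into word lists.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def build_index(captions):
    """Compute normalized TF-IDF vectors.

    captions maps a caption identifier to its list of preprocessed words.
    Returns (idf, vectors), where vectors maps each identifier to a
    {word: weight} dictionary of unit Euclidean length.
    """
    n_captions = len(captions)
    # n_j: number of captions in which each word appears.
    doc_freq = Counter()
    for words in captions.values():
        doc_freq.update(set(words))
    idf = {word: math.log2(n_captions / n) for word, n in doc_freq.items()}

    vectors = {}
    for caption_id, words in captions.items():
        term_freq = Counter(words)  # t_ij: occurrences of each word in the caption
        weights = {w: count * idf[w] for w, count in term_freq.items()}
        norm = math.sqrt(sum(v * v for v in weights.values()))  # E_i
        vectors[caption_id] = (
            {w: v / norm for w, v in weights.items()} if norm > 0 else {}
        )
    return idf, vectors
```
]]></preformat>
      <p>A word that appears in every caption gets an inverse document frequency of zero and therefore does not contribute to any vector.</p>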
      <p>IDRA Search. The weight vector of the query text is calculated in the same way as above. The
similarity between the query and an image caption then depends on the proximity of their associated
vectors, which we measure with the cosine of the angle between them:</p>
      <disp-formula><label>(2)</label><tex-math>\mathrm{sim}(T_i, q) = \cos(\theta) = \frac{\sum_j w_{j,i}\, w_{j,q}}{\sqrt{\sum_j w_{j,i}^2}\,\sqrt{\sum_j w_{j,q}^2}}</tex-math></disp-formula>
      <p>where w<sub>j,q</sub> is the weight of the word t<sub>j</sub> in the query q. This similarity value is calculated
between the query and all the indexed image captions, and
the images are ranked in descending order of similarity to form the IDRA result list.</p>
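      <p>The search step can be illustrated with sparse vectors stored as dictionaries. This is a sketch under assumptions: the names cosine and search, the dictionary representation, and the max_results cut-off are hypothetical and only meant to show the ranking by cosine similarity.</p>
      <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query_vec, caption_vecs, max_results=1000):
    """Rank all captions by descending similarity to the query."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in caption_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:max_results]
```
]]></preformat>
      <p>Because the comparison visits every indexed caption, the cost of one query grows linearly with the size of the collection.</p>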
      <p>Indexing the collection takes approximately 2 days for each of the 5 parts into which the
collection was divided; these 5 indexing processes can be executed concurrently. The response time of a queries file
depends on the file launched (on the length of the query texts), but obtaining a results file for 119 queries
(at cluster level) takes over 10 hours.</p>
    </sec>
    <sec id="sec-5">
      <title>2.4 Named Entity Recognition (NER) functionality</title>
      <p>The general aim of using Named Entity Recognition (NER) in our approach was to perform retrieval using the
named entities extracted from both the documents (XML-structured captions as discussed in section 2.1) and the
topics. Due to time constraints, only the latter were used, in one of the submitted runs ([run5]).</p>
      <p>
        The nature of the text in the topics (all words were lowercase, the topics in the part 2 file did
not contain enough text, etc.) would not make it easy to use an off-the-shelf named entity tagger on them. We decided
instead to tag (tokenization, Part-Of-Speech and NER) the captions document as released by the ImageCLEF
organisers. The C&amp;C taggers [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] were used off-the-shelf. After that, our task was reduced to extracting those unique
linguistic expressions that were tagged as Location, Person or Organization.
      </p>
      <p>However, we soon realized that the resulting annotation was not correctly picking up information that
seems crucial to determine the topic of the images. More specifically, "Time" would refer to an
Organization whereas the description "Time magazine correspondent" would refer to a person and, as such, the
modifier "correspondent" seems relevant to describe the situation captured by a given image. Most of these
modifiers would not be picked up by a NER tagger because they are not capitalized.</p>
      <p>This turned out to be a symptom of a more general problem, as presented in [9]. For tasks such as IR, it is not
sufficient to have low-level tools that produce high-quality linguistic annotation and analysis; it is also
required that the results of the various levels of annotation be consistent with each other. Our proposal aims to
exploit the interaction between the various levels of annotation (POS, NER and chunks) provided by the C&amp;C
taggers in order to obtain a better bracketing of named entities. The general idea is to create foci consisting of
those words or expressions marked up as named entities. Whenever the C&amp;C tagger annotates a word as a
named entity, a chunk/phrase is built around it by attaching those surrounding (satellite) terms that act as
modifiers of the named entity according to their POS and their membership of a particular chunk. We are currently
able to deal with periods and abbreviations, and with prepositions, adjectives and nouns. This approach allows us to
extract entities such as Paris-Roubaix race, princess Mathilde, Leonardo da Vinci international airport (instead of
Leonardo da Vinci), District of Columbia, Royal Palace of Brussels, etc. The following caption (id 1470132)
illustrates our procedure:
“American photojournalist James Nachtwey in a file photograph from May 18 2003 as he is awarded the
Dan David prize in Tel Aviv for his outstanding contribution to photography. It was announced by Time
magazine on Thursday, 11 December 2003 that Nachtwey was injured in Baghdad along with Time
magazine senior correspondent Michael Weisskopf when a hand grenade was thrown into a Humvee they
were traveling in with the US Army. Both journalists are reported in stable condition and are being
evacuated to a US military hospital in Germany.”</p>
      <p>On the one hand, the named entities annotated by the tagger for this text are: American James Nachtwey, Dan
David, Tel Aviv, Baghdad, Michael Weisskopf, US Army, US, Germany, and Nachtwey. On the other, the
descriptions extracted by our system are: American photojournalist James Nachtwey, Dan David prize, Tel Aviv,
Baghdad, Time magazine senior correspondent Michael Weisskopf, US Army, US military hospital, Germany
and Nachtwey. It is particularly noticeable that our system was able to recognize Time magazine, and that the
caption is about the Dan David prize and a US military hospital in Germany.</p>
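      <p>The idea of growing a phrase around each named entity can be sketched as follows. This is a strongly simplified illustration and not our actual procedure: it assumes tokens already annotated as (word, POS tag, is-entity) triples with Penn Treebank style tags, treats only adjacent nouns and adjectives as modifiers, and ignores chunk membership, prepositions, periods and abbreviations. All names in it are hypothetical and it does not use the C&amp;C tools.</p>
      <preformat><![CDATA[
```python
# Assumption: tags starting with NN (nouns) or JJ (adjectives) mark modifiers.
MODIFIER_POS = ("NN", "JJ")


def is_modifier(pos):
    return pos.startswith(MODIFIER_POS)


def extract_descriptions(tokens):
    """Expand each run of entity tokens with adjacent modifier tokens.

    tokens is a list of (word, pos, is_entity) triples.
    """
    phrases = []
    floor = 0  # left growth may not cross tokens used by the previous phrase
    i = 0
    n = len(tokens)
    while i < n:
        if not tokens[i][2]:
            i += 1
            continue
        start = end = i
        # Consecutive entity tokens form the focus.
        while end + 1 < n and tokens[end + 1][2]:
            end += 1
        # Attach modifiers to the left of the focus.
        while start > floor and is_modifier(tokens[start - 1][1]):
            start -= 1
        # Attach modifiers to the right, stopping at the next entity.
        while (end + 1 < n and is_modifier(tokens[end + 1][1])
               and not tokens[end + 1][2]):
            end += 1
        phrases.append(" ".join(word for word, _, _ in tokens[start:end + 1]))
        floor = end + 1
        i = end + 1
    return phrases
```
]]></preformat>
      <p>With suitable tags, the fragment "Time magazine senior correspondent Michael Weisskopf" is returned as a single description even when only "Michael Weisskopf" is marked as an entity.</p>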
      <p>This approach was used to extract both the named entities and the descriptions of all the captions. We then used
the lowercase version of the entities found in the captions that were present in the part 1 file of the topics. This
method was also applied to the captions corresponding to the images that formed the clusters in part 2 of the
topics. Together they constituted the queries file [qf4] for [run5] (see sections 2.2 and 3). The total process of
annotation, analysis and extraction of entities/descriptions took 87 machine hours (on a standard Pentium 4 PC).</p>
    </sec>
    <sec id="sec-6">
      <title>2.5 Visual Retrieval</title>
      <p>
        The VISION-Team at the Computer Science Department of the University of Valencia has its own CBIR system,
mainly used for the evaluation of relevance feedback algorithms [
        <xref ref-type="bibr" rid="ref8">8, 15</xref>
        ], which was used at ImageCLEF 2008 for
the first time. The low-level features of the CBIR system have been adapted to the images of the new image
database (2009), taking into account last year's results.
      </p>
      <p>As in most CBIR systems, a feature vector represents each image. The first step in the Visual Retrieval system
is to extract these features for all the images in the database as well as for each of the cluster query images of
each topic. We use different low-level features describing color and texture to build the feature vector. The
number of low-level features has been increased from the 68 components used at ImageCLEF 2008 to 114
components in the current edition. This increase is mainly due to the use of local color histogram descriptors,
which were not used last year.</p>
      <p>Color information: Color information has been extracted by calculating both local and global histograms
of the images using 10x3 bins on the HSV color system. Local histograms have been calculated
by dividing the images into four fragments of the same size. For this database, only the H (hue) component
has been used because the other values were almost zero, as happened with the IAPR database.
Therefore, a feature vector of 10 components for the global histogram and 40 components for the local
histograms represents the color information of the image.</p>
      <p>
        Texture information: As was done for the IAPR database, six texture features have been computed
for this repository. The first three use code from the implementation by Smith
and Burn in Meastex [19]; the rest have been implemented by the authors. Together, the texture features
build a vector of 64 components:
- Gabor Convolution Energies [10].
- Gray Level Co-occurrence Matrix, also known as Spatial Gray Level Dependence [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
- Gaussian Markov Random Fields [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
- The granulometric distribution function, first proposed by Dougherty [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We have used here
not the raw distribution but the coefficients that result from fitting its plot with a B-spline basis.
- Finally, the Spatial Size Distribution [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We have used two different versions of it, taking as
the structuring element of the morphological operation that measures size either a horizontal or a
vertical segment.
      </p>
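      <p>The 50 color components (10 global plus 4 x 10 local) can be illustrated with a small hue histogram sketch. It is only an approximation under stated assumptions: hue values are taken as already extracted and scaled to the range 0 to 1, the four fragments are assumed to be the four quadrants of the image, and histograms are normalized to relative frequencies. The function names are hypothetical and this is not the code of the CBIR system.</p>
      <preformat><![CDATA[
```python
def hue_histogram(hues, bins=10):
    """Relative-frequency histogram of hue values in the range [0, 1]."""
    counts = [0] * bins
    for h in hues:
        index = min(int(h * bins), bins - 1)  # h == 1.0 falls in the last bin
        counts[index] += 1
    total = len(hues)
    return [c / total for c in counts] if total else [0.0] * bins


def color_features(hue_image, bins=10):
    """Global histogram followed by one histogram per image quadrant.

    hue_image is a list of rows, each row a list of hue values.
    """
    rows, cols = len(hue_image), len(hue_image[0])
    flat = [h for row in hue_image for h in row]
    features = hue_histogram(flat, bins)
    mid_r, mid_c = rows // 2, cols // 2
    # Assumption: the four equal fragments are the four quadrants.
    for r0, r1 in ((0, mid_r), (mid_r, rows)):
        for c0, c1 in ((0, mid_c), (mid_c, cols)):
            block = [hue_image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.extend(hue_histogram(block, bins))
    return features
```
]]></preformat>
      <p>With the default of 10 bins the returned vector has 50 components, which together with the 64 texture components gives the 114 components mentioned above.</p>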
      <p>The second step is to calculate the distance between the feature vector of each image in the
database and that of each of the cluster images. In the last edition (ImageCLEF 2008), we tested two different metrics to calculate
this distance: the Euclidean and the Mahalanobis. In all experiments better results were obtained with the
Mahalanobis distance, because this measure takes into account the correlations of the data set and is
scale-invariant, a very useful property given the broad differences between the values of the different low-level
features. Therefore, in this edition only the Mahalanobis distance has been used in our experiments. The resulting
sorted lists are passed to the merging module of the global system.</p>
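      <p>The ranking by Mahalanobis distance can be sketched without external libraries. The following is an illustration under assumptions: the covariance matrix is assumed to be estimated from the feature vectors of the database (the text does not say how it was obtained), the linear system is solved by plain Gaussian elimination, and all function names are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equally long feature vectors."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    d = len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b]
    return [[value / (n - 1) for value in line] for line in cov]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def mahalanobis(u, v, cov):
    """sqrt((u - v)^T cov^-1 (u - v)), computed without inverting cov."""
    diff = [a - b for a, b in zip(u, v)]
    return math.sqrt(sum(d * t for d, t in zip(diff, solve(cov, diff))))


def rank_by_distance(query, database, cov):
    """Sort database images by increasing distance to the query vector."""
    scored = [(img, mahalanobis(query, feats, cov)) for img, feats in database.items()]
    return sorted(scored, key=lambda item: item[1])
```
]]></preformat>
      <p>With a diagonal covariance of 4 and 1, a difference of 2 units counts as distance 1 on the first feature and as distance 2 on the second, which shows how the measure compensates for features with very different scales.</p>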
    </sec>
    <sec id="sec-7">
      <title>2.6 Merging algorithms</title>
      <p>Different merging algorithms were developed in order to fuse different results lists from visual or
textual modules, different textual indexations, or cluster level results into a single topic level results list. All
three merging algorithms detailed here require the trec_eval format [19] for the input results lists, which is the format
required by the ImageCLEF organization to submit the definitive runs.</p>
      <p>MAXmerge. This algorithm is used to fuse the N (configurable value) results lists obtained when a
given queries file is launched against the N indexations corresponding to the N parts into which the collection
was divided to be indexed. For each query, the algorithm selects from the N lists the results with the highest
relevance values for that query, independently of the list in which each result appears. The maximum
number of results per query in the resulting list is set to 1000 (‘max’ is a configurable parameter).</p>
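      <p>A minimal sketch of MAXmerge follows. It assumes the results lists have already been read from their trec_eval files into dictionaries that map each query identifier to a list of (document, relevance) pairs; this in-memory layout and the name max_merge are hypothetical.</p>
      <preformat><![CDATA[
```python
def max_merge(result_lists, max_results=1000):
    """Fuse the results lists obtained from the N parts of the index.

    result_lists is a list of dictionaries, one per index part, each
    mapping a query identifier to a list of (doc_id, relevance) pairs.
    """
    merged = {}
    # Pool the results of every part under their query identifier.
    for results in result_lists:
        for query_id, hits in results.items():
            merged.setdefault(query_id, []).extend(hits)
    # Keep the results with the highest relevance, whatever part they come from.
    for query_id, hits in merged.items():
        hits.sort(key=lambda hit: hit[1], reverse=True)
        merged[query_id] = hits[:max_results]
    return merged
```
]]></preformat>
      <p>Since the parts are disjoint subsets of the collection, a document can appear in only one input list, so no duplicate handling is needed in this sketch.</p>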
      <p>EQUImerge. When a results list contains results for queries that reference clusters (not topics), this
algorithm is used to select the first representatives of each cluster to build a single results list with results for
queries that reference topics. The algorithm selects the first result of each query-cluster that has not been selected yet to
build the results list for the corresponding query. A preliminary step separates the results into
several lists according to the cluster (inside each topic) to which the results belong. The relevance value
is decremented (by the configurable value ‘decr’) for each result, starting from the original relevance value of the
first selected result. The maximum number of results per query is set to 1000 (‘max’ is a configurable
parameter).</p>
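      <p>EQUImerge can be read as a round-robin selection over the clusters of a topic. The sketch below follows that reading under several assumptions: query identifiers have the form topic-cluster, results are held in memory as (document, relevance) lists, and the default value of decr is arbitrary. It is an illustration, not our implementation.</p>
      <preformat><![CDATA[
```python
def equi_merge(cluster_results, decr=0.001, max_results=1000):
    """Fuse cluster-level results into one ranked list per topic.

    cluster_results maps identifiers such as "0-1" (topic 0, cluster 1)
    to ranked lists of (doc_id, relevance) pairs.
    """
    # Preliminary step: group the cluster lists by topic.
    by_topic = {}
    for query_id, hits in cluster_results.items():
        topic, _, cluster = query_id.partition("-")
        by_topic.setdefault(topic, []).append((int(cluster), hits))

    merged = {}
    for topic, clusters in by_topic.items():
        clusters.sort(key=lambda item: item[0])
        cursors = [0] * len(clusters)
        selected, seen = [], set()
        relevance = None
        progressed = True
        while progressed and len(selected) < max_results:
            progressed = False
            for k, (_, hits) in enumerate(clusters):
                # Skip results already selected through another cluster.
                while cursors[k] < len(hits) and hits[cursors[k]][0] in seen:
                    cursors[k] += 1
                if cursors[k] == len(hits) or len(selected) >= max_results:
                    continue
                doc_id, original = hits[cursors[k]]
                cursors[k] += 1
                # The first result keeps its relevance; later ones are decremented.
                relevance = original if relevance is None else relevance - decr
                selected.append((doc_id, relevance))
                seen.add(doc_id)
                progressed = True
        merged[topic] = selected
    return merged
```
]]></preformat>
      <p>Taking one result per cluster in turn places a representative of every cluster near the top of the topic list, which is what the diversity (cluster recall) measure rewards.</p>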
      <p>ENRICH. This merging uses two results lists, a main list and a support list. The merged results list has
a maximum of 1000 results per query (configurable). If a given result appears in both lists for the same query,
the relevance of this result in the merged list is increased in the following way:</p>
      <disp-formula><label>(6)</label><tex-math>\mathit{newRel} = \mathit{mainRel} + \frac{\mathit{supRel}}{\mathit{posRel} + 1}</tex-math></disp-formula>
      <p>where newRel is the new relevance value in the merged list, mainRel is the relevance value in the main list,
supRel is the relevance value in the support list, and posRel is the position in the support list.</p>
      <p>Relevance values are then normalized from 0 to 1. Every result appearing in the support list but not in the
main one (for each query) is added at the end of the corresponding list. In this case, relevance values are
normalized according to the lowest value in the main list. In last year's implementation of this algorithm
this addition did not work correctly, so it has been corrected this year. In this experiment, the main
and support lists are composed of a maximum of 1000 results for each query, and the resulting merged lists
are limited to the same number of results per query.</p>
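      <p>The ENRICH merging for a single query can be sketched as follows. Several details are not fixed by the description above and are assumptions of this sketch: positions in the support list are counted from 0, the boosted list is re-sorted by the new relevance, normalization divides by the highest relevance, and support-only results are scaled so that they never exceed the lowest relevance of the main list. The function name is hypothetical.</p>
      <preformat><![CDATA[
```python
def enrich(main, support, max_results=1000):
    """Merge a main and a support results list for one query.

    Both arguments are ranked lists of (doc_id, relevance) pairs.
    """
    # Assumption: posRel is the 0-based position in the support list.
    support_info = {doc: (pos, rel) for pos, (doc, rel) in enumerate(support)}

    boosted = []
    for doc, main_rel in main:
        if doc in support_info:
            pos, sup_rel = support_info[doc]
            main_rel = main_rel + sup_rel / (pos + 1)  # newRel
        boosted.append((doc, main_rel))

    # Assumption: re-sort by the new relevance and normalize by the maximum.
    boosted.sort(key=lambda hit: hit[1], reverse=True)
    top = boosted[0][1] if boosted and boosted[0][1] > 0 else 1.0
    merged = [(doc, rel / top) for doc, rel in boosted]

    # Results found only in the support list go to the end, scaled so that
    # they stay at or below the lowest relevance of the main results.
    main_docs = {doc for doc, _ in main}
    lowest = merged[-1][1] if merged else 1.0
    max_sup = max((rel for _, rel in support), default=0.0) or 1.0
    for doc, sup_rel in support:
        if doc not in main_docs:
            merged.append((doc, lowest * sup_rel / max_sup))
    return merged[:max_results]
```
]]></preformat>
      <p>In the mixed runs the textual list would play the role of the main list and the content-based list that of the support list, so a visually similar image moves up in the textual ranking without removing any textual result.</p>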
    </sec>
    <sec id="sec-8">
      <title>3 Experiments (submitted runs)</title>
      <p>In this campaign, five different runs were submitted to ImageCLEFphoto 2009. All of them were based on the
5-part indexation carried out using the TITLE, DESCRIPTION and LOCATION XML tags. Each run starts by
launching one of the queries files described in section 2.2. A queries file is launched against
each of these 5 indexations, and the 5 results lists obtained are then merged using the MAXmerge
algorithm.</p>
      <p>[run1] “MirFI1_T-CT-I_TXT-IMG”: mixed (textual/visual) experiment launching [qf3], reordering
the resulting textual results list with content-based results, and merging both lists with the ENRICH algorithm.
[run2] “MirFI2_T-CT-CD-I_TXT-IMG”: the same as [run1], but launching [qf2].
[run3] “MirFI3_T-CT-CD_TXT”: textual experiment launching [qf2].
[run4] “MirFI4_T-CT-CD-I_TXT”: the first part (topics 1 to 25) is handled exactly as in [run3], and
the second part (topics 26 to 50) launches [qf5].
[run5] “MirFI5_T-CT-CD-I_TXT”: textual experiment using NER. [qf4] is launched to
obtain the results list.</p>
      <p>As can be observed, all the queries files used in the submitted experiments were built at cluster level, that is,
with different queries for the different clusters of a topic, as explained in the description of the queries files
construction (section 2.2). The results lists obtained therefore contain results for each of the clusters of each
topic, and the EQUImerge algorithm is applied to fuse the different cluster-based lists into a single query-based one.</p>
    </sec>
    <sec id="sec-9">
      <title>Results and concluding remarks</title>
      <p>After the evaluation by the task organizers, the results obtained for each of the submitted experiments are presented
in Table 1. For each run, the table shows the identifier, the mean average precision (MAP), the R-Precision, the
precision at the first 10 and 20 results, the number of relevant images retrieved (out of a total of 34887 relevant
images in the collection), and the cluster recall at 10 and 20. The average values over all the experiments submitted
to the task are also shown for these metrics, as well as the best value obtained for each metric.
MAP, R-Precision and CR@10 values are shown in Fig. 9 for visual comparison.</p>
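      <p>As a reminder of how the cut-off metrics listed above are usually computed, the following sketch gives precision at k and cluster recall at k. It assumes the common definitions (fraction of relevant results among the first k, and fraction of a topic's clusters represented among the first k) and is not the organizers' evaluation code; the function names and the clusters_of mapping are hypothetical.</p>
      <preformat>
```python
from typing import Dict, List, Set


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the first k results that are relevant (k is assumed positive)."""
    hits = sum(1 for image_id in ranked_ids[:k] if image_id in relevant_ids)
    return hits / k


def cluster_recall_at_k(ranked_ids: List[str],
                        clusters_of: Dict[str, Set[int]],
                        num_clusters: int,
                        k: int) -> float:
    """Fraction of the topic's clusters represented in the first k results.

    clusters_of maps each relevant image to the clusters it belongs to;
    images missing from the mapping contribute no cluster.
    """
    found: Set[int] = set()
    for image_id in ranked_ids[:k]:
        found.update(clusters_of.get(image_id, set()))
    return len(found) / num_clusters
```
      </preformat>
      <p>For a ranking a, b, x, c in which a and b belong to cluster 1, c belongs to cluster 2 and x is not relevant, precision at 2 is 1.0 while cluster recall at 2 is only 0.5, which illustrates why a run can score well on precision and poorly on diversity.</p>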
      <p>At first sight, we can observe that MirFI1 is our best run for the precision metrics (very close to MirFI2 and
MirFI3); it ranks 16th in R-Precision and 19th in MAP (out of a total
of 84 submitted experiments). Of the 19 groups that participated in the task, only 6 obtained better precision results
than our best experiment.</p>
      <p>Regarding the diversity metrics (cluster recall at 10 and 20, CR@10 and CR@20), MirFI4 and MirFI5 obtain
our best diversity values, ranking 11th (out of 84) in cluster recall and making ours the 5th
best group of the 19 participants.</p>
      <p>[Figure 9: bar chart. Vertical axis: % (0 to 90). Horizontal axis: Metrics, with one group of bars per metric (MAP, R-Prec and a third metric). Bars in each group: best, MirFI1, MirFI2, MirFI3, MirFI4, MirFI5, average.]</p>
      <p>Fig. 9. Comparison of own experiments with best and average values</p>
      <p>Comparing the results of experiments MirFI1 and MirFI2, we can see that not using the CD (cluster
description) tag from the topics gives somewhat better precision results and very similar diversity ones. We can
therefore say that adding this field in the queries construction step was not very useful. The results for
experiments MirFI2 and MirFI3 are almost the same, so we can conclude that using the ENRICH merging
algorithm with the visually re-ranked results list does not affect the results significantly.</p>
      <p>MirFI3 and MirFI4 differ in the way the second half of the queries (topics 26
to 50) is constructed. As explained in sections 2.2 and 3, MirFI4 extracts the text from the captions of the
example images included in the second part of the topics. The evaluation shows that MirFI3 obtains
better precision results than MirFI4, but worse diversity ones. One possible reason is that the caption text adds
more information to the queries, which is useful for diversity but acts as noise for precision.</p>
      <p>The goal of experiment MirFI5 was to analyze whether results could be improved by using NER techniques
for the construction of the queries. The precision results for this experiment were our worst (it seems
that a lot of noise was introduced), but adding entity information to the queries improves the diversity
results. Taken together, experiments MirFI4 and MirFI5 show that this additional information improves the diversity
results but makes the precision ones worse.</p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgements</title>
      <p>This work has been partially supported by the Spanish R+D National Plan, by means of the project BRAVO
(Multilingual and Multimodal Answers Advanced Search – Information Retrieval), TIN2007-67407-C03-03; by
Madrid’s R+D Regional Plan, by means of the project MAVIR (Enhancing the Access and the Visibility of
Networked Multilingual Information for the Community of Madrid), S-0505/TIC/000267; and by the Spanish
Ministry of Education and Science, by means of the project MCYT, TIC2002-03494.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arni</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clough</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanderson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grubinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEFphoto 2008 Photographic Retrieval Task</article-title>
          .
          <source>In: Working Notes for the CLEF 2008 Workshop</source>
          , 17-19 September, Aarhus, Denmark (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ayala</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Spatial Size Distributions. Applications to Shape and Texture Analysis</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>23</volume>
          , n. 4, pp.
          <fpage>1430</fpage>
          --
          <lpage>1442</lpage>
          . (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Belga News Agency, http://www.belga.be.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chellappa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chatterjee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Classification of Textures using Gaussian Markov Random Fields</article-title>
          .
          <source>IEEE Transactions on Acoustics Speech and Signal Processing</source>
          , vol.
          <volume>33</volume>
          , pp.
          <fpage>959</fpage>
          --
          <lpage>963</lpage>
          . (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dougherty</surname>
            ,
            <given-names>E.R.</given-names>
          </string-name>
          :
          <article-title>Gray-scale morphological granulometric texture classification</article-title>
          .
          <source>Optical Engineering</source>
          , vol.
          <volume>33</volume>
          , n. 8, pp.
          <fpage>2713</fpage>
          --
          <lpage>2722</lpage>
          . (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Curran</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Wide-coverage efficient statistical parsing with CCG and log-linear models</article-title>
          .
          <source>Computational Linguistics</source>
          , vol.
          <volume>33</volume>
          , n. 4, pp.
          <fpage>493</fpage>
          --
          <lpage>553</lpage>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Conners</surname>
            ,
            <given-names>R.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trivedi</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harlow</surname>
            ,
            <given-names>C. A.</given-names>
          </string-name>
          :
          <article-title>Segmentation of high-resolution urban scene using texture operators</article-title>
          .
          <source>Computer Vision, Graphics, and Image Processing</source>
          , vol.
          <volume>25</volume>
          , n. 3, pp.
          <fpage>273</fpage>
          --
          <lpage>310</lpage>
          . (
          <year>1984</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>de Ves</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ayala</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccarello</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A novel Bayesian framework for relevance feedback in image content-based retrieval systems</article-title>
          .
          <source>Pattern Recognition</source>
          , vol.
          <volume>39</volume>
          , pp.
          <fpage>1622</fpage>
          --
          <lpage>1632</lpage>
          . (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>