                IPAL at ImageClef 2007
        Mixing Features, Models and Knowledge.
    Sheng Gao, Jean-Pierre Chevallet, Thi Hoang Diem Le, Trong Ton Pham, Joo Hwee Lim
                             IPAL French-Singaporean Joint Lab
                            Institute for Infocomm Research (I2R)
                    Centre National de la Recherche Scientifique (CNRS)
                       21 Heng Mui Keng Terrace - Singapore 119613
             gaosheng,viscjp,stuthdl,ttpham,joohwee@i2r.a-star.edu.sg


                                             Abstract
This paper presents the IPAL ad-hoc photographic retrieval and medical image retrieval results in the ImageClef 2007 campaign. For the photo task, the IPAL group is ranked 3rd among 20 participants. The MAP of our best run is 0.2833, which is ranked 6th among the 476 runs. The IPAL system is based on mixed-modality search, i.e. textual and visual modalities. Compared with our results in 2006, our results are significantly enhanced by extracting multiple low-level visual content descriptors and fusing multiple CBIR engines. Several text-based image retrieval (TBIR) engines are also developed, such as the language model (LM) approach and the latent semantic indexing (LSI) approach. We have also used external knowledge such as WordNet and Wikipedia for document expansion. Cross-modality pseudo-relevance feedback is then applied to boost each individual modality, and linear fusion is used to combine the different ranking lists. Combining CBIR and TBIR outperforms each individual modality search. On the medical side, our best run ranks 19th among 111. We continue to use conceptual indexing with vector-space weighting, but this year we add a Bayesian network over concepts extracted from the UMLS meta-thesaurus.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval—Query Formulation

General Terms
Algorithms, Measurement, Performance, Experimentation

Keywords
Language model, latent semantic indexing, information retrieval, multimodality fusion, content-based image retrieval, text-based image retrieval


1    Introduction
This year the IPAL group again participates in both the ad-hoc photographic retrieval task and the medical task. The photographic retrieval task differs from last year's in that this year 1) the notes field of the image documents cannot be used, whereas last year it could, 2) only the text keywords in the title field are used for the query, whereas last year keywords from all fields could be used, 3) query image samples are excluded from the image database, and 4) 244 new images are added. Thus much less text information is available, which obviously decreases discrimination, especially for the query. That is why this year we have tried to expand documents using an external information source (Wikipedia).




        Figure 1: Images with very similar text content but very different visual content.


    After removing stop words, the average query length is about 3.3 text keywords, with a maximum of 6 keywords and a minimum of 2. With so few text keywords it becomes impossible to discriminate image documents that share the same text keywords even though their actual visual contents are very different. Some image documents in the database have very similar text descriptions, so text information alone cannot distinguish them. However, when we measure their visual content, it is easy to discriminate one from another. For example, in figure 1 both images have similar textual content, "Accommodation Huanchaco - Exterior View, Trujillo, Peru", except for the word "interior"/"exterior". The left image ("exterior", image id=01/1311) is relevant to query 1, "accommodation with swimming pool", while the image on the right ("interior", image id=01/1310) is irrelevant. With some external knowledge one could deduce that an "interior view" is unlikely to show a swimming pool. Unfortunately, this remains difficult to do in practice. Since we cannot exploit such extra knowledge efficiently during matching, this example clearly shows that visual content plays an equally important role and can complement text matching. Therefore, this year we paid much attention to improving our Content-Based Image Retrieval (CBIR) system. We also try to use external knowledge from Wikipedia.


2    Photographic Image Indexing
We build various CBIR and Text-Based Image Retrieval (TBIR) systems using different indexing methods and similarity functions. In total we submitted 27 runs; the details are given below. To enrich the visual content representation, five types of low-level visual features are extracted from local regions or the whole image. They are detailed in the following:
COR: auto color correlogram with depth 2 in the HSV space. It is extracted from the whole
   image and is represented by one 324-dimensional vector.
HSV: histogram in HSV and gray-level space, with 162 dimensions for HSV plus 4 dimensions
   for gray level. An image is represented by a 166-dimensional vector.
EDGE: we use the Canny operator to detect object contours. An 80-dimensional vector is
   used to capture the magnitudes and gradients of the contours.
GABOR: texture feature using Gabor filters on a uniform 5x5 grid, at 2 scales and 12
   orientations. The mean and variance are calculated on each grid cell to generate a
   48-dimensional feature vector.
SIFT: the SIFT feature is extracted using David Lowe's tool; the region around each keypoint
    is described by a 128-dimensional appearance feature [8].
   Given these low-level features, we now discuss how to index the images and calculate the
similarity scores. For the first two visual features extracted from the image, the following
methods are developed to index the image.
tf-idf: the tf-idf method is a popular way to describe text documents [2]. Here we use it to
      process the histogram: each bin of the histogram is treated as a distinct word, so the
      term frequency is the count in that bin and the document frequency is the number of
      images in the database containing that word. The histogram feature is then weighted
      by tf-idf to index the image, and the ranking function is the cosine similarity (a
      minimal sketch is given after this list).
SVD: Singular Value Decomposition (SVD) is used to remove the correlation among the feature
   components. Assuming the full rank is N, we empirically select the top 0.8·N eigenvectors
   and index the image in the eigenspace. The cosine function is used to calculate the
   similarity score.
Bag-of-visterm: images are divided into 16 equal patches and image features are extracted
    for each patch. The patches are clustered into 1,000 clusters using their feature vectors,
    and each image is quantized as a discrete vector. Retrieval is performed by normalizing
    and matching these vector signatures.
ISM: the ranking function is learned from training samples using a supervised learning
    algorithm, here a binary classifier based on the Integrated Statistic Model (ISM) [4].
    The training samples comprise the 3 query images as the positive class and 3 negative
    samples randomly selected from the image database. The likelihood ratio between the
    positive and negative models is then calculated for each image document in the database,
    and all images are ranked from the highest value to the lowest.
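
    For illustration, here is a minimal Python sketch of the tf-idf indexing of visual histograms with cosine ranking described above; the function name and the exact idf variant are our own assumptions, since the paper only specifies tf-idf weighting of histogram bins with a cosine ranking function.

import numpy as np

def tfidf_cosine_rank(histograms, query_hist):
    """Rank database images against a query by tf-idf weighted cosine.

    histograms: (n_images, n_bins) array of visual histograms; each bin
    is treated as a "word" and its value as the term frequency.
    Illustrative sketch only; the idf variant is an assumption.
    """
    tf = np.asarray(histograms, dtype=float)
    # document frequency: number of images with a non-zero count per bin
    df = np.count_nonzero(tf > 0, axis=0)
    idf = np.log(len(tf) / np.maximum(df, 1))
    w = tf * idf                            # tf-idf weighted image index
    q = np.asarray(query_hist, dtype=float) * idf
    # cosine similarity between the query and every image
    sims = (w @ q) / (np.linalg.norm(w, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)                # ranked list, best first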
    Therefore, 3 individual runs are generated for each of these feature types. For the two
grid/patch-based visual features, i.e. Gabor and SIFT, we use the following approaches to
calculate the ranking score:
HME: use the supervised learning approach discussed above to train a hidden maximum
   entropy (HME) based model on the Gabor or SIFT features, then use the likelihood ratio
   to rank the images.
WORD SVD: this method indexes images in a word space rather than at the image signal level
   (Gabor or SIFT space). We first find the 200 most frequent keywords by analyzing the
   text documents attached to the images in the database. For each keyword, an HME
   classifier is trained on a training set constructed as follows: if the text attached to an
   image contains the word, the image is labelled as a positive sample; otherwise it is
   labelled as negative. Once the 200 HME classifiers are trained, the likelihood ratio scores
   index each image in a 200-dimensional keyword space. To further reduce the correlation
   among the feature components, SVD is applied as discussed above. The similarity function
   is again the cosine distance.
WORD ISM: we can also train an ISM model for each query topic using the 200-dimensional
   feature in the keyword space. The likelihood ratio is then used as the ranking score.
    So 3 runs are also generated for each of the Gabor and SIFT features. In total we have 12
visual runs. Table 1 summarizes the individual visual runs: rows are feature types, columns are
indexing methods, and a "+" means the indexing method is used for that feature.

2.1    Pseudo-relevance Feedback
As we learned from ImageClef 2006 [9], cross-modality pseudo-relevance feedback (PRF) can
improve system performance: the TBIR can be boosted by using the top-N documents (here
N = 10) from the CBIR as feedback, and vice versa. PRF is adopted again this year; a sketch
is given after Table 1.
                               Table 1: Summary of individual visual runs

                             tf-idf   SVD   ISM   HME     WORD SVD          WORD ISM
                 COR           +       +     +
                 HSV           +       +     +
                GABOR                               +          +                +
                 SIFT                               +          +                +
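
    To make this concrete, here is a hedged Python sketch of cross-modality PRF using a Rocchio-style centroid update; the update rule and the beta weight are our assumptions, since the paper does not give the exact feedback formula.

import numpy as np

def cross_modal_prf(features, query_vec, other_modality_ranking,
                    n_fb=10, beta=0.5):
    """Cross-modality pseudo-relevance feedback (Rocchio-style sketch).

    The top n_fb documents from the *other* modality's ranking are
    assumed relevant and folded into this modality's query vector.
    """
    fb_ids = other_modality_ranking[:n_fb]
    centroid = features[fb_ids].mean(axis=0)     # feedback centroid
    new_query = (1 - beta) * query_vec + beta * centroid
    sims = features @ new_query / (
        np.linalg.norm(features, axis=1) * np.linalg.norm(new_query) + 1e-12)
    return np.argsort(-sims)                     # re-ranked list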



3      Text Indexing
We believe that using knowledge to assist the retrieval task is important, because retrieval is
clearly a human task linked to understanding. Knowledge can be defined as information used to
complete a task; in our case, it is every piece of information that can help relate a document to
the query. For example, the language used, the part of speech of words, and the grammar are
simple examples of knowledge that can be useful for text retrieval. The document collection
itself can be viewed as a source of knowledge at different steps of the retrieval process:
     • At indexing time: in the matching model. For example, the use of inverse document fre-
       quency (idf) is a simple way to use statistical global knowledge from a corpus. In a language
       model [12], the collection is also used to smooth the language model of the document.
     • At querying time: pseudo-relevance feedback is a technique that uses the content of
       top-ranked documents, which are assumed to be relevant, to enlarge the query. In a sense,
       part of the information in the corpus is used to help interpret the query. A co-occurrence
       thesaurus obtained by text mining the corpus can also be used to enlarge the query; the
       effect is similar to pseudo-feedback.
    In this experiment, we have used both knowledge from the document corpus and external
knowledge (i.e. information that is not part of the documents). The external source we use is
Wikipedia. This choice is appropriate because Wikipedia includes a large number of proper
nouns, and especially named entities; in the IAPR collection, named entities play a major role.

3.1      Using Language Model and Document Knowledge
We have built two individual runs using language model (LM) based information retrieval [12]
and the latent semantic indexing (LSI) method [3] [5]. The text document attached to each
image is provided in XML format. It is pre-processed as follows: 1) extract the raw text, 2)
remove stop words and 3) apply stemming. For the LM-based TBIR, we first build a lexicon
(6,644 words) from all text documents and then train a unigram language model on the text
document attached to each image in the database. This is done using the CMU-Cambridge
Statistical Language Modelling Toolkit [13] with Witten-Bell smoothing. We have also produced
some runs using the Lemur Toolkit1, with absolute-discount smoothing.
    Each image is indexed by the word probabilities over the lexicon. Given a query topic and
any image in the database, the probability of the query words being generated by the
corresponding image LM can be calculated, and the image documents are ranked by this
probability from highest to lowest. For the LSI-based TBIR, we first use latent semantic
indexing [3, 5] to find a 1,635-dimensional eigenspace; each image text document is then
indexed by a 1,635-dimensional vector. The cosine distance is applied to get similarity scores
between the query words and the image documents.
    1 http://www.lemurproject.org
3.2    Using Language Model and External Knowledge (Wikipedia)
There is an increasing production of collaboratively authored information in the "Wiki" format.
It started with Wikipedia2, a free encyclopedia, but there now also exist Wiktionary3, a free
open dictionary, Wikiquote4, a compendium of quotations from notable people, Wikibooks and
Wikisource, free collections of open-content text books, Wikinews, etc. All this information can
be a valuable source to help document indexing because of its large coverage. The drawback is
the lack of explicit structure: these documents are produced for human reading only.
    The full content of Wikipedia can be freely downloaded5 in XML format. XML is a more
convenient form than HTML for extracting information, because its tags refer to content and
not only layout. This format is also better suited to automatic mining than HTML because of
the special (i.e. non-XML) template tags used to produce the HTML pages, which can be used
to localize typed content. For example, the information boxes on the right of a Wikipedia page
come from different types of structures: Infobox, Geobox, Taxobox. These are organized in
explicit fields (name, type, etc.); they are templates which provide structured, standardized
information, mainly for named entities.

                         Figure 2: Example of an information box
{{Infobox Mountain
| Name=Quilotoa [[Image:Flag_of_Ecuador.svg|38px]]
| Photo=QuilotoaPanorama1.jpg
| Caption=Panorama of the lake-filled Quilotoa caldera.
| Elevation=3,914 m
| Location=[[Ecuador]]
| Range=[[Andes]]
| Prominence =
| Coordinates = {{coor dm|00|85|S|78|90|W|type:mountain_region:EC}}
| Topographic map =
| Type=[[Caldera]]
| Age= approx. 40,000 years
| Last eruption= 1280 (?)
| First ascent=
| Easiest route=
}}


    We have used the dump of 2 April 2007 of the English Wikipedia, a 9.8Gb file organized in
4,881,983 pages. We have used the Taxobox structure to extract a list of animals and plants.
This structure abstracts common facts about the taxonomy of living things (animals and
plants) into a box. We found 36,188 such boxes, from which we extracted 34,321 terms using
the name and regnum fields. For CLEF, only the 23,399 animal terms are useful, for queries 5,
20 and 35. Due to the lack of disambiguation, terms like wall (a butterfly) and plane (a tree)
can cause noise.
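
    As an illustration of this step, the following hedged Python sketch pulls the name and regnum fields out of Taxobox templates; the regular expressions and field spellings are assumptions (nested templates, for instance, would defeat this simple pattern), and the real extraction may be more involved.

import re

# Match a Taxobox template and its |name= / |regnum= fields (sketch).
TAXOBOX = re.compile(r"\{\{Taxobox(.*?)\}\}", re.DOTALL | re.IGNORECASE)
FIELD = re.compile(r"\|\s*(name|regnum)\s*=\s*([^\n|]+)")

def extract_taxobox_terms(wiki_text):
    """Yield (term, kingdom) pairs from Taxobox templates in a page."""
    for box in TAXOBOX.finditer(wiki_text):
        fields = dict(FIELD.findall(box.group(1)))
        name = fields.get("name", "").strip().lower()
        # strip wiki-link brackets such as [[Animalia]]
        regnum = re.sub(r"[\[\]]", "", fields.get("regnum", "")).strip().lower()
        if name:
            yield name, regnum or "unknown"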
    We have also used the Geobox for geographical terms. There are only 1,805 boxes of this
type, from which we could extract only 709 terms, because we put a strong typographical
constraint on terms (for example, we exclude abbreviations). This list contains rivers,
mountains, regions and cities; for CLEF only the mountains should be useful, for queries 4 and
44. The general box type Infobox is the most common in this version of Wikipedia, with
320,311 objects. The keyword Infobox is followed by 2,057 different types, which are a useful
indication of the category of the object described in the page. Table 2 shows the top 24 Infobox
categories with frequencies.
  2 http://www.wikipedia.org/
  3 http://www.wiktionary.org/
  4 http://en.wikiquote.org
  5 http://download.wikimedia.org
                      Figure 3: Example extractions from the Taxobox
stevia|plant
squirrel|animal
springbok|animal
springbok antelope|animal
smelt|animal
smelts|animal
nitidulidae|animal
sap beetle|animal
scorpion|animal
streptococcus|eubacteria
snakes|animal


                               Table 2: Top 24 Infobox types
        47765   Album                4857   City                  3379   Biography
        20042   Film                 4839   actor                 3163   band
        10056   Single               4670   Television episode    2972   Radio Station
         9793   musical artist       4353   University            2936   Software
         9026   CVG                  3976   Military Conflict     2789   Australian Place
         7961   Company              3925   road                  2681   Swiss town
         6307   CityIT               3805   Book                  2540   Broadcast
         5070   Indian urban area    3449   Mountain              2475   Military Person



    For all these extractions, we use the title of the page to complete the name in the box; this
helps to capture variations. More variations can be found using the #REDIRECT directive,
which links a page directly to another page. Using the title tag, we build a file containing term
variations, which is then used to enlarge the previous lists of terms.


                      Figure 4: Example of redirection extraction
torres del paine|cordillera del paine
whitpaines creek|wissahickon creek
cocker spainell|cocker spaniel
paine art center|paine art center and gardens
emma paine|the x factor uk series 1
torres del paine national park|cordillera del paine




3.3    Document Expansion with Extracted Knowledge
We first produced a basic indexing using the language model implemented in Lemur (see run 09
below), with a MAP of 0.1342. We then used corpus knowledge in the form of pseudo-feedback
with the 9 top documents and the 6 best terms, which gives an important improvement (run
21). We tested the different smoothing methods implemented in Lemur. The simple linear
interpolation between document and corpus probabilities (the Jelinek-Mercer method) gives
good results with a weight of 0.8 on the document statistics against the corpus statistics.
Absolute discounting gives similar results, a little better than Bayesian Dirichlet smoothing.
With this setting we obtain a MAP of 0.1588, similar to the results with the CMU toolkit (run
08). We then expanded the documents with knowledge extracted from Wikipedia (run 15),
with a MAP of 0.1593: using Wikipedia and using pseudo-feedback give similar results. Usually
it is the query that is expanded. We chose to expand the documents instead, first because
documents in this collection are very short, but also because we wanted to generalize named
entities to a more general category in order to answer queries with generic terms; moreover,
since we are using language models, the document statistics are important. When we expand
all documents with the Taxobox information, and with all Geobox and Infobox entries of type
Mountain, the increase is strong but similar to pseudo-feedback. A sketch of this expansion
step follows.
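
    A hypothetical sketch of the document-expansion step: every extracted term (animal, plant, mountain, ...) found in a caption is generalized by appending its category word, so a query with a generic term like "animal" can match a document that only mentions "springbok". The dictionary contents are illustrative.

# term -> category, built from the Taxobox/Geobox extractions (examples
# taken from Figure 3; the dictionary here is illustrative only)
TERM_CATEGORY = {
    "springbok": "animal",
    "stevia": "plant",
    "quilotoa": "mountain",
}

def expand_document(text, term_category=TERM_CATEGORY):
    """Append the category of every known term found in the text."""
    tokens = text.lower().split()
    added = {term_category[t] for t in tokens if t in term_category}
    return text + " " + " ".join(sorted(added)) if added else text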
    We use little information from the images themselves: only the fact that they are
black-and-white or color, obtained with a tool based on the maximum saturation (in HSV
space) of the image, sketched below. This information ("black and white photo") is
automatically added to the description of all such images (run 17). With this image information
we obtain a MAP of 0.1707, mostly by strongly enhancing query 11. Finally, adding more
information with a small manually built thesaurus (run 19) is a more efficient way to increase
the MAP, reaching 0.1806 from run 17. Combining this run with the visual run 04 gives good
results in run 20 (MAP 0.2295), but not as good as our other results. We can say that adding
knowledge is effective, but not as effective as one might expect. In fact run 11, with a MAP of
0.2442, is better because it uses feedback information from the visual side. Hence it seems that
inter-media feedback is more effective than adding external knowledge.
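
    The following is a hedged sketch of the black-and-white detector; the paper only states that maximum HSV saturation is used, so the threshold value and the PIL-based implementation are assumptions.

from PIL import Image

def is_black_and_white(path, sat_threshold=0.06):
    """Detect a grayscale photo from its maximum HSV saturation (sketch).

    sat_threshold is an assumed cut-off; the paper does not give one.
    """
    img = Image.open(path).convert("HSV")
    saturation = img.getdata(band=1)          # S channel, values 0..255
    return max(saturation) / 255.0 < sat_threshold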


4      Description of Submitted Runs
In total, 27 runs were submitted, including CBIR runs, TBIR runs and mixed runs. To combine
the ranking scores from different runs, linear fusion is used, with empirically set weight
coefficients; a sketch of this fusion is given below. In the following we describe6 the conditions
of each run. After the run number, V means visual only, T textual only, VrT means
pseudo-relevance feedback on the visual rank list using the textual modality, TrV is likewise
pseudo-feedback but on text using the visual ranking, TiV means textual indexing with some
input from visual analysis, and TfV means late fusion between the two modalities. We also give
the official MAP.
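
    A minimal sketch of the late linear fusion, assuming the score normalization described for run 18: each run's relevance status values are mapped linearly onto [0.00001, 1.0], keeping the minimum above zero so that retrieved documents remain distinguishable from documents a modality did not retrieve at all (RSV = 0).

import numpy as np

def linear_fusion(score_lists, weights):
    """Weighted late fusion of several runs' score vectors (sketch)."""
    fused = None
    for scores, w in zip(score_lists, weights):
        s = np.asarray(scores, dtype=float)
        lo, hi = s.min(), s.max()
        # linear transform onto [0.00001, 1.0]
        norm = 0.00001 + (s - lo) * (1.0 - 0.00001) / (hi - lo + 1e-12)
        norm[s == 0] = 0.0            # not retrieved stays at zero
        fused = w * norm if fused is None else fused + w * norm
    return np.argsort(-fused)          # fused ranking, best first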

01V HSV 0.0684 index in the HSV SVD space with cosine distance. It is the CBIR baseline
    with only one type of feature.
02VrT HSV 0.0693 run 01 plus PRF. The feedback documents come from the LM-based
    TBIR, i.e. run 08. It is provided to demonstrate the PRF effect. The same method is
    applied to the other 11 CBIR runs, so 12 PRF-based CBIR runs are generated.
03V 12RUNS EQWEIGHT 0.1122 combine the 12 CBIR runs using equal weights, i.e.
    without tuning the weights.
04V 12RUNS WEIGHT 0.1204 combine 12 CBIR runs using the empirically tuned weights.
05VrT 12RUNS 0.1358 combine 12 PRF-based CBIR runs using the empirically tuned weights.
    Refer to run 02 for more details.
06V FMLF HSVEDGEWW 0.0547 global feature matching (HSV histogram and EDGE
    histogram) with late fusion and a weighted score for each feature. Feature matrices
    (vectors) are normalized by mean and variance. Linear fusion of the ranking lists with
    weights 0.7 for HSV and 0.3 for EDGE.
07V FMLF HSVEDGEGABORNW 0.0541 global feature matching (HSV histogram and
    EDGE histogram and Gabor features) with late fusion and equal weight for each feature.
08T LM SWITTEN 0.1377 LM-based TBIR system only, using the CMU toolkit with
    Witten-Bell smoothing.
09T LMKL S0M2D0.7 0.1342 LM-based TBIR system only, with Lemur and no stemming.
    An interpolated smoothing strategy is used; the smoothing method is absolute
    discounting with the default parameter 0.7 for the discount delta.
    6 to reduce the name size we remove the IPAL prefix of the run name
10T LSI 0.1322 only LSI based TBIR system.
11TrV LM 12RUNSVISUAL 0.2442 LM-based TBIR, i.e., run 08, plus the PRF. The feed-
    back documents are from the combined CBIR without PRF, i.e. run 04.
12TiV LM 12RUNSVISUAL FB 0.2516 LM-based TBIR, i.e., run 08T, plus the PRF. The
    feedback documents are from the combined CBIR with PRF, i.e. run 05.
13TfV LM 12RUNSVISUAL 0.2226 combine runs 08 and 04.7
14TfV LM FB 12RUNSVISUAL 0.2833 combine runs 11 and 04.
15T WTm S0M2D0.8C6T6 0.1593 LM-based TBIR with document expansion using
    information automatically extracted from Wikipedia: extracted plant, animal and
    geographic terms, mountains, and also the redirection information. New terms are added
    to the documents to enlarge the matching. The LM discount is set to 0.8, and the
    pseudo-relevance feedback uses the top 6 documents and keeps 6 terms.
16TfV WTm 12RUNSVISUAL 0.2180 combine runs 15 and 04.
17TiV WTm S0M2D0.8C6T6 0.1707 this run is an extension of run 15 using a
    black-and-white image detector based on the HSV values of the image. This information
    is injected into all corresponding texts; the matching is still done in the text domain.
    This boosts only one query, but improves the global result.
18TiVfV T1.0V1.0 0.2165 run 17 is fused with the purely visual run 04. This late fusion is
    linear without weights, but with a linear transformation of all RSV outputs to the
    interval [1.0; 0.00001]. We do not choose zero as the minimum, in order to differentiate
    from documents that are not retrieved in either of the two modalities: we consider that a
    document not retrieved has a Relevance Status Value of zero.
19TiV WTmM S0M2D0.8C6T6 0.1806 this run is an extension of run 17, with the
    addition of a very small thesaurus manually extracted from Wikipedia; it contains terms
    that are not from info boxes but are relevant to this collection.
20TiVfV T1.0V1.0 0.2295 the same simple fusion as in run 18 is performed, but with the
    textual run 19.
21T LMKL S0M2D0.7C9T6 0.1588 this run is a variation of run 09. Relevance feedback is
    used on text only, with the 9 top documents used as feedback and the 6 best terms added
    to the query. The improvement is important.
22V 13RUNSVISUAL 0.1195 combine the 12 CBIR runs and one run with the edge
    histogram, using equal weights, i.e. without tuning the weights.
23TfV LM FB 13RUNSVISUAL 0.2732 combine runs 11 and 22.
24 FMLF HSVEDGENW 0.0526 same as run 06 but using equal weights for the fusion.
25V FMLF HSVEDGEGABORWW 0.0554 same as run 07, but the linear fusion of the
    three ranking lists is performed with weights 0.6 for HSV, 0.3 for EDGE and 0.1 for
    Gabor.
26V COOC PATCHES HSVEDGE 1000CL 0.0369 index using the Bag-of-visterm model
    with 1,000 clusters.
27V COOC 16PATCHES HSVEDGE 1000CL KEYWORDFILTER 0.0311 same as run
    26 but filtering the visual clusters using the top 100 most frequent keywords from the
    text.
  7 Due to an error this run was not officially evaluated.
5     Results for Photographic Task
The best run is 14, with a MAP of 0.2833. It is ranked 6th among the 476 runs and 2nd among
the automatic runs. This run is the mixed-modality retrieval plus CBIR-based pseudo-relevance
feedback (see the descriptions of runs 11 and 14 in section 4). The best CBIR run reaches a
MAP of 0.1204 (run 04, without PRF), which ranks 4th among the visual-only retrieval runs.
Compared with the 1st run (INAOE/INAOE-TIA-INAOE-VISUAL-EN-AN E, MAP 0.1925)
and the 2nd run (XRCE/XRCE-IMG COMBFK, MAP 0.1890), there is still room to improve
our CBIR system. Now consider how PRF affects performance. First, for the CBIR system, the
feedback from the TBIR seems to have little effect: with PRF, the MAP of the HSV-based
CBIR only increases from 0.0684 (run 01) to 0.0693 (run 02), and when combining the 12
CBIR runs the MAP increases from 0.1204 (run 04) to 0.1358 (run 05). This is probably due to
the diversity of the visual content and the divergence of the visual features. However, PRF
yields a significant improvement for the TBIR: the MAP of the LM-based TBIR is 0.1377 (run
08), and when CBIR-based pseudo-relevance feedback (from run 04) is applied, the MAP
reaches 0.2442 (run 11).


6     Medical Task: Bayesian Network Conceptual Indexing
We propose a Bayesian-network-based model, inspired by [14], for capturing the concepts and
the semantic relationships between document concepts and query concepts. The
conceptualisation is based on external knowledge, namely UMLS (Unified Medical Language
System). The method of concept extraction is the same as in our previous work at CLEF 2006
[6]: we use MetaMap [1] to map concepts in the English reports, and our own tool for the
French and German reports after tagging with TreeTagger. The extracted concepts are then
used to build the network of our Bayesian-network-based retrieval framework. A Bayesian
network, also called a belief network or causal network, is a Directed Acyclic Graph (DAG)
[11]. Our model includes document nodes, query nodes and concept nodes, with directed links
between nodes.
    There are two types of links: links which connect concept nodes to the documents or queries
in which these concepts appear, and links between document concept nodes and query concept
nodes when semantic relations between them are found in UMLS. These semantic relations are
derived from the UMLS relationship database files (MRREL.RRF for direct relations and
MRHIER.RRF for the hierarchical context of each instance of a concept in the different
knowledge sources of UMLS). We orient the links along the path from documents to queries, so
that we can compute probabilistic inference from documents to queries in the network; nodes
pointed to by a link are considered child nodes.
    The retrieval process is a probabilistic inference from the document node to the query node,
carried out in three steps:
    • A document is observed, which initializes the prior probabilities of its document concept
      nodes; in this model we chose the normalized tf.idf weight as the prior probability of a
      document concept node:
      $$P(c \mid D) = \frac{w_c}{\sqrt{\sum_{c' \in D} w_{c'}^2}}, \qquad \text{with} \quad w_c = tf \cdot idf$$

    • These prior probabilities of the document concept nodes are then propagated to the query
      concept nodes via the semantic links between them, using the formula:

      $$P(c \mid pa(c), D) = \max_i \big( \alpha \cdot P(pa_i(c) \mid D) \big)$$

      in which pa(c) is the set of parent concepts of c, and α is a parameter that estimates the
      uncertainty of the relatedness, or "aboutness", between c and its parent according to the
      type of semantic link. We suppose α may also depend on the quality of the knowledge
      source that supplies the relationship information; the bigger α is, the more related the
      two concepts are.
      We either predefine α empirically in the range [0, 1] according to the corresponding
      semantic relation, or estimate it with the Leacock-Chodorow semantic similarity formula
      [7] in the case of indirect hierarchical relations:

      $$sim(c_1, c_2) = \log \frac{2L}{l(c_1, c_2)}$$

      in which L is the maximum depth of the taxonomy and l(c_1, c_2) is the minimum path
      length from c_1 to c_2.
    • The relevance status value (RSV) of a query Q for a document D is the conditional
      probability of the query node Q given that document node D is observed. We use the
      inference mechanism of [11], in which the probability of a node given its parents is
      calculated by a link matrix. We chose the weighted-sum link matrix here, in order to
      take into account both the inferred probabilities and the weights of the query concept
      nodes:

      $$RSV(Q, D) = P(Q \mid pa(Q), D) = \frac{\sum_i p_i w_i}{\sum_i w_i}$$

      where p_i is the probability of a query concept node (in the set pa(Q)) inferred from the
      document concept nodes of D, and w_i is the weight of that query concept node with
      respect to query Q, measured by the normalized tf.idf just as for the document concept
      nodes.
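
    A hedged Python sketch of these three steps is given below; the data structures (dictionaries of tf.idf weights, a relation map derived from UMLS) and the single α value are illustrative assumptions, not the exact implementation.

import math

def rsv(doc_weights, query_weights, relations, alpha=0.2):
    """Three-step Bayesian-network inference (sketch).

    doc_weights / query_weights: {concept: tf*idf}; relations maps a
    query concept to the document concepts UMLS relates it to.
    """
    # Step 1: normalized tf.idf priors on document concept nodes
    norm = math.sqrt(sum(w * w for w in doc_weights.values())) or 1.0
    prior = {c: w / norm for c, w in doc_weights.items()}
    # Step 2: propagate priors to query concepts over semantic links
    p = {}
    for qc in query_weights:
        candidates = []
        if qc in prior:                    # concept occurs in the document
            candidates.append(prior[qc])
        for dc in relations.get(qc, []):   # semantically related concepts
            if dc in prior:
                candidates.append(alpha * prior[dc])
        p[qc] = max(candidates, default=0.0)
    # Step 3: weighted-sum link matrix at the query node
    den = sum(query_weights.values())
    return sum(p[qc] * w for qc, w in query_weights.items()) / den if den else 0.0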

    The relevance status values in the ranking list can then be re-weighted by the semantic
groups of the concepts: the RSV is multiplied by the number of concepts in the matched
concept set that belong to the three important semantic groups Anatomy, Pathology and
Modality. The purpose is to emphasize the importance of these three semantic groups, given
the structure of the queries.

6.1    Results of Medical Runs
For the CLEF image medical 2007 collection [10], we analyse the new image sets(MyPacs and
Endoscopic) the same way as the collection last year.
    In the submitted runs, based on predefined set of semantic relations in UMLS, we use isa-
indirect relations in four runs and different types of relations in the two other runs: isa, parent-child
other than isa(PAR-CHD), broader-narrower(BR-RN), similar or alike(RL), related or possibly
synonym(RQ). The value of α in all of these submitted runs is predefined in the range (0, 1),
corresponding: 0.1, 0.2, 0.3, 0.4. In addition to these runs, we experiment another run which
replaces predefined α by semantic similarity measurement of Leacock-Chorodo for isa-indirect
relation. This method yeilds similar result. Results of our submitted runs are reported in table 3.


7     Conclusion
The Bayesian model proposed here, subsume the VSM model with cosine similarity, and in addition
take into account the semantic relationship between documents concepts and query concepts in
an unified framework. Experimentation shows that this model can enhance the VSM by adding
the semantic relatedness between concepts. Improvements on relationship weighting issue as well
as performance of model are our further study.
                            Table 3: MAP results of CLEFMed2007
  Run                            Isa   PAR-CHD       BR-RN         RL       RQ      MAP      R-prec
  IPAL-TXT-BAY-ISA0.1            0.1   0             0             0        0       0.3057   0.332
  IPAL-TXT-BAY-ALLREL2           0.2   0.01          0.01          0        0       0.3042   0.333
  IPAL-TXT-BAY-ALLREL1           0.2   0.01          0.01          0.001    0.001   0.304    0.3338
  IPAL-TXT-BAY-ISA0.2            0.2   0             0             0        0       0.3039   0.3255
  IPAL-TXT-BAY-ISA0.3            0.3   0             0             0        0       0.2996   0.3212
  IPAL-TXT-BAY-ISA0.4            0.4   0             0             0        0       0.2935   0.3177



    For the photographic task, we have improved our CBIR system through the use of multiple
visual features and different indexing techniques. Combined with cross-modality
pseudo-relevance feedback and text-based search, our system obtains good performance. From
our results, we find that 1) CBIR is an important component for searching a multi-modality
image database, especially when the text descriptions have weak discrimination; and 2)
cross-modality pseudo-relevance feedback enhances performance, but has a much larger effect
on the text-based search than on the visual-based search. The use of external knowledge such
as Wikipedia gives an extra bonus to the results, but is less effective than expected: the large
size of this resource also tends to produce confusion because of the lack of disambiguation, a
problem that occurs mostly with short terms.


References
 [1] Alan R. Aronson. MetaMap: Mapping text to the UMLS Metathesaurus.
     http://mmtx.nlm.nih.gov/docs.shtml, July 2006.
 [2] Ricardo A. Baeza-Yates and Berthier A. Ribeiro-Neto. Modern Information Retrieval. ACM
     Press / Addison-Wesley, 1999.
 [3] Jerome R. Bellegarda. Exploiting latent semantic information in statistical language modeling.
     Proceedings of the IEEE, 88(8):1279–1296, August 2000.
 [4] Sheng Gao, Joo-Hwee Lim, and Qibin Sun. An integrated statistical model for multimedia
     evidence combination. In ACM Multimedia’07, 2007.
 [5] Sheng Gao, Wen Wu, Chin-Hui Lee, and Tat-Seng Chua. A maximal figure-of-merit (mfom)-
     learning approach to robust classifier design for text categorization. ACM Trans. Inf. Syst.,
     24(2):190–218, 2006.
 [6] Caroline Lacoste, Jean-Pierre Chevallet, Joo-Hwee Lim, Xiong Wei, Daniel Raccoceanu, Diem
     Le Thi Hoang, Roxana Teodorescu, and Nicolas Vuillenemot. Ipal knowledge-based medical
     image retrieval in imageclefmed 2006. In Working Notes for the CLEF 2006 Workshop, 20-22
     September , Medical Image Track, Alicante, Spain, 2006.

 [7] C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense
     identification. In C. Fellbaum, editor, WordNet: An Electronic Lexical Database, chapter 11,
     pages 265–284. MIT Press, 1998.
 [8] David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput.
     Vision, 60(2):91–110, 2004.

 [9] Nicolas Maillot, Jean-Pierre Chevallet, Vlad Valea, and Joo Hwee Lim. Ipal inter-media
     pseudo-relevance feedback approach to imageclef 2006 photo retrieval. In Working Notes for
     the CLEF 2006 Workshop, 20-22 September , Alicante, Spain, 2006.
[10] Henning Müller, Thomas Deselaers, Eugene Kim, Jayashree Kalpathy-Cramer, Thomas M.
     Deserno, Paul Clough, and William Hersh. Overview of the ImageCLEFmed 2007 medical
     retrieval and annotation tasks. In Working Notes of the 2007 CLEF Workshop, Budapest,
     Hungary, September 2007.
[11] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference.
     Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988.
[12] Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. In
     SIGIR ’98: Proceedings of the 21st annual international ACM SIGIR conference on Research
     and development in information retrieval, pages 275–281, New York, NY, USA, 1998. ACM
     Press.
[13] P. R. Clarkson and R. Rosenfeld. Statistical language modeling using the CMU-Cambridge
     toolkit. In Eurospeech 1997, 1997.
[14] Howard Turtle and W. Bruce Croft. Evaluation of an inference network-based retrieval model.
     ACM Trans. Inf. Syst., 9(3):187–222, 1991.