=Paper=
{{Paper
|id=Vol-1173/CLEF2007wn-ImageCLEF-EscalanteEt2007
|storemode=property
|title=TIA-INAOE's Participation at ImageCLEF 2007
|pdfUrl=https://ceur-ws.org/Vol-1173/CLEF2007wn-ImageCLEF-EscalanteEt2007.pdf
|volume=Vol-1173
|dblpUrl=https://dblp.org/rec/conf/clef/EscalanteHLMMMSP07
}}
==TIA-INAOE's Participation at ImageCLEF 2007==
H. Jair Escalante, Carlos A. Hernández, Aurelio López, Heidy M. Marín, Manuel Montes, Eduardo Morales, Luis E. Sucar, Luis Villaseñor
Coordinación de Ciencias Computacionales, Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro No. 1, 72840, Puebla, México
{hugojair,carloshg,allopez,hmarinc,mmontesg,emorales,esucar,villasen}@ccc.inaoep.mx

Abstract

This paper describes the participation of INAOE's research group on machine learning, image processing and information retrieval (TIA) in the photographic retrieval task at ImageCLEF2007. Many experiments were performed, comprising most of the query and target languages proposed for this year's competition. A Web-based query expansion technique was proposed for introducing context terms into the query topics. An intermedia blind relevance feedback technique was adopted in some runs for query refinement. Furthermore, experiments were performed with a novel technique for query/document expansion based on automatic image annotation methods. Initial experimental results give evidence that this idea could help to improve the effectiveness of retrieval methods, although several issues remain to be addressed.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Languages—Query Languages

General Terms

Measurement, Performance, Experimentation

Keywords

Image retrieval, automatic image annotation, Web-based query expansion, intermedia feedback

1 Introduction

Technological innovations have provided us with more information than we are able to analyze manually. Many huge repositories of textual, visual and audio information are currently available. In addition, there are collections of mixed data, for example images accompanied by textual descriptions. This last type of collection is referred to as an annotated collection (one data type, text, is the annotation of another data type, images). Since documents in annotated collections are described in two modalities, retrieving documents from these collections is more challenging than traditional information retrieval, due to the lack of correspondence between modalities (the semantic gap) [11]. On the other hand, considering both modalities could help to improve the accuracy of information retrieval systems. This paper is concerned with the retrieval of images from (manually) weakly annotated collections, that is, image databases in which each image is accompanied by a very small textual description of its visual or semantic content. Current solutions for image retrieval from annotated collections are based on a single modality (text or image), and the retrieval accuracy of such systems is limited, mainly because of the omitted modality and the querying complexity. In consequence, there is an increasing interest in methods that can take advantage of all of the available information, that is, text plus image.
Along this line of research, an effort is being carried out by the organizers of ImageCLEF (http://ir.shef.ac.uk/imageclef/2007/), the cross-language image retrieval track of the CLEF forum (http://clef-campaign.org/), whose goal is to investigate the effectiveness of combining text and image for retrieval, to collect and provide resources for benchmarking image retrieval systems, and to promote the exchange of ideas that will lead to improvements in the performance of retrieval systems. The organizers provide participants with image collections, query topics and relevance assessments of the retrieval systems' runs [9]. This sort of forum is very useful because it motivates research advances on several problems related to information retrieval; furthermore, it contributes to the creation of new multidisciplinary research groups and collaborations between participants.

This paper describes the participation of INAOE's research group on machine learning, image processing and information retrieval (TIA) in the photographic retrieval competition at ImageCLEF2007. This is the first time TIA participates in ImageCLEF. A total of 95 runs were submitted, comprising all of the target languages and the following query languages: English, Spanish, German, Italian, French, Swedish, Japanese, Portuguese, Russian and Visual. Experiments were carried out with different techniques, such as intermedia feedback and fusion of independent retrieval models. Other experiments were carried out with an approach based on Web query expansion and with a new method based on automatic image annotation. We propose a Web-based technique for expanding ImageCLEF2007 queries, with which promising results were obtained. Furthermore, we propose a new research direction not yet explored for image retrieval from annotated collections. The approach consists of using automatic image annotation methods for obtaining text from the visual content of the images. This text is then combined with the original annotations of the images (and/or with the queries), and standard strategies are then adopted for retrieving documents. Experimental results show that runs based on text and image improve on those based on text only. The best results were obtained with a combination of intermedia feedback and our Web-based query expansion technique. Relatively good results were obtained with the annotation-based expansion; however, several issues should be addressed in order to obtain better results with this technique.

The rest of the document is organized as follows. In the next section we briefly introduce the ad-hoc photographic retrieval task; for further information we refer the reader to the overview paper by Grubinger et al. [9]. Next, in Section 3 the text-only methods are described. In Section 4 two techniques used for combining textual and image information are presented. Then, in Section 5 the proposed annotation-based expansion technique is introduced. In Section 6 the results of our runs are presented and analyzed. Finally, in Section 7, we highlight some conclusions and discuss future work directions.

2 Ad-hoc photographic retrieval

ImageCLEF2007 is a track running as part of the CLEF campaign. It comprises four tasks related to image retrieval, namely: ad-hoc photographic retrieval, object retrieval, medical image retrieval and automatic annotation of medical images. This paper presents developments and contributions for the first task, that of retrieving images from a collection of annotated photographs.
The goal of this task is the following: given a multilingual statement describing a user information need, find as many relevant images as possible in the given document collection [4, 9]. The organizers provide participants with a collection of annotated images, together with topics describing information needs. Participants use these resources with their retrieval systems and submit to the organizers the identifiers of the relevant documents (according to the retrieval system used) for each topic. The organizers evaluate the result set of each submission and rank submissions according to a standard evaluation measure.

The document collection used for ImageCLEF2007 is the recently created IAPR TC-12 Benchmark [10]. Each query topic consists of a fragment of text describing a single information need, together with three sample images visually similar to the desired relevant images [9]. Participants use the topic content to create queries for their retrieval systems. The runs are then evaluated by the organizers using standard evaluation measures such as MAP [9].

3 Textual methods for image retrieval

The predominant approach to image retrieval from annotated collections consists of using text-only retrieval methods, mainly because in some collections the annotations describe the visual and semantic content of the images effectively, for example the collection used for ImageCLEF2006. However, in realistic collections images are described by only a few words indicating the semantic content of each image, for example the collection used for ImageCLEF2007. In this scenario, information extracted from the images could be helpful for improving retrieval performance. For this reason we performed experiments with methods based on both text and images; text-only runs were also submitted in order to measure the gain obtained by using visual information. Independently of the type of run, all of our submissions used a textual retrieval system (for the mixed runs, text was extracted from the images and used with the textual retrieval system), which is described in Section 3.1. The proposed technique for expanding textual queries is presented in Section 3.2.

3.1 Baseline retrieval system

Since text was used in all of TIA's runs, a textual retrieval system was implemented. For this purpose the TMG Matlab toolbox was used, kindly provided by Zeimpekis et al. [17]. This toolbox includes standard methods for indexing and retrieving middle-sized text collections. We decided to use it because working with Matlab allows easy processing of images (through predefined functions), which was necessary for developing the annotation-based expansion technique. After removing meta-data and useless information, the text of the captions was indexed for the four target languages (English, Spanish, German and Random), resulting in four indexed collections. For indexing we used tf-idf weighting; English stop words were removed and standard stemming was applied [1, 17]. Queries for the baseline runs were created using the topic text as provided by the organizers, after removing meta-data. For the multilingual experiments, queries were translated using the online Systran translation software (http://www.systranbox.com/). For retrieval we used the cosine similarity function [1].
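For illustration, the baseline pipeline (tf-idf indexing, stop-word removal and cosine-similarity ranking) can be sketched in a few lines of Python. The original system was built with the TMG Matlab toolbox; the scikit-learn calls and the caption/query strings below are stand-ins for illustration, not the authors' code or the actual IAPR TC-12 data.

```python
# Minimal sketch of a tf-idf / cosine-similarity retrieval baseline.
# Stand-in for the TMG Matlab toolbox used in the paper; the captions and
# query below are placeholders, not data from the IAPR TC-12 collection.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

captions = [
    "a group of tourists standing in front of a church",
    "a snow covered mountain behind a blue lake",
    "people playing volleyball on a sandy beach",
]
query = "church with tourists"

# Index the captions with tf-idf weighting and English stop-word removal
# (stemming could be added through a custom analyzer).
vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(captions)

# Rank documents by cosine similarity to the query vector.
query_vec = vectorizer.transform([query])
scores = cosine_similarity(query_vec, doc_matrix).ravel()
for rank, doc_id in enumerate(scores.argsort()[::-1], start=1):
    print(f"{rank}. doc {doc_id} (score {scores[doc_id]:.3f}): {captions[doc_id]}")
```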
3.2 Web-based query expansion

The Web is the largest repository of information that has ever existed; millions of documents available on the Web can be used to extract semantic knowledge. In this work we propose a Web-based query expansion technique to incorporate context terms into the original queries, with the hope that the expanded queries are able to reach (through a retrieval model) relevant documents containing terms other than those in the original queries. The proposed approach is a very intuitive method that uses the Google search engine. For each topic, we take the textual description and submit a search to Google; the top-k snippets returned by the search engine are considered for expanding the query. We tried two approaches that we call naive and repetition. The naive approach consists of taking the snippets as returned by Google with no preprocessing. The repetition approach, on the other hand, consists of retaining the terms that co-occur most often among the snippets. For the experiments reported here, the top-20 snippets were considered. Queries expanded with the naive method resulted in larger queries, while those expanded with the repetition method included only a few extra terms.
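A minimal sketch of the two expansion strategies is given below, assuming the top-k snippets returned by the search engine for a topic are already available as strings; the actual querying of Google and the exact term-selection criterion used by the authors are not detailed in the paper, so counting in how many snippets a term appears is only one plausible reading of the repetition strategy.

```python
# Sketch of the two Web-based expansion strategies, assuming the top-k
# snippets returned by the search engine are already available as strings.
from collections import Counter
import re

STOP = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "with", "at"}

def naive_expansion(query, snippets):
    """Naive strategy: append the snippets to the query with no preprocessing."""
    return query + " " + " ".join(snippets)

def repetition_expansion(query, snippets, n_terms=5):
    """Repetition strategy: keep only the terms that occur in the largest
    number of snippets (a proxy for co-occurrence among them)."""
    counts = Counter()
    for snippet in snippets:
        terms = re.findall(r"[a-z]+", snippet.lower())
        counts.update(t for t in set(terms) if t not in STOP and len(t) > 2)
    extra = [term for term, _ in counts.most_common(n_terms)]
    return query + " " + " ".join(extra)

# Hypothetical topic and snippets, for illustration only.
topic = "church with more than two towers"
snippets = [
    "The cathedral is a church with two towers and a large dome ...",
    "Famous churches: towers, domes and bell towers of Europe ...",
]
print(repetition_expansion(topic, snippets))
```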
4 Mixed methods for image retrieval

In order to consider both textual and visual information in the retrieval process, two strategies were adopted: intermedia feedback and late fusion of independent retrieval systems. We decided to use these techniques because of their simplicity and because both methods have obtained relatively good results on the retrieval task [2, 4].

4.1 Intermedia pseudo-relevance feedback

Relevance feedback is a way of allowing user interaction in the retrieval process. It was first proposed for retrieval from text collections and then used for the content-based image retrieval (CBIR) task [15]. Although user interaction is allowed in the retrieval process, it is always preferable to develop fully automatic systems. Pseudo-relevance feedback (PRF) is a variant that does not need user interaction, as it considers as relevant the set of top-k documents returned by the retrieval system. Queries are refined by considering this set of documents as relevant to the user's information need. This technique has been widely used in several text-based image retrieval systems within ImageCLEF [5, 4, 9].

A novel technique based on PRF has been proposed for image retrieval from annotated collections [3, 2, 14, 12]. This technique, called intermedia feedback (IMFB), consists of using a CBIR system with a query image for retrieving documents (note that the process can also start from text, obtaining query images for a CBIR system). The top-k documents returned are assumed to be relevant, and the captions of these documents are combined to create a textual query. The textual query is then used with a purely textual retrieval system, and the documents returned by that system are returned to the user. This technique was first used by the NTU group at ImageCLEF2005, and since then several groups have adopted it [2, 12, 14]. We decided to perform experiments with this technique in order to improve the retrieval effectiveness of the proposed query expansion method by taking into account information extracted from the images. Combined runs of query expansion and IMFB consist of applying the query expansion technique to the textual topics; the expanded queries are then combined with the captions of the top-k relevant documents according to a CBIR system, and finally used with our text-only retrieval system. FIRE was used as the CBIR system, using the baseline run provided by the organizers [9].

4.2 Late fusion of independent systems

For some runs we adopted a late fusion strategy for merging the output of independent visual and textual systems. This merging method consists of running two independent retrieval systems, each using a single (different) modality, and then combining the relevant documents returned by both systems. For this work we propose the following fusion strategy. Assume that TR is the list of relevant documents (ranked in descending order of relevance) according to a textual retrieval system applied to the images' captions; similarly, VR is the list of ranked relevant documents according to a CBIR system that uses only the images. We combine and re-rank the documents returned by both retrieval systems, generating a new list of relevant documents LF_R = {TR ∪ VR}, where each document d_i ∈ LF_R is ranked according to the score given by Equation (1):

score(d_i) = (α × R_TR(d_i) + (1 − α) × R_VR(d_i)) / (1_TR(d_i) + 1_VR(d_i))    (1)

where R_TR(d_i) and R_VR(d_i) are the positions of document d_i in the ranked lists of the textual and visual retrieval systems, respectively, and 1_TR(d_i) and 1_VR(d_i) are indicator functions that take the value 1 if document d_i is in the list of relevant documents of the textual and visual retrieval systems, respectively, and zero otherwise. The denominator accounts for documents appearing in both lists of relevant documents (TR and VR). Documents are sorted in ascending order of their score. Intuitively, with this score documents appearing in both sets (visual and textual) will appear at the top of the ranking, according to their positions in the independent lists of relevant documents. We tried several values of α, and the best results were obtained with α = 0.9.
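Equation (1) translates directly into code. The sketch below assumes TR and VR are given as ranked lists of document identifiers, uses α = 0.9 as in the experiments, and treats the rank of a document absent from one list as zero so that the corresponding term simply vanishes; the paper does not state how absent documents are handled, so that choice is an assumption.

```python
# Sketch of the late-fusion re-ranking of Equation (1). TR and VR are the
# ranked lists of documents returned by the textual and visual systems.
def late_fusion(TR, VR, alpha=0.9):
    # Positions (1-based) of each document in the textual and visual rankings.
    pos_t = {doc: rank for rank, doc in enumerate(TR, start=1)}
    pos_v = {doc: rank for rank, doc in enumerate(VR, start=1)}
    fused = {}
    for doc in set(TR) | set(VR):
        r_t = pos_t.get(doc, 0)           # R_TR(d); 0 when absent (assumption)
        r_v = pos_v.get(doc, 0)           # R_VR(d)
        in_t = 1 if doc in pos_t else 0   # indicator 1_TR(d)
        in_v = 1 if doc in pos_v else 0   # indicator 1_VR(d)
        fused[doc] = (alpha * r_t + (1 - alpha) * r_v) / (in_t + in_v)
    # Lower scores rank higher; the denominator rewards documents that appear
    # in both lists.
    return sorted(fused, key=fused.get)

# Hypothetical ranked lists, for illustration only.
print(late_fusion(TR=["d1", "d3", "d7"], VR=["d1", "d9"]))
```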
5 Annotation-based document and query expansion

The task of automatic image annotation (AIA) consists of assigning textual descriptors (words, labels) to images, starting from visual attributes extracted from them. There are two ways of facing this problem: at image level and at region level. In the first case, labels are assigned to the entire image as a unit, without specifying which words are related to which objects within the image. In the second approach, which can be conceived as an object recognition task, labels are assigned at region level, providing a one-to-one correspondence between words and regions (note that with region-level annotation we also obtain an annotation at image level, namely the set of labels assigned to the regions within the image). The latter approach can provide more semantic information for the retrieval task, although it is more challenging than the former. In this work we decided to use region-level AIA methods for obtaining text from images and then to use these textual labels for expanding topics and image annotations. We believe that taking both modalities to a common representation and then using standard retrieval strategies can be helpful for improving single-modality approaches. A graphical description of the proposed approach is shown in Figure 1. As can be seen, the process involves several steps that can be summarized in the following tasks: segmentation, annotation, and expansion.

Figure 1: Graphical schema of the proposed approach for AIA-based expansion.

Figure 2: Sample images from the IAPR TC-12 2007 collection, segmented with the normalized cuts algorithm. Manual annotations are shown for each region.

5.1 Segmentation

The first step towards obtaining words from regions within images is obtaining regions from images. This task is known as segmentation and consists of discovering partitions within a given image, with the restriction that each partition contains a single object. Many segmentation methods have been proposed, although this is still an open problem in vision. For segmenting the IAPR TC-12 Benchmark collection we decided to use a state-of-the-art algorithm named normalized cuts [16], which is the algorithm used by most region-level annotation approaches [6, 11]. Figure 2 shows sample images segmented with normalized cuts. As can be seen, the algorithm works well for some images, isolating single objects; however, for other similar images segmentation is not as good, partitioning single objects into several regions. Using a recently developed segmentation-annotation interface [13], the full IAPR TC-12 Benchmark collection was segmented with the normalized cuts algorithm. Visual attributes were then extracted from each region. The attributes include color, texture and shape information of the region, for a total of 30 attributes, so each region is described by its vector of attributes. In order to facilitate reading, hereafter we refer to the attribute vector describing a region simply by the term region.
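As an illustration only, a per-region descriptor of this kind might be computed as in the following sketch; the actual 30 attributes used by the authors are not listed in the paper, so the color, texture and shape statistics below are simple stand-ins.

```python
# Sketch of per-region attribute extraction. `image` is an HxWx3 array and
# `mask` a boolean HxW array marking one region produced by the segmentation.
# The statistics below are stand-ins for the paper's 30 attributes.
import numpy as np

def region_descriptor(image, mask):
    pixels = image[mask].astype(float)                    # N x 3 region pixels
    # Color: mean and standard deviation per channel (6 values).
    color = np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])
    # Texture: gradient-magnitude statistics inside the region (2 values).
    gray = image.mean(axis=2)
    gy, gx = np.gradient(gray)
    grad = np.hypot(gx, gy)[mask]
    texture = np.array([grad.mean(), grad.std()])
    # Shape: relative area and bounding-box aspect ratio (2 values).
    ys, xs = np.nonzero(mask)
    shape = np.array([mask.sum() / mask.size,
                      (np.ptp(xs) + 1) / (np.ptp(ys) + 1)])
    return np.concatenate([color, texture, shape])

# Toy usage with a random image and a rectangular region mask.
img = np.random.rand(64, 64, 3)
msk = np.zeros((64, 64), dtype=bool)
msk[10:40, 20:50] = True
print(region_descriptor(img, msk).shape)   # (10,) attributes in this sketch
```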
Table 1: Vocabulary of labels considered for the annotation process. The number preceding each label is the identifier used for that label in the histogram of Figure 3.
1 Sky | 2 Person | 3 Building | 4 Trees | 5 Clouds | 6 Grass | 7 Water | 8 Mountain | 9 Sand | 10 Other | 11 Furniture | 12 Road
13 Animal | 14 Snow | 15 Rock | 16 Sun | 17 Vehicle | 18 Boat | 19 Church | 20 Tower | 21 Plate | 22 Flag | 23 Statue | 24 Prize

Figure 3: Histogram of regions in the training set annotated with the words defined in the vocabulary.

5.2 Annotation

AIA methods start from visual attributes extracted from the image or its regions and assign labels based on training examples; each training example is a pair composed of a region and its corresponding label. In order to create a training set of region-label pairs, an annotation interface was used [13]. This interface allows the manual annotation of segmented images at region level, as well as options for re-segmentation and for joining adjacent regions containing the same object. The IAPR TC-12 Benchmark consists of 20,000 images; taking the five largest regions of each image gives 100,000 regions to annotate. In consequence we would need a large training set of annotated regions; however, time constraints allowed us to create only a small one. For creating the training set we randomly selected around 2% of the total number of regions and annotated them manually using the developed interface.

The set of labels that can be assigned to regions (that is, the annotation vocabulary) was defined subjectively by the authors, by looking at the ImageCLEF2007 textual topic descriptions. The vocabulary of allowed words is shown in Table 1, and the number of regions in our training set annotated with each label is shown in Figure 3. Some labels are meant to represent several concepts; for example, the label water was used to label regions of rivers, ocean, sea and streams. Other labels represent specific objects, such as swimming-pool and tower. From Figure 3 we can see that several labels have many training examples (for example, Sky and Person), whereas several others have only a few. This fact, together with poor segmentation, made the annotation process difficult. Sample images with their corresponding annotations are shown in Figure 2.

The training set of region-label pairs is used with a knn classifier for annotating the remaining un-annotated regions of the IAPR TC-12 Benchmark collection (for document expansion) or the topic images (for query expansion). Note that the training set is very small for achieving good results with the knn algorithm, even though knn has obtained good results on this task, outperforming other state-of-the-art annotation methods [7, 8]. (The training set was created for applying a recently proposed semi-supervised learning algorithm based on unlabeled data [13]; however, time constraints did not allow us to use that algorithm.) In order to partially overcome the issues of poor segmentation and an imbalanced, small training set, we decided to apply a postprocessing step to knn to improve annotation accuracy.

A method called MRFI for improving AIA accuracy has recently been proposed [8]. MRFI considers a set of candidate labels for each region and selects a unique label for each region based on semantic information between labels and the confidence of the AIA method in each label. The improvement process is graphically described in Figure 4, left. Candidate labels and confidence values are obtained from knn, the annotation method. Semantic association between labels is obtained by measuring co-occurrences of labels at document level in an external corpus of documents.

Figure 4: Left: graphical description of the improvement process of MRFI. Right: graphical representation of the MRF; (red) solid arcs consider semantic cohesion between labels, while (blue) dashed arcs consider the relevance weight of each label according to knn.

For each region, knn ranks labels in decreasing order of their relevance as the correct annotation for that region, and we keep the set of the top-k most relevant labels. In this way we have k candidate labels for each region in each image, each accompanied by a relevance weight according to the knn ranking. Taking these relevance weights into account, MRFI assigns to each region within a given image the label that is most likely given the labels assigned to neighboring regions in the same image. Intuitively, MRFI selects the best configuration of region-label assignments for each image, given the semantic cohesion between labels assigned to spatially connected regions and the relevance weight of each region-label assignment. To do this, the set of region-label assignments for a given image is modeled with a Markov random field (MRF); see Figure 4, right. Each possible assignment of labels to regions in the image is called a configuration of the MRF. The goal of MRFI is to select the (pseudo-)optimal configuration by considering the relevance of each assignment and the semantic association between labels assigned to neighboring regions. The optimal configuration for each image is the one that minimizes an energy function defined by potentials, where each potential is a function that considers one aspect of the problem at hand. A first potential function considers the relevance weight of each region-label assignment, while another potential attempts to keep semantic cohesion between labels assigned to connected regions. The minimization of the energy function is carried out by simulated annealing, and the best configuration is then taken as the correct annotation of the image. For further details about this method we refer the reader to [8].
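The refinement step can be sketched as a small simulated-annealing loop over an energy with a relevance term and a cohesion term, as described above. The candidate lists, relevance weights, co-occurrence values and neighbour pairs in the example are illustrative stand-ins, not outputs of the actual knn annotator or of the external corpus used in [8], and the exact potential functions of MRFI may differ from this simplified form.

```python
# Simplified sketch of MRFI-style label refinement. `candidates[r]` are the
# top-k labels proposed by knn for region r, `weights[r][l]` the relevance of
# label l for r, `cooc[(l1, l2)]` a semantic-cohesion value between labels,
# and `neighbours` the pairs of spatially connected regions in the image.
import math
import random

def mrfi(candidates, weights, neighbours, cooc, iters=5000, T0=1.0):
    # Start from the top-ranked knn label for every region.
    labels = {r: cands[0] for r, cands in candidates.items()}

    def energy(lab):
        # Relevance potential: prefer labels ranked high by knn.
        e = -sum(weights[r][lab[r]] for r in lab)
        # Cohesion potential: prefer label pairs that co-occur in the corpus.
        e -= sum(cooc.get((lab[r], lab[s]), 0.0) for r, s in neighbours)
        return e

    e = energy(labels)
    for t in range(iters):
        region = random.choice(list(labels))
        proposal = dict(labels)
        proposal[region] = random.choice(candidates[region])
        e_new = energy(proposal)
        temp = T0 * (1.0 - t / iters) + 1e-6       # linear cooling schedule
        if e_new < e or random.random() < math.exp((e - e_new) / temp):
            labels, e = proposal, e_new
    return labels

# Illustrative two-region example: knn proposes Sky/Water for region 0 and
# Water/Sand for region 1; the cohesion term pulls the pair towards (Sky, Water).
candidates = {0: ["Sky", "Water"], 1: ["Water", "Sand"]}
weights = {0: {"Sky": 0.6, "Water": 0.4}, 1: {"Water": 0.55, "Sand": 0.45}}
cooc = {("Sky", "Water"): 0.3, ("Water", "Sky"): 0.3}
print(mrfi(candidates, weights, neighbours=[(0, 1)], cooc=cooc))
```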
5.3 Query/document expansion

For document expansion, the annotations assigned to each region of each image in the IAPR TC-12 Benchmark are added to the original annotation. For query expansion, the sample topic images were segmented and annotated, and the resulting set of annotations was used for expanding (or creating) textual queries; an expanded topic is shown in Figure 5.

Figure 5: Expansion of topic 36 using annotations. Annotations are shown below each segmented image. The expanded query is shown below the image annotations.

As can be seen, some labels are repeated in the resulting expanded topic (sky, people and tree); we decided to keep repeated labels because in this way repeated terms are treated as representative terms of the expanded query. If we did not consider label repetitions, the word people, for example, would be considered equally important (for representing the query) whether it appeared as the annotation of a single region in a single image or as four annotations of four different regions within the three topic images.
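The expansion itself amounts to concatenating the predicted region labels (with repetitions) onto the textual topic. A minimal sketch follows, with made-up labels for the three sample images of a hypothetical topic (not the actual output for topic 36); document expansion works analogously on the image captions.

```python
# Sketch of annotation-based query expansion: region labels predicted for the
# three topic images are appended to the textual query, keeping repetitions so
# that frequently predicted labels weigh more in the tf-idf representation.
def expand_query(topic_text, per_image_labels):
    labels = [lbl for image_labels in per_image_labels for lbl in image_labels]
    return topic_text + " " + " ".join(labels)

# Made-up labels for the three sample images of a topic, for illustration only.
topic = "group of tourists standing in front of a church"
labels = [["Sky", "Person", "Church"], ["Person", "Building"], ["Sky", "Person"]]
print(expand_query(topic, labels))
```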
6 Experimental results

A total of 95 runs were submitted to ImageCLEF2007, comprising all of the methods described above: intermedia feedback (IMFB), naive Web-based query expansion (NQE), repetition Web-based query expansion (WBQE), late fusion of independent retrievers (LF) and annotation-based expansion (ABE). Note that some runs are combinations of these methods.

Table 2: Top-ranked entries for each of the query-target language configurations in TIA's submitted runs. Entries marked with an asterisk indicate configurations in which TIA was the only participating group.
Languages | Run-ID | Methods | Type | MAP | Ranking
English-English | INAOE-EN-EN-NaiveQE-IMFB | NQE+IMFB | Mixed | 0.1986 | 22 / 142
Dutch-English* | INAOE-NL-EN-NaiveWBQE-IMFB | NQE+IMFB | Mixed | 0.1986 | 1 / 4
French-English | INAOE-FR-EN-NaiveQE-IMFB | NQE+IMFB | Mixed | 0.1986 | 3 / 21
German-English | INAOE-DE-EN-NaiveQE-IMFB | NQE+IMFB | Mixed | 0.1986 | 3 / 20
Italian-English | INAOE-IT-EN-NaiveWBQE-IMFB | NQE+IMFB | Mixed | 0.1986 | 3 / 10
Japanese-English | INAOE-JA-EN-NaiveWBQE-IMFB | NQE+IMFB | Mixed | 0.1986 | 2 / 6
Portuguese-English | INAOE-PT-EN-NaiveWBQE-IMFB | NQE+IMFB | Mixed | 0.1986 | 2 / 9
Russian-English | INAOE-RU-EN-NaiveWBQE-IMFB | NQE+IMFB | Mixed | 0.1986 | 2 / 6
Spanish-English | INAOE-ES-EN-NaiveWBQE-IMFB | NQE+IMFB | Mixed | 0.1986 | 2 / 9
Visual-English* | INAOE-VISUAL-EN-AN-EXP-3 | NQE+ABE+IMFB | Mixed | 0.1925 | 1 / 1
German-German | INAOE-DE-DE-NaiveQE-LATE-FUSION | NQE+LF | Mixed | 0.1341 | 13 / 30
English-German | INAOE-EN-DE-NaiveWBQE-LATE-FUSION | NQE+LF | Mixed | 0.1113 | 11 / 17
Spanish-Spanish | INAOE-ES-ES-NaiveQE-LATE-FUSION | NQE+LF | Mixed | 0.1481 | 5 / 15
English-Spanish | INAOE-EN-ES-NaiveWBQE-LATE-FUSION | NQE+LF | Mixed | 0.1145 | 2 / 6
Dutch-Random* | INAOE-NL-RND-NaiveQE | NQE | Text | 0.0828 | 1 / 2
English-Random | INAOE-EN-RND-NaiveQE-IMFB | NQE+IMFB | Mixed | 0.1243 | 6 / 11
French-Random | INAOE-FR-RND-NaiveQE-IMFB | NQE+IMFB | Mixed | 0.1243 | 3 / 10
German-Random | INAOE-DE-RND-NaiveQE-IMFB | NQE+IMFB | Mixed | 0.1243 | 4 / 11
Italian-Random* | INAOE-IT-RND-NaiveQE | NQE | Text | 0.0798 | 1 / 2
Portuguese-Random* | INAOE-PT-RND-NaiveQE | NQE | Text | 0.0296 | 1 / 2
Russian-Random* | INAOE-RU-RND-NaiveQE | NQE | Text | 0.0763 | 1 / 2
Spanish-Random* | INAOE-ES-RND-NaiveQE-IMFB | NQE+IMFB | Mixed | 0.1243 | 1 / 5

The top-ranked entries for each language configuration, together with a description of the methods used, are shown in Table 2. As can be seen, most of the entries are ranked near the top. The best performance over all runs was obtained by using IMFB together with NQE. We can observe that the runs with IMFB+NQE for target language English have exactly the same MAP value, independently of the query language. This is due to the fact that all of the queries were translated to English and then the snippets returned by Google were used for expanding the translated query. The snippets were similar because the English search engine was used and the translations were very similar across query languages. The expanded queries were then combined with the captions of the top-10 relevant documents according to the baseline FIRE run, which were the same for all the query languages.

Other highly ranked entries were those using NQE; in fact, NQE is present in all of the top-ranked runs. NQE outperformed WBQE in all of the language configurations, as can be appreciated graphically in Figure 6 (left). This is a surprising result, because with the naive approach several noisy terms are added to the queries, while with the repetition approach only the terms that appear most often among the snippets are added. Therefore, the good results of NQE are due to the inclusion of many highly related terms, while the insertion of some noisy terms does not affect the performance of the retrieval model. From Figure 6 (left) we can also clearly see that IMFB outperformed the LF approach.

Six runs based on annotation expansion were submitted to ImageCLEF2007. In these runs, document and query expansion were combined with the other techniques proposed in the previous sections. The settings of the annotation-based expansion (ABE) runs submitted to ImageCLEF2007 are shown in Table 3, together with their overall ranking positions.

Table 3: Settings of the annotation-expansion-based runs. An X indicates that the corresponding technique is used. ABQE stands for annotation-based query expansion and ABDE for annotation-based document expansion. The overall ranking position is shown; a total of 474 runs were submitted for the IAPR TC-12 Benchmark 2007 collection.
ID | Run-ID | Methods | ABQE | ABDE | Ranking
1 | INAOE-EN-EN-AN-EXP-5 | Baseline, IMFB | X | - | 57
2 | INAOE-VISUAL-EN-AN-EXP-3 | Baseline, IMFB | X | X | 58
3 | INAOE-EN-EN-NaiveWBQE-ANN-EXP-LF-2 | NQE, LF | X | X | 84
4 | INAOE-EN-EN-AN-EXP-2 | NQE, Baseline | X | X | 133
5 | INAOE-EN-EN-DQE-ANN-EXP-LF | NQE, LF | X | - | 389
6 | INAOE-EN-EN-AN-EXP-1 | Baseline | X | X | 447

Figure 6: Left: MAP for entries based on intermedia feedback (IMFB), late fusion (LF), naive Web-based query expansion (NQE), repetition Web-based query expansion (WBQE), and annotation-based expansion (ABE); the upper dashed line indicates the performance of our English-English baseline, while the dotted line indicates the average MAP over all submitted runs. Right: comparison of the MAP obtained by mixed and text-only runs. The MAP of the ABE entries is also shown. The dashed line indicates the average MAP over all submitted runs.

Results with ABE are mixed. The two top-ranked ABE runs correspond to entries that used ABE+IMFB; this result may be due to the IMFB performance rather than to the ABE technique. The third-ranked ABE run used ABE of documents and queries with NQE+LF and obtained a slightly lower MAP than NQE+LF without ABE, so no gain can be attributed to the ABE technique. The other ABE runs were ranked low.
Summarizing, relatively good results were obtained with ABE, although these results may be due to the other techniques employed (IMFB and NQE). However, we should emphasize that this was our very first effort towards developing annotation-based methods for improving image retrieval. In Figure 6 (right) we compare mixed and text-only runs. From this figure it is clear that the mixed approaches were always superior to the text-only runs. This result clearly illustrates that the performance of independent retrievers can be improved by considering both modalities for image retrieval.

7 Conclusions

We have described the first participation of INAOE's TIA research group in the ImageCLEF2007 photographic retrieval task. A total of 95 runs were submitted, comprising most of the query languages and all of the target ones. Experiments were performed with two approaches widely used for image retrieval from annotated collections: intermedia feedback and late fusion of independent retrievers. A Web-based query expansion technique was proposed for introducing context into the topics, and a novel annotation-based expansion technique was also developed. The best results were obtained with intermedia feedback combined with a version of the Web-based query expansion technique. With the naive expansion technique, several relevant related terms are added to the original query, as well as many noisy terms; however, the noisy terms did not affect the performance of the retrieval system. The intermedia feedback technique outperformed late fusion by a large margin. Relatively good results were obtained with the annotation-based runs, although these may be due to the techniques combined with the ABE methods. The results with this technique give evidence that automatic image annotation methods could be helpful for image retrieval from annotated collections, since promising results were obtained even though segmentation was poor, the training set was extremely small and imbalanced, and the annotations did not cover all the objects present in the image collection. In consequence, for future work we will address all of these issues by creating a larger training set of annotated images, using other segmentation algorithms, defining labels objectively so as to cover most of the objects present in the images of the collection, and keeping the training set balanced. We also intend to use other annotation methods.

Acknowledgements. We would like to thank M. Grubinger, A. Hanbury, T. Deselaers, P. Clough and all the other organizers of ImageCLEF2007 for their important effort. This work was partially supported by CONACyT under grant 205834.

References

[1] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Pearson E. L., 1999.
[2] Y. Chang and H. Chen. Approaches of using a word-image ontology and an annotated image corpus as intermedia for cross-language image retrieval. In Working Notes of the CLEF Workshop. CLEF, 2006.
[3] Y. Chang, W. Lin, and H. Chen. Combining text and image queries at ImageCLEF 2005. In Working Notes of the CLEF Workshop. CLEF, 2005.
[4] P. Clough, M. Grubinger, T. Deselaers, A. Hanbury, and H. Müller. Overview of the ImageCLEF 2006 photographic retrieval and object annotation tasks. In Working Notes of the CLEF Workshop. CLEF, 2006.
[5] P. Clough, H. Müller, T. Deselaers, M. Grubinger, T. Lehmann, J. Jensen, and W. Hersh. The CLEF 2005 cross-language image retrieval track. In Working Notes of the CLEF 2005 Workshop. CLEF, 2005.
[6] R. Datta, J. Li, and J. Z. Wang. Content-based image retrieval: Approaches and trends of the new age. In Proceedings of the ACM International Workshop on Multimedia Information Retrieval, Singapore, 2005. ACM Multimedia.
[7] H. J. Escalante, M. Montes, and L. E. Sucar. Improving automatic image annotation based on word co-occurrence. In Proceedings of the 5th International Adaptive Multimedia Retrieval Workshop, Paris, France, July 2007.
[8] H. J. Escalante, M. Montes, and L. E. Sucar. Word co-occurrence and MRFs for improving automatic image annotation. In Proceedings of the 18th British Machine Vision Conference (BMVC 2007), to appear, Warwick, UK, September 2007.
[9] M. Grubinger, P. Clough, A. Hanbury, and H. Müller. Overview of the ImageCLEF 2007 photographic retrieval task. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.
[10] M. Grubinger, P. Clough, H. Müller, and T. Deselaers. The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. 2005.
[11] J. S. Hare, P. H. Lewis, P. G. B. Enser, and C. J. Sandom. Mind the gap: Another look at the problem of the semantic gap in image retrieval. In E. Y. Chang, A. Hanjalic, and N. Sebe, editors, Proceedings of Multimedia Content Analysis, Management and Retrieval, volume 6073, San Jose, California, USA, 2006. SPIE.
[12] N. Maillot, J. Chevallet, V. Valea, and J. H. Lim. IPAL inter-media pseudo-relevance feedback approach to ImageCLEF 2006 photo retrieval. In Working Notes of the CLEF Workshop. CLEF, 2006.
[13] H. Marin-Castro, L. E. Sucar, and E. F. Morales. Automatic image annotation using a semi-supervised ensemble of classifiers. In Proceedings of the 12th Iberoamerican Congress on Pattern Recognition (CIARP 2007), to appear, 2007.
[14] M. M. Rahman, V. Sood, B. C. Desai, and P. Bhattacharya. CINDI at ImageCLEF 2006: Image retrieval and annotation tasks for the general photographic and medical image collections. In Working Notes of the CLEF Workshop. CLEF, 2006.
[15] Y. Rui, T. Huang, M. Ortega, and S. Mehrotra. Relevance feedback: A power tool for interactive content-based image retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 8(5):644–655, 1998.
[16] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
[17] D. Zeimpekis and E. Gallopoulos. TMG: A MATLAB toolbox for generating term-document matrices from text collections. In J. Kogan, C. Nicholas, and M. Teboulle, editors, Grouping Multidimensional Data: Recent Advances in Clustering, pages 187–210. Springer, 2005.