
DAEDALUS at ImageCLEF Wikipedia Retrieval 2010: Expanding with Semantic Information from Context

Sara Lana-Serrano (1, 3)

Julio Villena-Román (1, 2)

José Carlos González-Cristóbal (1, 3), josecarlos.gonzalez@upm.es

1 DAEDALUS - Data, Decisions and Language, S.A.
2 Universidad Carlos III de Madrid
3 Universidad Politécnica de Madrid

2010

This paper describes the participation of DAEDALUS at the ImageCLEF 2010 Wikipedia Retrieval task. The main focus of our experiments is to evaluate how image retrieval is affected by incorporating semantic information extracted only from the textual metadata of the image itself, compared with expanding with contextual information gathered from the document in which the image is referenced. For the semantic annotation, the DBpedia ontology and the YAGO classification schema are used. As expected, the results show that, in general, the textual information attached to a given image cannot fully represent certain features of the image. Furthermore, the use of semantic information in multimedia information extraction poses two hard challenges that remain unsolved: how to automatically extract the high-level features associated with a multimedia resource and, once the resource has been semantically tagged, which features should be used in the retrieval process to best model the actual and complete meaning of the user query.

Keywords: image retrieval, domain-specific vocabulary, ontology, semantic expansion, information retrieval, indexing, topic expansion, context

As in previous campaigns, the basic goal of the ImageCLEF 2010 Wikipedia Retrieval task [1] was, given a textual query and/or sample images describing a user's multimedia information need, to find as many relevant images as possible in the Wikipedia image collection. Each image in the collection comes with its user-provided annotation, consisting of unstructured and noisy textual descriptions in English, French and German, and with links to the article(s) that contain the image.

This paper describes the participation of the DAEDALUS team at the ImageCLEF 2010 Wikipedia Retrieval task. Our team is a research group led by and named after DAEDALUS, a small private company in the field of Information and Telecommunication Technologies and a leading provider of language-based solutions in Spain, together with research groups from two universities, Universidad Politécnica de Madrid and Universidad Carlos III de Madrid. We have taken part in CLEF since 2003 in many different tracks and tasks, as part of the MIRACLE team until last year.

This year, the main objective of our experiments is to evaluate and compare the results of techniques based on the computational similarity between the metadata associated with the images and the query itself, as opposed to techniques based on a semantic description of the image derived from the contextual information provided by the Wikipedia article in which the image is referenced. For this purpose, the DBpedia ontology [2] and the YAGO [3] classification schema have been used as the knowledge base to annotate the semantic content.

System Description

Based on our experience in previous CLEF campaigns and other forums, we designed a flexible system able to execute a large number of runs that exhaustively cover many combinations of different techniques. The system is built from small components that are easily combined in different configurations and executed sequentially to build the final result set. Specifically, it is composed of four modules:
• Linguistic processing module, which extracts, parses and prepares the input text for the subsequent modules.
• Semantic module, which expands documents and/or topics with semantic information retrieved from the knowledge base.
• Textual (text-based) retrieval module, which indexes image annotations in order to find the images most relevant to the text of the topic.
• Result combination module, which uses the OR operator to combine, if necessary, two different result lists.
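The modular design, and in particular the OR combination of two result lists, can be sketched as follows. This is a minimal illustration rather than the team's code: the function names are hypothetical, and because the paper does not say how scores are merged when an image appears in both lists, the sketch assumes that the higher of the two scores is kept (a CombMAX-style fusion).

```python
from typing import Callable, Dict, List, Tuple

# A result list maps image identifiers to retrieval scores.
Results = Dict[str, float]


def or_combine(a: Results, b: Results) -> Results:
    """Union of two result lists (OR operator).

    Assumption: when an image occurs in both lists, its higher score is kept.
    """
    merged = dict(a)
    for image_id, score in b.items():
        merged[image_id] = max(score, merged.get(image_id, score))
    return merged


def ranked(results: Results, k: int = 1000) -> List[Tuple[str, float]]:
    """Sort by descending score (ties broken by id) and keep the top k."""
    return sorted(results.items(), key=lambda item: (-item[1], item[0]))[:k]


def run_pipeline(text: str, stages: List[Callable[[str], str]]) -> str:
    """Apply small, interchangeable processing stages in sequence."""
    for stage in stages:
        text = stage(text)
    return text


if __name__ == "__main__":
    textual = {"img1": 0.9, "img2": 0.4}
    semantic = {"img2": 0.7, "img3": 0.5}
    print(ranked(or_combine(textual, semantic)))
    # [('img1', 0.9), ('img2', 0.7), ('img3', 0.5)]
    print(run_pipeline("  Red Fruits ", [str.strip, str.lower]))  # red fruits
```

Keeping each stage as a plain function from text to text is what makes it cheap to recombine modules into many different runs.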

A common baseline algorithm was used in all experiments to process the collection, following these steps:
1. Text extraction: ad-hoc scripts are run on the files containing the image annotations, on the Wikipedia articles and on the topics. This process generates the different collections and topic sets that make up the specific features of each experiment.
2. Tokenization: this process extracts the basic textual components of the annotations. Some basic entities are also detected, such as numbers, initials, abbreviations and years. For now, compounds, proper nouns, acronyms and other types of entity are not specifically handled. The outputs of this process are single words, multiwords, years written as numbers, and the tagged entities produced by the semantic module.
3. Conversion to lowercase: all document terms are normalized by converting all letters to lowercase.
4. Filtering: all words recognized as stopwords are removed. Stopword lists for the target languages were initially obtained from the University of Neuchatel's resources page [4] and later extended with our own resources.
5. Stemming: this process is applied to every word to be indexed or used for retrieval. Standard Porter stemmers [5] for each language have been used.
6. Indexing and retrieval: Lucene [6] was used as the information retrieval engine for all textual indexing and retrieval.
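Steps 2 to 5 can be illustrated with a small, dependency-free sketch. It is not the team's implementation: the stopword set is a tiny hypothetical sample standing in for the Neuchatel lists [4], and `crude_stem` is a deliberately simplified suffix stripper standing in for the Porter/Snowball stemmers [5], which a real system would use instead.

```python
import re
from typing import List

# Hypothetical mini stopword list; the actual runs used the UniNE lists [4].
STOPWORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "at"}

# Tokens are either numbers (e.g. years) or runs of letters, accents included.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)?|[^\W\d_]+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def crude_stem(word: str) -> str:
    """Toy suffix stripper standing in for a Porter stemmer (NOT Porter)."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text: str) -> List[str]:
    tokens = tokenize(text)                              # 2. tokenization
    tokens = [t.lower() for t in tokens]                 # 3. lowercase
    tokens = [t for t in tokens if t not in STOPWORDS]   # 4. stopword filtering
    # 5. stemming (numbers such as years are left untouched)
    return [crude_stem(t) if t.isalpha() else t for t in tokens]


if __name__ == "__main__":
    print(preprocess("People playing guitars at the beach in 2009"))
    # ['people', 'play', 'guitar', 'beach', '2009']
```

The same function must be applied to both the collection and the topics, so that query terms and index terms end up in the same normalized form.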

Experiments and Results

The main idea behind our experiments is to evaluate and compare the results of techniques based on the computational similarity between the metadata associated with the images and the query itself, as opposed to techniques based on a semantic description of the image that uses the contextual information provided by the Wikipedia article in which the image is referenced.

The following fields have been considered as contextual information:
• the metadata associated with the image itself (C),
• the title of the article (T), and
• the first paragraph of the article (S).
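One way to assemble the text indexed for each image from these fields is to concatenate the selected ones, as sketched below. The `ImageRecord` type, the `build_document` helper and the sample record are hypothetical; the paper does not describe the exact document layout used in the index.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ImageRecord:
    metadata: str         # C: text attached to the image itself
    article_title: str    # T: title of the referring Wikipedia article
    first_paragraph: str  # S: first paragraph of that article


def build_document(record: ImageRecord, fields: str) -> str:
    """Concatenate the selected fields, e.g. fields="C", "CT" or "CTS"."""
    parts: Dict[str, str] = {
        "C": record.metadata,
        "T": record.article_title,
        "S": record.first_paragraph,
    }
    return " ".join(parts[f] for f in fields if parts[f])


if __name__ == "__main__":
    rec = ImageRecord("Red double-decker bus", "AEC Routemaster",
                      "The AEC Routemaster is a double-decker bus.")
    print(build_document(rec, "CT"))  # Red double-decker bus AEC Routemaster
```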

The core knowledge base for the semantic expansion is a subset of the DBpedia ontology [2], adapted and formatted for our purposes, that uses the YAGO classification schema. YAGO [3] is a huge semantic knowledge base, part of the YAGO-NAGA project at the Max Planck Institute for Informatics in Saarbrücken (Germany). It currently holds more than 2 million entities (persons, organizations, cities, etc.) with over 20 million facts about these entities. Unlike many other automatically assembled knowledge bases, YAGO has a manually confirmed accuracy of 95%.

Our resulting knowledge base contains 1,651,225 entities and 226,087 classes, hierarchically related through 225,781 subClassOf relations, plus 4,121,043 typeOf relations between entities and classes.

An entity identification process is then run, using a parser specifically developed for this task together with the information contained in the knowledge base. Finally, the semantic information produced as output is built by combining the information about the entity itself, its class(es), and all the ancestors of those class(es).
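The expansion step can be sketched over a toy knowledge base with typeOf and subClassOf relations. Everything here is hypothetical: the class names, the single entity, and the naive dictionary lookup in `identify_entities`, which only stands in for the custom parser mentioned above. The expansion itself follows the description in the text: the entity, its classes, and all ancestors of those classes.

```python
from collections import deque
from typing import Dict, List, Set

# Toy knowledge base (hypothetical excerpt, YAGO-style class names).
TYPE_OF: Dict[str, List[str]] = {        # entity -> its classes
    "Rafael_Nadal": ["tennis_player"],
}
SUBCLASS_OF: Dict[str, List[str]] = {    # class -> parent classes
    "tennis_player": ["athlete"],
    "athlete": ["person"],
    "person": ["entity"],
}
LABELS: Dict[str, str] = {"rafael nadal": "Rafael_Nadal"}  # surface form -> entity


def identify_entities(text: str) -> List[str]:
    """Naive longest-first dictionary lookup (stand-in for the real parser)."""
    lowered = text.lower()
    by_length = sorted(LABELS.items(), key=lambda kv: -len(kv[0]))
    return [entity for label, entity in by_length if label in lowered]


def ancestors(cls: str) -> List[str]:
    """Breadth-first walk up subClassOf links, without repeats."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque(SUBCLASS_OF.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue
        seen.add(parent)
        order.append(parent)
        queue.extend(SUBCLASS_OF.get(parent, []))
    return order


def semantic_terms(entity: str) -> List[str]:
    """Entity itself + its classes + all ancestors of those classes."""
    terms = [entity]
    for cls in TYPE_OF.get(entity, []):
        for term in [cls] + ancestors(cls):
            if term not in terms:
                terms.append(term)
    return terms


if __name__ == "__main__":
    for ent in identify_entities("Rafael Nadal serving at Roland Garros"):
        print(semantic_terms(ent))
    # ['Rafael_Nadal', 'tennis_player', 'athlete', 'person', 'entity']
```

Note that the walk always reaches the most general classes (here `entity`), which is why unrestricted ancestor expansion can add very imprecise terms.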

Finally, we submitted six experiments for evaluation, described in Table 1.

Table 1. Submitted experiments (C: image metadata; T: article title; S: first paragraph of the article).

Run                   Semantic expansion   Contextual information
DAEDALUS_Bas          NO                   C
DAEDALUS_NER_Bas      YES                  C
DAEDALUS_W_CT         NO                   C + T
DAEDALUS_NER_W_CT     YES                  C + T
DAEDALUS_W_CTS        NO                   C + T + S
DAEDALUS_NER_W_CTS    YES                  C + T + S

The results of these experiments are shown in Table 2. The highest value of each measure is marked with an asterisk.

Table 2. Evaluation results.

Run                   MAP       NDCG      (unlabeled)   Relevant retrieved
DAEDALUS_Bas          0.1492    0.2377    0.3556        6088
DAEDALUS_NER_Bas      0.1249    0.2115    0.3255        5628
DAEDALUS_W_CT         0.1820*   0.2662*   0.4055        6453
DAEDALUS_NER_W_CT     0.1610    0.2514    0.3801        6038
DAEDALUS_W_CTS        0.1737    0.2478    0.4315*       6871*
DAEDALUS_NER_W_CTS    0.1593    0.2342    0.4036        6105

A first, preliminary evaluation of these figures shows that, overall, contextual expansion clearly improves the retrieval results (MAP rises from 0.1492 for the baseline to 0.1820 for DAEDALUS_W_CT), whereas semantic expansion tends to make them worse: every run with semantic expansion (NER) scores below its counterpart without it on all four measures. However, this conclusion is not entirely accurate, because in the retrieval process the semantic terms were boosted with respect to the contextual terms by assigning them a higher relevance factor. This was initially done to better analyze the impact of the semantic expansion, but it finally turned out not to be a good idea: as a result, the final results of the semantic experiments do not exactly reflect the behavior of an actual retrieval system.

A deeper analysis shows that, regardless of the experiment, precision is quite low when the queries refer to primitive features of the image ("white house with garden", "red fruits", "yellow buses", "close up of antenna") or to high-level semantic features such as actions ("people playing guitar", "people laughing") or perceptions.

We also notice that, regardless of the precision level of the results, incorporating semantic information for a given topic generally produces similar effects (improvements or reductions) whatever type of contextual information is applied. Considering the impact of semantic information on the retrieval process, the following groups of queries have been identified:
• Queries that could not be annotated with semantic information ("lightning in the sky").
• Queries in which semantic information produces slight improvements in the results ("horseman", "civil airplane").
• Queries in which semantic information significantly improves the results (see Table 3). In these queries, weighting the semantic terms above the contextual ones successfully modeled the specific semantic features that best represent the full meaning of the original query.
• Queries in which semantic information significantly reduces the precision of the results (see Table 4). The semantic representation of the query contains one or more features that are too general (mainly associated with the top levels of the ontology in the knowledge base), contributing imprecise search terms that match a very large number of documents. Combined with the boosting of semantic terms described above, this caused a significant drop in precision.
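The boosting of semantic terms can be illustrated by building a query string in Lucene's classic query-parser syntax, where `term^boost` raises a clause's weight and space-separated clauses are combined with the default OR operator. The paper does not report the relevance factor actually used, so the boost of 2.0 below is a hypothetical placeholder, and `boosted_query` is an illustrative helper, not part of Lucene.

```python
from typing import Iterable, List

# Characters with special meaning in Lucene's classic query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape(term: str) -> str:
    """Backslash-escape query-syntax characters in a single term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def boosted_query(context_terms: Iterable[str],
                  semantic_terms: Iterable[str],
                  semantic_boost: float = 2.0) -> str:
    """OR-style query string in which semantic terms get a higher boost.

    The default boost (2.0) is hypothetical; the real factor is not reported.
    """
    clauses: List[str] = [escape(t) for t in context_terms]
    clauses += [f"{escape(t)}^{semantic_boost:g}" for t in semantic_terms]
    return " ".join(clauses)


if __name__ == "__main__":
    print(boosted_query(["tennis", "player", "court"],
                        ["tennis_player", "athlete"]))
    # tennis player court tennis_player^2 athlete^2
```

With a uniform boost, every semantic term, including general ancestor classes, outweighs the contextual words, which is consistent with the precision loss described for the last group of queries.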

Table 3. Topics in which semantic information significantly improves the results: Topic 8 (tennis player on court), Topic 15 (cyclist), Topic 16 (spider with cobweb).

Table 4. Topics in which semantic information significantly reduces the precision of the results: Topic 55 (building site), Topic 70 (close up of trees).

Tables 3 and 4 report relative variations in MAP and R-precision for the W_CT and W_CTS configurations. The values recovered from the extracted tables, in reading order, are: W_CT: 164.25%, 590.53%, 390.76%, 307.05%, 329.05%, 511.69%, 400.54%, 352.29%, 110.53%, 429.28%; W_CTS: 37.88%, 12.81%, 50.42%, 12.51%, 140.14%, 100.00%, 402.13%, 100.00%, 213.17%.

A detailed per-topic analysis of the results shows that text-based retrieval techniques applied to image retrieval only perform well when the query refers explicitly to the semantic or contextual content of the image (images containing something or located somewhere). They are of little use for queries involving primitive features (such as color, brightness, texture, shapes, corner points or their spatial distribution) or high-level semantic features about the meaning and purpose of the depicted objects or scenes (sentiments, emotions, actions, perceptions).

In the first case, incorporating semantic information based on the contextual information of the article in which the image is referenced usually improves the results for queries in which the semantic information contributes specific terms that narrow the search. For instance, the semantic information for the "tennis player on court" topic may help to select images associated with the "tennis player" class; by contrast, the semantic information for the "cities at night" topic broadens the search to all images showing any of the subclasses of "city", which turns out to be very noisy.
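The contrast between a narrowing class such as "tennis player" and a broadening one such as "city" can be made concrete by counting how many entities each class covers once its subclasses are included. The hierarchy and entity assignments below are a hypothetical toy, not an excerpt from the DBpedia/YAGO subset used in the runs.

```python
from typing import Dict, Optional, Set

# Hypothetical toy hierarchy: class -> parent class.
PARENT: Dict[str, str] = {
    "tennis_player": "athlete",
    "athlete": "person",
    "capital": "city",
    "port_city": "city",
    "city": "location",
}
# Hypothetical entity -> class assignments.
INSTANCE_OF: Dict[str, str] = {
    "Rafael_Nadal": "tennis_player",
    "Paris": "capital",
    "Madrid": "capital",
    "Rotterdam": "port_city",
    "Valencia": "port_city",
    "Salamanca": "city",
}


def covered_entities(cls: str) -> Set[str]:
    """Entities whose class is `cls` or any descendant of `cls`."""
    result: Set[str] = set()
    for entity, start in INSTANCE_OF.items():
        current: Optional[str] = start
        while current is not None:      # climb until the root is passed
            if current == cls:
                result.add(entity)
                break
            current = PARENT.get(current)
    return result


if __name__ == "__main__":
    print(len(covered_entities("tennis_player")))  # 1
    print(len(covered_entities("city")))           # 5
```

The more general the class, the more entities (and hence images) a matching term pulls in, which is the noise effect observed for the "cities at night" topic.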

Consequently, our future efforts should focus, first, on studying how best to apply content-based image retrieval techniques that help us extract the semantics of the image itself and, second, on answering the following open issues:
1) Should semantic information be taken into account for all queries during the retrieval process?
2) In any case, should it be processed differently depending on the query type?
3) Should all the semantic information associated with a given entity receive the same weight during retrieval, or should this weight depend on the information class and/or the query type?

Acknowledgements

This work has been partially supported by the Spanish Center for Industry Technological Development (CDTI, Ministry of Industry, Tourism and Trade), through the CONTENIDOS A LA CARTA Project, INGENIO 2010 Programme, AVANZA I+D 2008. Other partners in the Project are Agencia EFE, Germinus XXI, 11870.com and Universidad Politécnica de Madrid.

1. Popescu, A., Tsikrika, T., Kludas, J.: Overview of the Wikipedia Retrieval Task at ImageCLEF 2010. Working Notes of CLEF 2010, Padova, Italy (2010).

2. The DBpedia Knowledge Base. http://wiki.dbpedia.org/

3. YAGO: A Core of Semantic Knowledge. http://www.mpi-inf.mpg.de/yagonaga/yago/

4. University of Neuchatel: IR Multilingual Resources at UniNE. http://members.unine.ch/jacques.savoy/clef/index.html

5. Porter, M.: Snowball stemmers and resources page. http://www.snowball.tartarus.org

6. Apache Lucene project. http://lucene.apache.org