=Paper=
{{Paper
|id=Vol-1178/CLEF2012wn-ImageCLEF-MantziouEt2012
|storemode=property
|title=CERTH's Participation at the Photo Annotation Task of ImageCLEF 2012
|pdfUrl=https://ceur-ws.org/Vol-1178/CLEF2012wn-ImageCLEF-MantziouEt2012.pdf
|volume=Vol-1178
|dblpUrl=https://dblp.org/rec/conf/clef/MantziouPPSK12
}}
==CERTH's Participation at the Photo Annotation Task of ImageCLEF 2012==
Eleni Mantziou, Georgios Petkos, Symeon Papadopoulos, Christos Sagonas, and Yiannis Kompatsiaris

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece

{lmantziou, gpetkos, papadop, sagonas, ikom}@iti.gr

Abstract. This paper describes the approaches and experimental settings of the five runs submitted by CERTH at the photo annotation task of ImageCLEF 2012. Two different approaches were used: the first uses the Laplacian Eigenmaps of an image similarity graph for learning, and the second uses a "same class" learning model. Four runs were submitted using the first approach and one using the second. A multitude of textual and visual features were employed, making use of different aggregation (BoW, VLAD) and post-processing schemes (WordNet, pLSA). The best performance on the test set was achieved by Run 3 (first approach using all features), which scored 0.321 in terms of MiAP and 0.2547 in terms of GMiAP (7th out of 18 competing teams), and by Run 5, which led to an F-ex score of 0.495 (6th out of 18 teams).

1 Introduction

This document describes the participation of CERTH at the photo annotation task of the 2012 ImageCLEF competition [1]. CERTH submitted five runs using two different approaches. The first approach, described in subsection 2.1, computes the similarity between test images and training images, constructs an image similarity graph, and trains concept detectors using the graph Laplacian Eigenmaps (LE) [7] as features. This is done for each modality, and the final result is obtained by performing late fusion with a linear classifier. The second approach, detailed in subsection 2.2, utilizes the concept of a "same class" model: it takes as input the set of distances (as many as the number of features used) between the image to be annotated and a reference item that represents a target concept, and predicts whether the image belongs to that concept. Section 3 outlines each of the submitted runs and presents the obtained test results. Section 4 presents some general remarks and conclusions.

2 Overview of methods

2.1 Concept detection using image similarity graphs

The first approach used by CERTH is based on the construction of a similarity graph between the images. This graph is used to obtain a low-dimensional feature representation: we use the first eigenvectors of the graph Laplacian as features. These features correspond well to semantically coherent groups of images and are thus used to train concept classifiers.

The idea of utilizing the implicit relational structure that can be derived by computing similarities between the images of a collection has been proposed before. In [8], an extended similarity measure is proposed that takes into account the local neighbourhood structure of images, i.e., the content and label information (if available) of images that are similar to the input image. This measure is used in combination with two well-known semi-supervised learning methods [16] and is shown to improve their performance both in synthetic experiments and in a benchmark video annotation task. Our work is most closely related to [6], which introduces the concept of "social dimensions", i.e., the top-k eigenvectors of a graph Laplacian, as an alternative way of tackling the relational classification problem [10], i.e., the classification of a graph node that takes into account information from neighbouring nodes. Here, we adopt a similar representation for graph structure features.
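To make the pipeline concrete, the following is a minimal sketch in Python, not the paper's actual implementation: it builds a k-nearest-neighbour similarity graph over image descriptors, takes the first eigenvectors of the normalized graph Laplacian as features, and trains a linear classifier per concept. The descriptor matrix X, the labels y_train, and the parameter values n_dims and k are hypothetical placeholders.

```python
# Minimal sketch of the graph-based approach described above (NOT CERTH's
# implementation): k-NN similarity graph -> normalized Laplacian -> first
# eigenvectors as features -> linear concept classifier.
# X, y_train, n_dims, and k are hypothetical placeholders.
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

def laplacian_eigenmap_features(X, n_dims=50, k=20):
    """First n_dims Laplacian eigenvectors of a k-NN similarity graph over X."""
    S = cosine_similarity(X)              # image-to-image similarities
    np.fill_diagonal(S, 0.0)
    # Keep only each image's k strongest neighbours, then symmetrize.
    weakest = np.argsort(S, axis=1)[:, :-k]
    W = S.copy()
    np.put_along_axis(W, weakest, 0.0, axis=1)
    W = np.maximum(W, W.T)
    L = laplacian(W, normed=True)         # normalized graph Laplacian
    # Eigenvectors with the smallest eigenvalues vary most smoothly over
    # the graph, so they reflect coherent groups of images.
    _, vecs = eigsh(L, k=n_dims, which='SA')
    return vecs                           # one n_dims-dim row per image

# Hypothetical usage: X stacks one visual descriptor per image (train and
# test together); y_train holds binary labels for one target concept.
# E = laplacian_eigenmap_features(X)
# clf = LogisticRegression().fit(E[:n_train], y_train)
# scores = clf.predict_proba(E[n_train:])[:, 1]
```

In this sketch a single descriptor is used; in the approach described above, such features would be computed per modality and the per-modality classifier outputs combined by late fusion.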
Method overview: Given a set of K target concepts Y = {Y_1, ..., Y_K} and an annotated training set L = {(x_i, y_i)}_{i=1}^{l}, where x_i ∈