=Paper=
{{Paper
|id=Vol-1178/CLEF2012wn-ImageCLEF-Manzato2012
|storemode=property
|title=The Participation of IntermidiaLab at the ImageCLEF 2012 Photo Annotation Task
|pdfUrl=https://ceur-ws.org/Vol-1178/CLEF2012wn-ImageCLEF-Manzato2012.pdf
|volume=Vol-1178
}}
==The Participation of IntermidiaLab at the ImageCLEF 2012 Photo Annotation Task==
The Participation of IntermidiaLab at the ImageCLEF 2012 Photo Annotation Task

Marcelo G. Manzato
Mathematics and Computing Institute, University of São Paulo
Av. Trabalhador Sancarlense, 400, PO Box 668 – 13560-970 São Carlos, SP – Brazil
mmanzato@icmc.usp.br

Abstract. This paper presents the results of our concept annotation tool in the ImageCLEF 2012 Photo Annotation Task. Our approach is based only on textual features, in particular collaborative user tags. It addresses some of the challenges of using user-generated terms, namely noise and incompleteness. The first is handled by a multi-phase filtering procedure that amends misspelled tags and keeps only those semantically related to the actual context of the image. The second is reduced by a tag enrichment step which adds related terms to the image's list of tags in order to make annotation more effective.

Keywords: Automatic annotation, concept classification, user tags, folksonomy, enrichment.

1 Introduction

The overload of multimedia content available on the Web has introduced new challenges for search engines to retrieve, organize and classify the data according to the user's needs. One important and common step of these tasks is the extraction of meaningful information describing the content. Typically, such metadata is provided by content authors or providers, who create well-structured information about specific media items, helping search and analysis tasks. Automatic approaches may also be adopted, which create multimedia descriptions without human intervention by analyzing low-level visual features in order to infer semantic information.

It is known that automatic and manual indexing techniques have limitations that may affect the process of gathering semantic information. Indeed, the lack of methods to provide meaningful descriptions has been named the semantic gap [15], and for over ten years researchers have been working on strategies to overcome the related problems. Automatic approaches, for instance, are usually highly dependent on the content domain, since they infer semantic information from low-level visual features [2]. Manual approaches, in turn, are considered time-consuming and error prone [13].

Among the many multimedia indexing techniques proposed so far, the ImageCLEF competitions (http://imageclef.org/) play an important role in the context of image retrieval. Each year, a set of specific challenging problems is proposed to researchers with the objective of pushing forward the state of the art, allowing the resulting techniques to be discussed and compared against each other using a unique and standardized set of requirements. One of these problems is the Photo Annotation Task, a multi-label classification challenge that aims to analyze a collection of Flickr (http://www.flickr.com/) photos in terms of their visual and/or textual features in order to detect the presence of one or more concepts.

Particularly in the case of textual features, the dataset of images made available by ImageCLEF includes Flickr tags, which are terms created by users to facilitate their own later searches. Among other benefits, some authors [9][20][1][21] have explored collaborative tagging to create folksonomies and semantic relationships based on the co-occurrence of tags.
With such structures, it is possible to gather semantic information about the content in a collaborative way, reducing the main problems related to traditional indexing approaches.

This paper presents the techniques developed by our group to handle the ImageCLEF 2012 Photo Annotation Task [18]. Unlike the majority of the participants, who explore visual features or a combination of visual and textual features, we focus only on user tags. Although many of the defined concepts are difficult to find using only text, we believe that exploring textual information is a good starting point for designing a robust, multimodal annotation approach.

This paper is structured as follows. Section 2 presents related work describing other text-based approaches proposed in previous years of ImageCLEF. Section 3 describes the training stage of our algorithm, a preparation procedure that gathers semantic information from the training set. Section 4 describes the concept annotator algorithm proposed in this paper. Section 5 presents the official results of the evaluation. Finally, Sections 6 and 7 present the final remarks, future work and acknowledgments.

2 Related Work

Analyzing the last two years of the ImageCLEF Photo Annotation Task, it can be noted that the majority of approaches are based on visual features or a combination of visual and textual features. In 2010, of the 54 groups registered for the task, only two submitted runs in the textual-only category. One example is the work of Li et al. [9], which used textual features (user tags and EXIF information) to perform automatic annotation. Their approach starts with a document expansion procedure which assigns additional content to concepts and image tags by consulting external information resources, such as DBpedia (http://dbpedia.org). After that, the expanded textual metadata is compared to the expanded concepts in order to make concept assumptions. They also provide a method to assign concepts to images by analyzing EXIF data, such as the time a photo was shot. Finally, additional concepts are considered by inferring affiliations and opposite relations among them.

In 2011, 48 groups registered for the challenge, and seven submitted at least one run in the textual-only category. One of the submissions of Daróczy et al. [6], for instance, was a text-based approach that uses Flickr tags to create a kernel weighting procedure based on the similarity of images, computed using the Jensen-Shannon divergence. In addition to this similarity calculation, other metrics were also used, such as ones based on the WordNet (http://wordnet.princeton.edu) ontology. Indeed, Liu et al. [10] proposed a method to extract text features for concept detection based on a semantic distance between words, which is expected to capture the semantic meaning of images. In total, ten features were extracted, based on different types of information, including semantic similarities according to the WordNet ontology, affective norms for English words, etc. Znaidia et al. [21] submitted a text-based approach that uses two distances between tags and visual concepts: one based on the WordNet ontology and the other based on social networks. The correlation between tags and visual concepts is also explored by Amel et al. [1], who proposed an approach that uses Flickr tags to extract contextual relationships between tags and concepts. Two types of contextual graphs are modeled: an inter-concepts graph and a concept-tags graph.
The similarity between tags and concepts is computed based on the principle of the Google distance [5].

In order to use Flickr tags, it is important to pre-process them to reduce noise and to give more importance to particular terms. These two steps are performed by Izawa et al. [7], who submitted an approach that first performs a morphological analysis on all terms of the training images. Next, a tf-idf score is assigned to each tag. Then, the system stores the correspondence information between tags and concepts in a casebase. Finally, a given test image is annotated by matching its attached tags against the tags of the casebase. The tf-idf score is also used by Nagel et al. [12], who submitted a text-only approach consisting of a pre-processing step on Flickr tags to reduce noise, followed by tf-idf assignment for each term in order to construct a text-based descriptor. Spyromitros-Xioufis et al. [16] proposed a textual model that uses a boolean bag-of-words representation, in addition to stemming, stop-word removal, and feature selection using the chi-squared-max method [8]. The extracted features were then used in a multi-label learning algorithm based on Ensembles of Classifier Chains [14].

The aforementioned text-based approaches differ from ours in that we provide a co-occurrence-based tag enrichment phase in order to overcome the incompleteness of terms in resources that do not have enough associated tags. In addition, based on the enriched tag list, we also compute the co-occurrence of concepts and tags during training, so as to highlight the most frequent concepts for a given tag.

3 Training

Our automatic annotator starts with a training procedure whose main objective is to gather semantic information from the already annotated corpus. In addition, all user tags must be filtered and analyzed in order to remove incomplete and/or incorrect tags. Figure 1 illustrates the overall schema. Although in this participation we annotate only images, our approach may be used with any other multimedia modality, such as video and audio, provided the content has associated tags.

Fig. 1. Training procedure.

All tags available for an image (Figure 1 (a)) are submitted to a module which filters incorrect and/or incomplete terms. The objective of this procedure is to eliminate noise that could later affect the quality of concept classification. After that, all terms are used to create a folksonomy of tags (Figure 1 (c)), so that semantically related terms can be obtained when annotating images with few associated tags. A folksonomy of concepts is also constructed based on their co-occurrence, as indicated in Figure 1 (d). The idea behind this process is to identify semantically related concepts and then assign those related concepts based on the ones already found. For instance, if an image was annotated with the concept "combustion flames", it will also be annotated with "combustion smoke". Finally, we use both the concepts and the terms of each image to create a table containing the most frequent concepts for each tag (Figure 1 (e)). This table is used during classification to predict possible concepts according to the tags assigned by the user. The next subsections describe the training steps in detail.

3.1 Filtering Tags

The process of filtering noisy tags is divided into two phases.
The first phase amends or discards misspelled, incorrect and/or incomplete terms, most of the process being supported by lexical resources and syntactic analysis of the terms. The second phase checks the meaning of the tags in order to decide whether they are semantically related to the most frequent concepts in the training set. This subsection describes the first phase of the filtering procedure; the second, because it is executed as a later step of the annotation algorithm, is detailed in Subsection 4.3.

The syntactic analysis of the given tags is similar to the one proposed by Cantador et al. [3]. Figure 2 illustrates the main steps. It starts with a lexical check (Figure 2 (a)) which discards all stop words and all terms that contain digits or whose length is less than 2 or greater than 25 characters.

Fig. 2. Filtering procedure.

After that, each tag is checked against WordNet (Figure 2 (b)) to verify that it is an existing word. If it is not found, the tag may be misspelled. Consequently, we check it against a local dictionary (Figure 2 (c)) to find possible correct lexical variations of that term. Unlike the algorithm proposed by Cantador et al., which uses a Google connector to check lexical variations, we use the Levenshtein distance [11] between the tag and the possible candidates, working similarly to Google's "Did you mean", although the latter also considers proper names. The reason for not using the Google version is that it restricts the number of queries per day. Tags consisting of proper names will be investigated in future work through the use of Wikipedia-derived ontologies, such as Yago2 (http://www.mpi-inf.mpg.de/yago-naga/yago) or DBpedia. After the local dictionary module has indicated possible lexical variations of the term, they are checked again on WordNet to make sure they are valid English terms present in that database (Figure 2 (d)). If a variation is found, the term is kept by the system (Figure 2 (e)); otherwise, it is discarded (Figure 2 (f)).

3.2 Statistics Calculation

After syntactic analysis, the training images and their associated tags are used to create three statistics tables, as illustrated in Figure 1 (c), (d) and (e). The first two are the tag and concept folksonomies, and the last one relates concepts and tags. As described in the next section, they are used to infer related tags and concepts from the corpus, supporting the semantic classification.

The folksonomies are based on the relatedness measure described by Cattuto et al. [4], who create a folksonomy based on the co-occurrence of tags. They define it as a weighted undirected graph whose vertex set is the set T of all tags from all images. Two tags t1 and t2 are connected by an edge if and only if t1, t2 ∈ Ts, where Ts is the set of tags assigned to image s ∈ S. The weight of this edge is the number of times t1 and t2 co-occurred, i.e., w(t1, t2) = |{s ∈ S | t1, t2 ∈ Ts}|.

For the concept folksonomy, the same process is adopted, but instead of considering the co-occurrence of tags, we consider the co-occurrence of the concepts assigned to each image of the training set. Two concepts c1 and c2 are connected by an edge if and only if c1, c2 ∈ Cs, where Cs is the set of concepts assigned to image s ∈ S during training. The weight of this edge is the number of times c1 and c2 co-occurred, i.e., w(c1, c2) = |{s ∈ S | c1, c2 ∈ Cs}|.
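As a concrete illustration of how these co-occurrence weights could be computed, the following minimal Python sketch builds w(·, ·) from per-image tag (or concept) sets. The function name, data layout and toy input are assumptions made for illustration; the paper does not describe its implementation at this level of detail.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(items_per_image):
    """Edge weights of the weighted undirected co-occurrence graph:
    w(a, b) = number of images whose item set contains both a and b.
    Works both for tag sets (tag folksonomy) and concept sets (concept folksonomy)."""
    weights = Counter()
    for items in items_per_image:
        for a, b in combinations(sorted(set(items)), 2):
            weights[(a, b)] += 1
    return weights

# Toy example (hypothetical data): filtered tag sets of three training images.
tags_per_image = [
    {"pittsburgh", "sky", "clouds"},
    {"sky", "clouds", "blue"},
    {"pittsburgh", "downtown", "yellow"},
]
tag_folksonomy = build_cooccurrence(tags_per_image)
print(tag_folksonomy[("clouds", "sky")])  # -> 2, since they co-occur in two images
```

The same function applied to the concept sets of the training images yields the concept folksonomy weights w(c1, c2).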
Another statistics table created from the training set is the one that relates concepts and tags. We count the number of times a concept and a tag co-occur in an image. Given a pair (c, t), where c ∈ C is a concept and C is the set of all concepts, we assume they are related if and only if c ∈ Cs and t ∈ Ts. A weight between them is also defined, i.e., w(c, t) = |{s ∈ S | c ∈ Cs and t ∈ Ts}|.

4 Automatic Annotation

This section describes the automatic annotation algorithm proposed in this paper. It analyzes the available tags of each image in order to infer possible related concepts according to the folksonomies and statistical data constructed during training. Figure 3 illustrates the overall schema.

User tags associated with an image (Figure 3 (a)) are first filtered using the same procedure described in Subsection 3.1 (Figure 3 (b)). After that, some images may not have enough terms to support correct concept classification. Thus, we add a tag enrichment phase (Figure 3 (c)) which expands the set of available tags with related terms obtained from the tag folksonomy created during training. Once an image has enough terms, it can be annotated. This task is accomplished by means of three strategies, illustrated in Figure 3: direct recognition (d), most frequent concepts by tag (e) and post-processing (f). As a result of the three classification steps, the multimedia content is annotated with all obtained concepts (Figure 3 (g)). The next subsections describe all steps in detail.

Fig. 3. Annotation procedure.

4.1 Tag Enrichment

Although concepts are inferred from the content of each tag, it is possible to improve the classification results by enriching the set of tags with related terms that users usually employ together when tagging a resource. When more tags are available, the classifier is able to predict concepts better, because more information about the content is analyzed.

The tag enrichment procedure is implemented in our system by considering a number of terms t′ related to a given tag t. This relatedness between tags is defined proportionally to the edge weights of the tag folksonomy; consequently, given a tag t ∈ T, the tags most related to it are the tags t′ ∈ T, t′ ≠ t, for which w(t, t′) is maximal. Consider, for instance, the image illustrated in Figure 4, whose set of tags is "pittsburgh", "sky" and "clouds". Applying the tag enrichment procedure results in the three subsets illustrated in Figure 4 (a), where the terms most related to "pittsburgh" are "sky", "clouds", "yellow", "downtown" and "tree"; the same is done for the original tags "sky" and "clouds".

Fig. 4. Tag enrichment procedure.

By gathering the related terms of all tags t ∈ Ts, we obtain a list Rs composed of related terms and associated weights. A term's weight is calculated according to how often it is retrieved across the subsets produced for the tags in Ts, and these weights are used to sort the list in descending order. Figure 4 (b) illustrates this step, which results in a list Rs composed of the most frequent terms occurring in the previous subsets. Next, we empirically define a minimal number of tags T̄ required to classify each image. The value of |T̄| includes the terms already associated by the user plus the related terms obtained from the tag folksonomy. Thus, we take the first ns terms from the list Rs, where

ns = |T̄| − |Ts| if |Ts| < |T̄|, or ns = 0 otherwise. (1)

Consequently, if we set |T̄| = 5, the image illustrated in Figure 4 will be classified using the original tags "pittsburgh", "sky" and "clouds", plus the extended ones "tree" and "blue".
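A rough Python sketch of this enrichment step is given below, assuming the co-occurrence weights produced by the earlier sketch. The function name and the min_tags parameter are hypothetical, and the weight of each related term in Rs is approximated here by summing co-occurrence weights, which may differ from the paper's exact frequency-of-retrieval calculation.

```python
from collections import Counter

def enrich_tags(image_tags, tag_folksonomy, min_tags=5):
    """Expand a short tag list with the terms that most often co-occur with the
    image's own tags until min_tags (= |T-bar|) terms are available.
    Equation (1): n_s = |T-bar| - |T_s| if |T_s| < |T-bar|, otherwise 0."""
    n_s = max(min_tags - len(image_tags), 0)
    if n_s == 0:
        return list(image_tags)
    scores = Counter()
    for (a, b), w in tag_folksonomy.items():
        if a in image_tags and b not in image_tags:
            scores[b] += w   # b co-occurs with one of the image's own tags
        elif b in image_tags and a not in image_tags:
            scores[a] += w
    related = [term for term, _ in scores.most_common(n_s)]
    return list(image_tags) + related

# e.g. enrich_tags({"pittsburgh", "sky", "clouds"}, tag_folksonomy) adds the
# two terms that co-occur most strongly with the original tags.
```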
4.2 Direct Recognition

The annotation task effectively starts with direct recognition (Figure 3 (d)), which checks whether a user has tagged an image with a term that explicitly refers to a concept. To accomplish that, each predefined concept was manually assigned a meaningful alias, chosen as the term most likely to be used by an individual to tag a resource with that concept. For instance, the alias of the concept "flora flower" was defined as "flower", and the alias of the concept "weather rainbow" as "rainbow". Furthermore, to improve the results of direct recognition, we also consider that many tags may appear in plural or singular form, and that users may use a synonym to refer to the same concept. Thus, we first use an inflector object to transform all tags and concept aliases to their singular form, and a synonym vocabulary that lists synonyms for each concept. Consequently, if a user has tagged an image with "kids", the system first transforms it to the singular form "kid", which is a synonym of "child". As "child" is the alias of the concept "age child", this concept is added to the annotation of the given image.

4.3 Most Frequent Concepts

The next step of concept annotation is to obtain the most frequent concepts per tag using the concept-tag relation described in Subsection 3.2. Given a tag t ∈ Ts, we collect Ct, the set of most frequent concepts related to it, i.e., the subset of concepts c ∈ C for which w(c, t) is maximal. Considering the same example presented in Figure 4, Figure 5 (a) illustrates this step by listing the five most frequent concepts for each given tag: "pittsburgh", "sky", "clouds", "tree" and "blue".

Fig. 5. Most frequent concepts classifier.

Once we have the concept candidates given by Ct, it is still necessary to check whether they are semantically related to the tag t. The concept-tag relation provides concepts based only on statistical information; it does not ensure that the most frequent concepts have a semantic relation to the given tag. For instance, the tag "explore" is frequently used to tag resources in many contexts, although it is not semantically related to the image's actual concepts. Consequently, if this tag is used to retrieve the most frequent concepts, the resulting list will be prone to noise.

To solve this problem, we use two semantic similarity measures to decide whether a candidate c′ ∈ Ct will be adopted in the final annotation. The first measure, sim_wp, is the Wu & Palmer approach [19], which is based on the topology of the WordNet ontology. The second, sim_res, is the Resnik semantic similarity measure [17], which is based on information content. Our algorithm combines both measures in order to find a concept c′ ∈ Ct that is semantically related to tag t:

cs += c′, where c′ = argmax_{c ∈ Ct} sim_res(alias(c), t), if sim_wp(alias(c′), t) ≥ Γ; otherwise no concept is added, (2)

where Γ is a threshold and alias(c′) returns the alias of concept c′ as explained in Subsection 4.2. This process is executed for all tags associated with image s. Thus, the system finds a set of concepts Cs such that each element cs ∈ Cs is the concept most semantically similar to some tag t ∈ Ts.

Considering the previous example, Figure 5 (b) illustrates the application of the Resnik semantic similarity between each tag and the alias of each concept candidate. For each tag, we select the concept with the highest similarity, which is then re-evaluated using the Wu & Palmer metric (Figure 5 (c)). Finally, using a predefined value for Γ, we select the concepts to be included in the annotation of the given image (Figure 5 (d)).
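The following sketch shows how the selection of Equation (2) could be realized with NLTK's WordNet interface, which provides both the Wu & Palmer and the Resnik similarities. The helper names, the use of the first noun sense of each word, the Brown information-content file and the example threshold are assumptions for illustration; the paper does not specify these implementation details.

```python
from nltk.corpus import wordnet as wn, wordnet_ic

# Information-content counts required by the Resnik measure
# (needs the NLTK "wordnet" and "wordnet_ic" corpora installed).
brown_ic = wordnet_ic.ic("ic-brown.dat")

def first_noun_synset(word):
    synsets = wn.synsets(word, pos=wn.NOUN)
    return synsets[0] if synsets else None  # simplification: first sense only

def select_concept(tag, candidates, alias, gamma=0.5):
    """Pick the candidate concept whose alias has the highest Resnik similarity
    to the tag, then keep it only if its Wu & Palmer similarity to the tag reaches
    the threshold gamma (Equation 2). gamma=0.5 is an arbitrary example value."""
    tag_syn = first_noun_synset(tag)
    if tag_syn is None:
        return None
    best_concept, best_res = None, float("-inf")
    for concept in candidates:
        alias_syn = first_noun_synset(alias[concept])
        if alias_syn is None:
            continue
        res = tag_syn.res_similarity(alias_syn, brown_ic)
        if res > best_res:
            best_concept, best_res = concept, res
    if best_concept is None:
        return None
    wup = tag_syn.wup_similarity(first_noun_synset(alias[best_concept]))
    return best_concept if wup is not None and wup >= gamma else None

# e.g. select_concept("clouds", ["weather_cloudysky", "view_outdoor"],
#                     {"weather_cloudysky": "cloud", "view_outdoor": "outdoor"})
```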
4.4 Post Processing

As the final step of the annotation task, we designed a post-processing procedure with two goals: i) to include concepts related to the ones already found; and ii) to discard mutually exclusive concepts by comparing each pair of annotated concepts using a predefined set of heuristics.

In the first case, we use a Naïve Bayes approach [11] to determine whether, given an annotated concept c, other concepts should be included as well. Thus, we define:

cs += cj, for cj ∈ C, if P(cj|c) ≥ P(!cj|c); otherwise cj is not added, (3)

where P(cj|c) is the probability of including cj in the annotation given the annotated concept c, and P(!cj|c) is the probability of not including cj in the annotation given the annotated concept c. We estimate both probabilities as:

P(cj|c) ≈ P(cj) P(c|cj) = (|S_cj| / |S|) · (w(cj, c) / Σ_{ci ∈ C} w(ci, cj)), (4)

P(!cj|c) ≈ P(!cj) P(c|!cj) = (|S_!cj| / |S|) · (w(!cj, c) / Σ_{ci ∈ C} w(ci, !cj)), (5)

where S_cj is the subset of training images annotated with concept cj, and S_!cj is the subset of training images not annotated with concept cj. A sketch of this inclusion test is given after the heuristics below.

After including related concepts, the final procedure of our annotation tool is to discard mutually exclusive concepts. To do that, we first manually designed a list of pairs of mutually exclusive concepts, such as "view indoor" versus "view outdoor", "sentiment active" versus "sentiment inactive", and so on. Then, given a pair of conflicting concepts (c1, c2) in the preliminary annotation list, we apply the following heuristics to discard the concept(s) in conflict:

1. If c1 and c2 were both found by direct recognition, keep both concepts.
2. If c1 was found by direct recognition and c2 was not, discard c2.
3. If neither c1 nor c2 was found by direct recognition, discard the concept with fewer votes from the classifiers.
4. If neither c1 nor c2 was found by direct recognition and both have the same number of votes, discard both concepts.
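Below is a rough, unoptimised Python sketch of the Naïve Bayes inclusion test of Equations (3)-(5), estimating the probabilities directly from the per-image concept sets of the training data. The function and variable names, and the way the denominators of Equations (4) and (5) are accumulated, are assumptions made for illustration and may not match the paper's exact implementation.

```python
def include_related_concepts(annotated, concepts_per_image):
    """For every concept c already assigned and every other concept cj seen in
    training, estimate P(cj|c) and P(!cj|c) (Equations 3-5) from concept
    co-occurrence counts and add cj whenever P(cj|c) >= P(!cj|c)."""
    n = len(concepts_per_image)
    all_concepts = set().union(*concepts_per_image)
    result = set(annotated)
    for c in annotated:
        for cj in all_concepts - result:
            with_cj = [cs for cs in concepts_per_image if cj in cs]
            without_cj = [cs for cs in concepts_per_image if cj not in cs]
            # w(cj, c): images containing both cj and c; the denominator sums
            # the co-occurrences of cj with every other concept ci.
            w_pos = sum(1 for cs in with_cj if c in cs)
            denom_pos = sum(len(cs) - 1 for cs in with_cj) or 1
            # w(!cj, c): images containing c but not cj, normalised likewise.
            w_neg = sum(1 for cs in without_cj if c in cs)
            denom_neg = sum(len(cs) for cs in without_cj) or 1
            p_include = (len(with_cj) / n) * (w_pos / denom_pos)
            p_exclude = (len(without_cj) / n) * (w_neg / denom_neg)
            if p_include >= p_exclude:
                result.add(cj)
    return result

# e.g. include_related_concepts({"combustion_flames"}, training_concept_sets)
# tends to add "combustion_smoke" when the two concepts frequently co-occur.
```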
5 Results

This section presents the official results obtained by our group in the ImageCLEF 2012 Photo Annotation Task [18]. We submitted two runs to be evaluated: the "proposed" one, which is the implementation of the techniques described in this paper; and a simpler version based only on tag enrichment and Naïve Bayes. This year, ten groups submitted at least one run in the textual category; comparing the best results from each group, our technique was able to achieve the eighth place. Table 1 presents the overall results. Figure 6 presents the F1 results for each of the 93 predefined concepts.

Table 1. Overall Results (MiAP, GMiAP, and Micro-/Macro-F1 by photo and by concept for the Proposed and Naïve Bayes runs).

Fig. 6. Results by concept (F1 per concept for the Proposed and Naïve Bayes runs).

We note that better results were achieved for the concepts in the classes "fauna" and "transport", since users tend to use the concept names to tag their resources. Furthermore, other particular concepts were also correctly classified, such as "combustion fireworks", "setting fooddrink", "quantity none" and "quality noblur", the first two recognized mostly by direct classification and most frequent concepts by tag, and the last two by the post-processing step.

6 Final Remarks

This paper presented the results of the automatic annotation tool evaluated in the ImageCLEF 2012 Photo Annotation Task. It uses different strategies based on the co-occurrence of tags and concepts in order to reduce two problems related to the use of tags in classification: noise and incompleteness. The first is handled by a multi-phase filtering procedure that amends misspelled tags and keeps only those semantically related to the actual context of the image. The second is reduced by a tag enrichment step which adds related terms to the image's list of tags in order to make annotation more effective.

The results of this evaluation indicate that more research is needed to improve the quality of the annotations. This will be addressed in future work, where we plan to investigate how to recognize specific concepts that are difficult to find using only textual features.

7 Acknowledgments

The authors would like to thank FAPESP for its financial support, process number 2011/17366-2.

References

1. K. Amel, A. Benammar, and C. B. Amar. REGIMvid at ImageCLEF2011: Integrating Contextual Information to Enhance Photo Annotation and Concept-based Retrieval. In V. Petras, P. Forner, and P. D. Clough, editors, Working Notes of CLEF 2011, Amsterdam, The Netherlands, 2011.
2. D. Brezeale and D. J. Cook. Automatic Video Classification: A Survey of the Literature. IEEE Transactions on Systems, Man, and Cybernetics, 38(3):416–430, 2007.
3. I. Cantador, M. Szomszor, H. Alani, M. Fernández, and P. Castells. Enriching Ontological User Profiles with Tagging History for Multi-Domain Recommendations.
In 1st International Workshop on Collective Semantics: Collective Intelligence & the Semantic Web (CISWeb 2008), Tenerife, Spain, 2008.
4. C. Cattuto, D. Benz, A. Hotho, and G. Stumme. Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. In A. P. Sheth, S. Staab, M. Dean, M. Paolucci, D. Maynard, T. W. Finin, and K. Thirunarayan, editors, The Semantic Web – ISWC 2008, volume 5318 of Lecture Notes in Computer Science, pages 615–631, Berlin/Heidelberg, 2008. Springer.
5. R. L. Cilibrasi and P. M. B. Vitanyi. The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3):370–383, 2007.
6. B. Daróczy, R. Pethes, and A. A. Benczúr. SZTAKI @ ImageCLEF 2011. In V. Petras, P. Forner, and P. D. Clough, editors, Working Notes of CLEF 2011, Amsterdam, The Netherlands, 2011.
7. R. Izawa, N. Motohashi, and T. Takagi. Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation. In V. Petras, P. Forner, and P. D. Clough, editors, Working Notes of CLEF 2011, Amsterdam, The Netherlands, 2011.
8. D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research, 5:361–397, 2004.
9. W. Li, J. Min, and G. J. F. Jones. A Text-Based Approach to the ImageCLEF 2010 Photo Annotation Task. In M. Braschler, D. Harman, and E. Pianta, editors, Working Notes of CLEF 2010, Padua, Italy, 2010.
10. N. Liu, Y. Zhang, E. Dellandréa, S. Bres, and L. Chen. LIRIS-Imagine at ImageCLEF 2011 Photo Annotation Task. In V. Petras, P. Forner, and P. D. Clough, editors, Working Notes of CLEF 2011, Amsterdam, The Netherlands, 2011.
11. C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.
12. K. Nagel, S. Nowak, U. Kuhhirt, and K. Wolter. The Fraunhofer IDMT at ImageCLEF 2011 Photo Annotation Task. In V. Petras, P. Forner, and P. D. Clough, editors, Working Notes of CLEF 2011, Amsterdam, The Netherlands, 2011.
13. S. N. Patel and G. D. Abowd. The ContextCam: Automated Point of Capture Video Annotation. In F. Khendek and R. Dssouli, editors, UbiComp 2004: Ubiquitous Computing, volume 3205 of Lecture Notes in Computer Science, pages 301–318. Springer, 2004.
14. J. Read, B. Pfahringer, G. Holmes, and E. Frank. Classifier chains for multi-label classification. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II, ECML PKDD '09, pages 254–269, Berlin, Heidelberg, 2009. Springer-Verlag.
15. A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-Based Image Retrieval at the End of the Early Years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349–1380, 2000.
16. E. Spyromitros-Xioufis, K. Sechidis, G. Tsoumakas, and I. Vlahavas. MLKD's Participation at the CLEF 2011 Photo Annotation and Concept-Based Retrieval Tasks. In V. Petras, P. Forner, and P. D. Clough, editors, Working Notes of CLEF 2011, Amsterdam, The Netherlands, 2011.
17. P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448–453, 1995.
18. B. Thomee and A. Popescu. Overview of the ImageCLEF 2012 Flickr Photo Annotation and Retrieval Task. In CLEF 2012 Working Notes, Rome, Italy, 2012.
19. Z. Wu and M. Palmer. Verb semantics and lexical selection.
In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 133–138, Stroudsburg, PA, USA, 1994. Association for Computational Linguistics.
20. S. Zhu, G. Wang, C.-W. Ngo, and Y.-G. Jiang. On the sampling of web images for learning visual concept classifiers. In Proceedings of the ACM International Conference on Image and Video Retrieval, CIVR '10, pages 50–57, New York, USA, 2010.
21. A. Znaidia and H. Le Borgne. CEA LIST's participation to the Visual Concept Detection Task of ImageCLEF 2011. In V. Petras, P. Forner, and P. D. Clough, editors, Working Notes of CLEF 2011, Amsterdam, The Netherlands, 2011.