=Paper=
{{Paper
|id=Vol-1739/MediaEval_2016_paper_5
|storemode=property
|title=Retrieving Diverse Social Images at MediaEval 2016: Challenge, Dataset and Evaluation
|pdfUrl=https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_5.pdf
|volume=Vol-1739
|dblpUrl=https://dblp.org/rec/conf/mediaeval/IonescuGZBLM16
}}
==Retrieving Diverse Social Images at MediaEval 2016: Challenge, Dataset and Evaluation==
Bogdan Ionescu, LAPI, University Politehnica of Bucharest, Romania, bionescu@alpha.imag.pub.ro
Alexandru Lucian Gînscă, CEA, LIST, France, alexandru.ginsca@cea.fr
Maia Zaharieva, University of Vienna & Vienna University of Technology, Austria, maia.zaharieva@univie.ac.at (this task is partly supported by the Vienna Science and Technology Fund (WWTF) through project ICT12-010)
Bogdan Boteanu, LAPI, University Politehnica of Bucharest, Romania, bboteanu@alpha.imag.pub.ro
Mihai Lupu, Vienna University of Technology, Austria, lupu@ifs.tuwien.ac.at
Henning Müller, HES-SO, University of Applied Sciences Western Switzerland, henning.mueller@hevs.ch

Copyright is held by the author/owner(s). MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, Netherlands.

ABSTRACT

This paper provides an overview of the Retrieving Diverse Social Images task that is organized as part of the MediaEval 2016 Benchmarking Initiative for Multimedia Evaluation. The task addresses the problem of result diversification in the context of social photo retrieval, where images, metadata, text information, user tagging profiles, and content and text models are available for processing. We present the task challenges, the proposed dataset and ground truth, the required participant runs and the evaluation metrics.

1. INTRODUCTION

An efficient image retrieval system should be able to present results that are both relevant and that cover different aspects, i.e., the diversity, of the query. By diversifying the pool of possible results, one can increase the likelihood of providing the user with the information needed. Relevance has been studied more thoroughly in the existing literature than diversification [1, 2, 3], especially within the text community. Even though a considerable amount of diversification literature exists [8, 9, 10], the topic remains important, especially in the emerging fields of social multimedia [4, 5, 6, 7, 11].

The 2016 Retrieving Diverse Social Images task is a follow-up of the 2015 edition [14, 13, 12, 15] and aims to foster new technology to improve both the relevance and the diversification of search results, with explicit emphasis on the actual social media context. The task was designed to support the evaluation of techniques emerging from a wide range of research fields, such as image retrieval (text, vision, multimedia communities), machine learning, relevance feedback and natural language processing, but is not limited to these.

2. TASK DESCRIPTION

The task is built around the use case of a general ad-hoc image retrieval system, which provides the user with diverse representations of the queries (see for instance Google Image Search, https://images.google.com/). Given a ranked list of query-related photos retrieved from Flickr (https://www.flickr.com/), participants are required to refine the results by providing a set of images that are at the same time relevant to the query and a diversified summary of it. Compared to the previous editions, this year's task includes complex and general-purpose multi-concept queries.

The requirements of the task are to refine these results by providing a ranked list of up to 50 photos that are both relevant and diverse representations of the query, according to the following definitions:

Relevance: a photo is considered relevant for the query if it is a common photo representation of the query topics (all at once). Bad-quality photos (e.g., severely blurred, out of focus) are not considered relevant in this scenario.

Diversity: a set of photos is considered diverse if it depicts different visual characteristics of the query topics and subtopics with a certain degree of complementarity, i.e., most of the perceived visual information is different from one photo to another.

To carry out the refinement and diversification tasks, participants may use the social metadata associated with the images, the visual characteristics of the images, information related to user tagging credibility (an estimation of the global quality of tag-image content relationships for a user's contributions) or external resources (e.g., the Internet).
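To make the relevance-diversity trade-off concrete, the following minimal Python sketch greedily re-ranks a result list in the spirit of Maximal Marginal Relevance. It is purely illustrative, not an official baseline of the task, and all function and parameter names are our own.

```python
# Illustrative greedy re-ranking for joint relevance and diversity
# (an MMR-style sketch; hypothetical, not the task's official method).
import numpy as np

def rerank_diverse(features, relevance, k=50, lam=0.7):
    """Select up to k photos, trading off relevance against redundancy.

    features  : (n, d) array of per-photo descriptors (e.g., CNN features)
    relevance : (n,) array of initial relevance scores
    lam       : weight of relevance vs. diversity, in [0, 1]
    Returns the indices of the selected photos, in ranked order.
    """
    # Cosine similarity between all photo pairs.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T

    selected = [int(np.argmax(relevance))]  # seed with the most relevant photo
    candidates = set(range(features.shape[0])) - set(selected)
    while candidates and len(selected) < k:
        best, best_score = None, -np.inf
        for i in candidates:
            # Penalize similarity to the closest already-selected photo.
            redundancy = max(sim[i, j] for j in selected)
            score = lam * relevance[i] - (1.0 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

With lam = 1.0 the sketch reduces to plain relevance ranking; lowering lam sacrifices some precision for broader coverage of the query's visual subtopics.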
3. DATASET

The 2016 data consists of: a development set (devset) containing 70 queries (20,757 Flickr photos), including 35 multi-topic queries related to events and states associated with locations from the 2015 dataset [14]; a user annotation credibility set (credibilityset) containing information for ca. 300 location-based queries and 685 users (different from the ones in devset and testset; an updated version of the 2015 dataset [14]); a set providing semantic vectors for general English terms computed on top of the English Wikipedia (https://en.wikipedia.org/) (wikiset), which could help participants develop advanced text models; and a test set (testset) containing 65 queries (19,017 Flickr photos).

Each query is provided with the following information: the query text formulation (the actual query formulation used on Flickr to retrieve all the data); a ranked list of up to 300 photos in JPEG format retrieved from Flickr using Flickr's default "relevance" algorithm (all photos are Creative Commons licensed allowing redistribution, http://creativecommons.org/); an XML file containing the Flickr metadata of all retrieved photos (e.g., photo title, photo description, photo id, tags, Creative Commons license type, the URL of the photo location on Flickr, the photo owner's name, user id, the number of times the photo has been displayed, etc.); and the ground truth for both relevance and diversity.

Apart from the metadata, to facilitate participation from various communities, we also provide the following content descriptors (a small illustration of the credibility descriptors follows this list):

- convolutional neural network based descriptors: a generic CNN descriptor based on the reference convolutional neural network (CNN) model provided along with the Caffe framework (http://caffe.berkeleyvision.org/); this model is learned with the 1,000 ImageNet classes used during the ImageNet challenge; and an adapted CNN descriptor based on a CNN model with an architecture identical to that of the Caffe reference model. The adaptation is done only for the 2015 location-based multi-topic queries (35 queries from the devset), i.e., the model is learned with 1,000 tourist points of interest classes whose images were automatically collected from the Web [16]. For the other queries, the descriptor is computed as the generic one, because these queries are diverse enough and do not require any adaptation;

- text information that consists, as in the previous edition, of term frequency information, document frequency information and their ratio, i.e., TF-IDF, computed on a per-image, per-query and per-user basis (see [17]);

- user annotation credibility descriptors that give an automatic estimation of the quality of the users' tag-image content relationships. These descriptors are extracted by visual or textual content mining: visualScore (a measure of user image relevance), faceProportion (the percentage of images with faces), tagSpecificity (average specificity of a user's tags, where tag specificity is the percentage of users having annotated with that tag in a large Flickr corpus), locationSimilarity (average similarity between a user's geotagged photos and a probabilistic model of a surrounding cell), photoCount (total number of images a user shared), uniqueTags (proportion of unique tags), uploadFrequency (average time between two consecutive uploads), bulkProportion (the proportion of bulk taggings in a user's stream, i.e., of tag sets that appear identical for at least two distinct photos), meanPhotoViews (mean number of times a user's images have been seen by other members of the community), meanTitleWordCounts (mean number of words in the titles associated with a user's photos), meanTagsPerPhoto (mean number of tags users assign to their images), meanTagRank (mean rank of a user's tags in a list sorted in descending order by the number of appearances in a large subsample of Flickr images), and meanImageTagClarity (an adaptation of the Image Tag Clarity from [18] using a tf/idf language model as the individual tag language model).
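As a rough illustration of how some of the purely metadata-based credibility descriptors above can be derived, the Python sketch below computes uploadFrequency, uniqueTags, bulkProportion and meanTagsPerPhoto from a user's photo stream. The input schema is assumed, and the exact formulas used to build the credibilityset may differ (see [17]).

```python
# Approximate re-implementation of four metadata-based credibility
# descriptors; a sketch, not the dataset's reference code.
from collections import Counter

def credibility_descriptors(photos):
    """photos: non-empty list of dicts with 'timestamp' (unix seconds)
    and 'tags' (list of strings), one dict per uploaded photo."""
    n = len(photos)
    all_tags = [t for p in photos for t in p["tags"]]
    tag_sets = Counter(tuple(sorted(p["tags"])) for p in photos)
    times = sorted(p["timestamp"] for p in photos)
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        # average time between two consecutive uploads
        "uploadFrequency": sum(gaps) / len(gaps) if gaps else 0.0,
        # proportion of unique tags in the user's tag stream
        "uniqueTags": len(set(all_tags)) / len(all_tags) if all_tags else 0.0,
        # share of photos whose full tag set recurs on >= 2 distinct photos
        "bulkProportion": sum(c for c in tag_sets.values() if c >= 2) / n,
        # mean number of tags per photo
        "meanTagsPerPhoto": len(all_tags) / n,
    }
```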
4. GROUND TRUTH

Both relevance and diversity annotations were carried out by expert annotators. For relevance, annotators were asked to label each photo (one at a time) as relevant (value 1), non-relevant (0) or "don't know" (-1). Nine annotators were involved for devset, 9 for credibilityset and 8 for testset. The data was partitioned among annotators such that, in the end, each image was marked by 3 different annotators. The final relevance ground truth was determined by a lenient majority voting scheme. For diversity, only the photos judged relevant in the previous step were considered. For each query, annotators were provided with a thumbnail list of all relevant photos. After getting familiar with their contents, they were asked to re-group the photos into clusters with similar visual appearance (up to 25). Devset and testset were annotated by 5 persons, each of them annotating distinct parts of the data (leading to only one annotation). An additional annotator acted as a master annotator and reviewed the final annotations once more.

5. RUN DESCRIPTION

Participants were allowed to submit up to 5 runs. The first 3 are required runs: run1 — automated, using visual information only; run2 — automated, using text information only; and run3 — automated, using fused text-visual information without other resources than those provided by the organizers. The last 2 runs are general runs: run4 and run5 — everything allowed, e.g., human-based or hybrid human-machine approaches, including data from external sources (e.g., the Internet). For generating run1 to run3, participants are allowed to use only information that can be extracted from the provided data (e.g., the provided descriptors, descriptors of their own, etc.).

6. EVALUATION

Performance is assessed for both diversity and relevance. The following metrics are computed: Cluster Recall at X (CR@X) — a measure of how many different clusters from the ground truth are represented among the top X results (only relevant images are considered); Precision at X (P@X) — the proportion of relevant photos among the top X results; and F1-measure at X (F1@X) — the harmonic mean of the previous two. Various cutoff points are considered, i.e., X = 5, 10, 20, 30, 40, 50. The official ranking metric is F1@20, which gives equal importance to diversity (via CR@20) and relevance (via P@20). This metric simulates the content of a single page of a typical Web image search engine and reflects user behavior, i.e., inspecting the first page of results with priority.
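The metric definitions above translate directly into code. The following minimal Python sketch computes P@X, CR@X and F1@X from a ranked photo list, a relevance ground truth and a photo-to-cluster assignment; the helper names are ours, not those of the official evaluation tools.

```python
# Sketch of the official metrics, written from their definitions above.

def precision_at(ranked, relevant, x):
    """P@X: proportion of relevant photos among the top X results."""
    return sum(1 for p in ranked[:x] if p in relevant) / x

def cluster_recall_at(ranked, relevant, cluster_of, x):
    """CR@X: fraction of ground-truth clusters represented by the
    relevant photos among the top X results."""
    covered = {cluster_of[p] for p in ranked[:x] if p in relevant}
    return len(covered) / len(set(cluster_of.values()))

def f1_at(ranked, relevant, cluster_of, x=20):
    """F1@X: harmonic mean of P@X and CR@X (F1@20 is the official metric)."""
    p = precision_at(ranked, relevant, x)
    cr = cluster_recall_at(ranked, relevant, cluster_of, x)
    return 0.0 if p + cr == 0.0 else 2.0 * p * cr / (p + cr)
```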
7. CONCLUSIONS

The 2016 Retrieving Diverse Social Images task provides participants with a comparative and collaborative evaluation framework for social image retrieval techniques, with an explicit focus on result diversification. This year in particular, the task explores diversification in the context of a challenging, ad-hoc image retrieval system that should be able to tackle complex and general-purpose multi-concept queries. Details on the methods and results of each individual participant team can be found in the working note papers of the MediaEval 2016 workshop proceedings.

8. REFERENCES

[1] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, "Content-based Image Retrieval at the End of the Early Years", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), pp. 1349-1380, 2000.
[2] R. Datta, D. Joshi, J. Li, J.Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age", ACM Computing Surveys, 40(2), pp. 1-60, 2008.
[3] R. Priyatharshini, S. Chitrakala, "Association Based Image Retrieval: A Survey", Mobile Communication and Power Engineering, Springer Communications in Computer and Information Science, 296, pp. 17-26, 2013.
[4] R.H. van Leuken, L. Garcia, X. Olivares, R. van Zwol, "Visual Diversification of Image Search Results", ACM World Wide Web, pp. 341-350, 2009.
[5] M.L. Paramita, M. Sanderson, P. Clough, "Diversity in Photo Retrieval: Overview of the ImageCLEF Photo Task 2009", ImageCLEF 2009.
[6] B. Taneva, M. Kacimi, G. Weikum, "Gathering and Ranking Photos of Named Entities with High Precision, High Recall, and Diversity", ACM Web Search and Data Mining, pp. 431-440, 2010.
[7] S. Rudinac, A. Hanjalic, M.A. Larson, "Generating Visual Summaries of Geographic Areas Using Community-Contributed Images", IEEE Transactions on Multimedia, 15(4), pp. 921-932, 2013.
[8] R. Agrawal, S. Gollapudi, A. Halverson, S. Ieong, "Diversifying Search Results", ACM International Conference on Web Search and Data Mining, pp. 5-14, 2009.
[9] Y. Zhu, Y. Lan, J. Guo, X. Cheng, S. Niu, "Learning for Search Result Diversification", ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 293-302, 2014.
[10] H.-T. Yu, F. Ren, "Search Result Diversification via Filling up Multiple Knapsacks", ACM International Conference on Information and Knowledge Management, pp. 609-618, 2014.
[11] D.-T. Dang-Nguyen, L. Piras, G. Giacinto, G. Boato, F.G.B. De Natale, "A Hybrid Approach for Retrieving Diverse Social Images of Landmarks", IEEE International Conference on Multimedia and Expo, pp. 1-6, 2015.
[12] B. Ionescu, A.-L. Radu, M. Menéndez, H. Müller, A. Popescu, B. Loni, "Div400: A Social Image Retrieval Result Diversification Dataset", ACM MMSys, Singapore, 2014.
[13] B. Ionescu, A. Popescu, M. Lupu, A.L. Gînscă, B. Boteanu, H. Müller, "Div150Cred: A Social Image Retrieval Result Diversification with User Tagging Credibility Dataset", ACM MMSys, Portland, Oregon, USA, 2015.
[14] B. Ionescu, A.L. Gînscă, B. Boteanu, M. Lupu, A. Popescu, H. Müller, "Div150Multi: A Social Image Retrieval Result Diversification Dataset with Multi-topic Queries", ACM MMSys, Klagenfurt, Austria, 2016.
[15] B. Ionescu, A. Popescu, A.-L. Radu, H. Müller, "Result Diversification in Social Image Retrieval: A Benchmarking Framework", Multimedia Tools and Applications, 2014.
[16] E. Spyromitros-Xioufis, S. Papadopoulos, A. Gînscă, A. Popescu, I. Kompatsiaris, I. Vlahavas, "Improving Diversity in Image Search via Supervised Relevance Scoring", ACM International Conference on Multimedia Retrieval, ACM, Shanghai, China, 2015.
[17] B. Ionescu, A. Popescu, M. Lupu, A.L. Gînscă, H. Müller, "Retrieving Diverse Social Images at MediaEval 2014: Challenge, Dataset and Evaluation", CEUR-WS, Vol. 1263, http://ceur-ws.org/Vol-1263/mediaeval2014_submission_1.pdf, Spain, 2014.
[18] A. Sun, S.S. Bhowmick, "Image Tag Clarity: in Search of Visual-Representative Tags for Social Images", SIGMM Workshop on Social Media, 2009.