=Paper= {{Paper |id=Vol-1739/MediaEval_2016_paper_5 |storemode=property |title=Retrieving Diverse Social Images at MediaEval 2016: Challenge, Dataset and Evaluation |pdfUrl=https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_5.pdf |volume=Vol-1739 |dblpUrl=https://dblp.org/rec/conf/mediaeval/IonescuGZBLM16 }} ==Retrieving Diverse Social Images at MediaEval 2016: Challenge, Dataset and Evaluation== https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_5.pdf
       Retrieving Diverse Social Images at MediaEval 2016:
                Challenge, Dataset and Evaluation

Bogdan Ionescu
LAPI, University Politehnica of Bucharest, Romania
bionescu@alpha.imag.pub.ro

Alexandru Lucian Gînscă
CEA, LIST, France
alexandru.ginsca@cea.fr

Maia Zaharieva∗
University of Vienna & Vienna University of Technology, Austria
maia.zaharieva@univie.ac.at

Bogdan Boteanu
LAPI, University Politehnica of Bucharest, Romania
bboteanu@alpha.imag.pub.ro

Mihai Lupu
Vienna University of Technology, Austria
lupu@ifs.tuwien.ac.at

Henning Müller
HES-SO, University of Applied Sciences Western Switzerland
henning.mueller@hevs.ch


ABSTRACT
This paper provides an overview of the Retrieving Diverse Social Images task that is organized as part of the MediaEval 2016 Benchmarking Initiative for Multimedia Evaluation. The task addresses the problem of result diversification in the context of social photo retrieval, where images, metadata, text information, user tagging profiles, and content and text models are available for processing. We present the task challenges, the proposed dataset and ground truth, the required participant runs, and the evaluation metrics.

1. INTRODUCTION
  An efficient image retrieval system should be able to present results that are both relevant and that cover different aspects, i.e., the diversity, of the query. By diversifying the pool of possible results, one can increase the likelihood of providing the user with the information needed. Relevance has been studied more thoroughly in the existing literature than diversification [1, 2, 3], especially within the text community. Even though a considerable amount of diversification literature exists [8, 9, 10], the topic remains important, especially in the emerging field of social multimedia [4, 5, 6, 7, 11].
  The 2016 Retrieving Diverse Social Images task is a follow-up of the 2015 edition [14, 13, 12, 15] and aims to foster new technology that improves both the relevance and the diversification of search results, with explicit emphasis on the actual social media context. The task was designed to support the evaluation of techniques emerging from a wide range of research fields, such as image retrieval (text, vision, and multimedia communities), machine learning, relevance feedback, and natural language processing, but is not limited to these.

2. TASK DESCRIPTION
  The task is built around the use case of a general ad-hoc image retrieval system, which provides the user with diverse representations of the queries (see for instance Google Image Search1). Participants are required, given a ranked list of query-related photos retrieved from Flickr2, to refine the results by providing a set of images that is at the same time relevant to the query and a diversified summary of it. Compared to the previous editions, this year's task includes complex and general-purpose multi-concept queries.
  The requirements of the task are to refine these results by providing a ranked list of up to 50 photos that are both relevant and diverse representations of the query, according to the following definitions:
Relevance: a photo is considered to be relevant for the query if it is a common photo representation of the query topics (all at once). Bad quality photos (e.g., severely blurred, out of focus) are not considered relevant in this scenario;
Diversity: a set of photos is considered to be diverse if it depicts different visual characteristics of the query topics and subtopics with a certain degree of complementarity, i.e., most of the perceived visual information is different from one photo to another.
  To carry out the refinement and diversification tasks, participants may use the social metadata associated with the images, the visual characteristics of the images, information related to user tagging credibility (an estimation of the global quality of the tag-image content relationships for a user's contributions), or external resources (e.g., the Internet).

∗ This task is partly supported by the Vienna Science and Technology Fund (WWTF) through project ICT12-010.
1 https://images.google.com/.
2 https://www.flickr.com/.
3 https://en.wikipedia.org/.
Copyright is held by the author/owner(s).
MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, Netherlands.

3. DATASET
  The 2016 data consists of a development set (devset) containing 70 queries (20,757 Flickr photos, including 35 multi-topic queries related to events and states associated with locations from the 2015 dataset [14]); a user annotation credibility set (credibilityset) containing information for ca. 300 location-based queries and 685 users (different from the ones in devset and testset; an updated version of the 2015 dataset [14]); a set providing semantic vectors for general English terms computed on top of the English Wikipedia3 (wikiset), which could help the participants in developing advanced text models; and a test set (testset) containing 65
queries (19,017 Flickr photos).
  Each query is provided with the following information: the query text formulation (the actual query formulation used on Flickr to retrieve all the data); a ranked list of up to 300 photos in jpeg format retrieved from Flickr using Flickr's default "relevance" algorithm (all photos are Creative Commons licensed allowing redistribution4); an xml file containing metadata from Flickr for all the retrieved photos (e.g., photo title, photo description, photo id, tags, Creative Commons license type, the url link of the photo location from Flickr, the photo owner's name, user id, the number of times the photo has been displayed, etc.); and ground truth for both relevance and diversity.
  Apart from the metadata, to facilitate participation from various communities, we also provide the following content descriptors:
- convolutional neural network based descriptors: a generic CNN based on the reference convolutional neural network (CNN) model provided along with the Caffe framework5 (this model is learned with the 1,000 ImageNet classes used during the ImageNet challenge); and an adapted CNN based on a CNN model obtained with an architecture identical to that of the Caffe reference model. Adaptation is done only for the 2015 location-based multi-topic queries (35 queries from the devset), i.e., the model is learned with 1,000 tourist points of interest classes whose images were automatically collected from the Web [16]. For the other queries, the descriptor is computed as the generic one, because these queries are diverse enough and do not require any adaptation;
- text information that consists, as in the previous edition, of term frequency information, document frequency information, and their ratio, i.e., TF-IDF, which is computed on a per image basis, per query basis, and per user basis (see [17]);
- user annotation credibility descriptors that give an automatic estimation of the quality of the users' tag-image content relationships. These descriptors are extracted by visual or textual content mining: visualScore (a measure of user image relevance), faceProportion (the percentage of images with faces), tagSpecificity (the average specificity of a user's tags, where tag specificity is the percentage of users having annotated with that tag in a large Flickr corpus), locationSimilarity (the average similarity between a user's geotagged photos and a probabilistic model of a surrounding cell), photoCount (the total number of images a user shared), uniqueTags (the proportion of unique tags), uploadFrequency (the average time between two consecutive uploads), bulkProportion (the proportion of bulk taggings in a user's stream, i.e., of tag sets that appear identical for at least two distinct photos), meanPhotoViews (the mean number of times a user's images have been seen by other members of the community), meanTitleWordCounts (the mean number of words found in the titles associated with a user's photos), meanTagsPerPhoto (the mean number of tags users put on their images), meanTagRank (the mean rank of a user's tags in a list in which the tags are sorted in descending order according to the number of appearances in a large subsample of Flickr images), and meanImageTagClarity (an adaptation of the Image Tag Clarity from [18] using as individual tag language model a tf/idf language model).

4 http://creativecommons.org/.
5 http://caffe.berkeleyvision.org/.

4. GROUND TRUTH
  Both relevance and diversity annotations were carried out by expert annotators. For relevance, annotators were asked to label each photo (one at a time) as relevant (value 1), non-relevant (0), or "don't know" (-1). For devset, 9 annotators were involved; for credibilityset, 9; and for testset, 8. The data was partitioned among annotators such that, in the end, each image was marked by 3 different annotators. The final relevance ground truth was determined by a lenient majority voting scheme. For diversity, only the photos that were judged as relevant in the previous step were considered. For each query, annotators were provided with a thumbnail list of all relevant photos. After getting familiar with their contents, they were asked to re-group the photos into clusters with similar visual appearance (up to 25). Devset and testset were annotated by 5 persons, each of them annotating distinct parts of the data (leading to only one annotation). An additional annotator acted as a master annotator and reviewed once more the final annotations.

5. RUN DESCRIPTION
  Participants were allowed to submit up to 5 runs. The first 3 are required runs: run1, automated, using visual information only; run2, automated, using text information only; and run3, automated, using fused text-visual information without other resources than those provided by the organizers. The last 2 runs are general runs: run4 and run5, where everything is allowed, e.g., human-based or hybrid human-machine approaches, including the use of data from external sources (e.g., the Internet). For generating run1 to run3, participants are allowed to use only information that can be extracted from the provided data (e.g., the provided descriptors, descriptors of their own, etc.).

6. EVALUATION
  Performance is assessed for both diversity and relevance. The following metrics are computed: Cluster Recall at X (CR@X), a measure that assesses how many different clusters from the ground truth are represented among the top X results (only relevant images are considered); Precision at X (P@X), which measures the proportion of relevant photos among the top X results; and F1-measure at X (F1@X), the harmonic mean of the previous two. Various cut-off points are considered, i.e., X = 5, 10, 20, 30, 40, 50. The official ranking metric is F1@20, which gives equal importance to diversity (via CR@20) and relevance (via P@20). This metric simulates the content of a single page of a typical Web image search engine and reflects user behavior, i.e., inspecting the first page of results with priority.

7. CONCLUSIONS
  The 2016 Retrieving Diverse Social Images task provides participants with a comparative and collaborative evaluation framework for social image retrieval techniques with an explicit focus on result diversification. This year in particular, the task explores diversification in the context of a challenging, ad-hoc image retrieval system, which should be able to tackle complex and general-purpose multi-concept queries. Details on the methods and results of each individual participant team can be found in the working note papers of the MediaEval 2016 workshop proceedings.
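As a worked example of the evaluation in Section 6, the following minimal Python sketch computes P@X, CR@X, and F1@X for a single query. The function names and the toy data are hypothetical illustrations, not part of the officially released evaluation tools.

```python
def precision_at(relevant, ranked, x):
    """P@X: proportion of relevant photos among the top X results."""
    return sum(1 for p in ranked[:x] if p in relevant) / x

def cluster_recall_at(cluster_of, n_clusters, relevant, ranked, x):
    """CR@X: fraction of ground-truth clusters represented in the top X
    results; only relevant images count toward cluster coverage."""
    seen = {cluster_of[p] for p in ranked[:x] if p in relevant}
    return len(seen) / n_clusters

def f1_at(relevant, cluster_of, n_clusters, ranked, x):
    """F1@X: harmonic mean of P@X and CR@X (F1@20 is the official metric)."""
    p = precision_at(relevant, ranked, x)
    cr = cluster_recall_at(cluster_of, n_clusters, relevant, ranked, x)
    return 0.0 if p + cr == 0 else 2 * p * cr / (p + cr)

# Toy query: 4 photos returned, 3 of them relevant, covering
# 2 of the 3 ground-truth diversity clusters.
relevant = {"a", "b", "c"}
cluster_of = {"a": 0, "b": 0, "c": 1}
ranked = ["a", "x", "b", "c"]

precision_at(relevant, ranked, 4)                 # -> 0.75
cluster_recall_at(cluster_of, 3, relevant, ranked, 4)  # -> 2/3
f1_at(relevant, cluster_of, 3, ranked, 4)         # -> 12/17
```

At the official cut-off the same functions would simply be called with x=20 over the full run list for each query, and the per-query scores averaged.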
8. REFERENCES
 [1] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, "Content-based Image Retrieval at the End of the Early Years", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), pp. 1349-1380, 2000.
 [2] R. Datta, D. Joshi, J. Li, J.Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age", ACM Computing Surveys, 40(2), pp. 1-60, 2008.
 [3] R. Priyatharshini, S. Chitrakala, "Association Based Image Retrieval: A Survey", Mobile Communication and Power Engineering, Springer Communications in Computer and Information Science, 296, pp. 17-26, 2013.
 [4] R.H. van Leuken, L. Garcia, X. Olivares, R. van Zwol, "Visual Diversification of Image Search Results", ACM World Wide Web, pp. 341-350, 2009.
 [5] M.L. Paramita, M. Sanderson, P. Clough, "Diversity in Photo Retrieval: Overview of the ImageCLEF Photo Task 2009", ImageCLEF 2009.
 [6] B. Taneva, M. Kacimi, G. Weikum, "Gathering and Ranking Photos of Named Entities with High Precision, High Recall, and Diversity", ACM Web Search and Data Mining, pp. 431-440, 2010.
 [7] S. Rudinac, A. Hanjalic, M.A. Larson, "Generating Visual Summaries of Geographic Areas Using Community-Contributed Images", IEEE Transactions on Multimedia, 15(4), pp. 921-932, 2013.
 [8] R. Agrawal, S. Gollapudi, A. Halverson, S. Ieong, "Diversifying Search Results", ACM International Conference on Web Search and Data Mining, pp. 5-14, 2009.
 [9] Y. Zhu, Y. Lan, J. Guo, X. Cheng, S. Niu, "Learning for Search Result Diversification", ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 293-302, 2014.
[10] H.-T. Yu, F. Ren, "Search Result Diversification via Filling up Multiple Knapsacks", ACM International Conference on Information and Knowledge Management, pp. 609-618, 2014.
[11] D.-T. Dang-Nguyen, L. Piras, G. Giacinto, G. Boato, F.G.B. De Natale, "A Hybrid Approach for Retrieving Diverse Social Images of Landmarks", IEEE International Conference on Multimedia and Expo, pp. 1-6, 2015.
[12] B. Ionescu, A.-L. Radu, M. Menéndez, H. Müller, A. Popescu, B. Loni, "Div400: A Social Image Retrieval Result Diversification Dataset", ACM MMSys, Singapore, 2014.
[13] B. Ionescu, A. Popescu, M. Lupu, A.L. Gînscă, B. Boteanu, H. Müller, "Div150Cred: A Social Image Retrieval Result Diversification with User Tagging Credibility Dataset", ACM MMSys, Portland, Oregon, USA, 2015.
[14] B. Ionescu, A.L. Gînscă, B. Boteanu, M. Lupu, A. Popescu, H. Müller, "Div150Multi: A Social Image Retrieval Result Diversification Dataset with Multi-topic Queries", ACM MMSys, Klagenfurt, Austria, 2016.
[15] B. Ionescu, A. Popescu, A.-L. Radu, H. Müller, "Result Diversification in Social Image Retrieval: A Benchmarking Framework", Multimedia Tools and Applications, 2014.
[16] E. Spyromitros-Xioufis, S. Papadopoulos, A. Gînscă, A. Popescu, I. Kompatsiaris, I. Vlahavas, "Improving Diversity in Image Search via Supervised Relevance Scoring", ACM International Conference on Multimedia Retrieval, ACM, Shanghai, China, 2015.
[17] B. Ionescu, A. Popescu, M. Lupu, A.L. Gînscă, H. Müller, "Retrieving Diverse Social Images at MediaEval 2014: Challenge, Dataset and Evaluation", CEUR-WS, Vol. 1263, http://ceur-ws.org/Vol-1263/mediaeval2014_submission_1.pdf, Spain, 2014.
[18] A. Sun, S.S. Bhowmick, "Image Tag Clarity: in Search of Visual-Representative Tags for Social Images", SIGMM Workshop on Social Media, 2009.