          Clustering and Retrieval of Social Events in Flickr

Maia Zaharieva 1,2, Daniel Schopfhauser 1, Manfred Del Fabro 3, Matthias Zeppelzauer 4
1 Interactive Media Systems Group, Vienna University of Technology, Austria
2 Multimedia Information Systems Group, University of Vienna, Austria
3 Distributed Multimedia Systems Group, Klagenfurt University, Austria
4 Institute of Creative Media Technologies, St. Pölten Univ. of Applied Sciences, Austria
maia.zaharieva@tuwien.ac.at, schopfhauser@ims.tuwien.ac.at
manfred.delfabro@aau.at, matthias.zeppelzauer@fhstp.ac.at


ABSTRACT
This paper describes our contributions to the Social Event Detection (SED) task as part of the MediaEval Benchmark 2014. We first present an unsupervised approach for the clustering of social events that builds solely on the provided metadata. Results show that using the available time and location information alone already achieves high clustering precision. In a second step, we focus on the retrieval of previously clustered social events from queries by using temporal, spatial, and textual cues.

1. INTRODUCTION
The immense daily growth of publicly available photos introduces the need for approaches that are able to efficiently mine large photo collections. A significant part of the shared content depicts a variety of different social event types. Hence, a lot of recent research focuses on the detection, classification, and retrieval of social events. The Social Event Detection (SED) task of the MediaEval Benchmark provides a platform for the development and comparison of such approaches [2].

In 2014 we participated in subtasks 1 and 2 of the SED task [1]. The goal of the first subtask is to build clusters of photos belonging to the same social event in a large collection of Flickr images. We treat this task as an unsupervised data mining problem and propose a multi-stage approach that uses the available metadata only, proceeding from the most reliable information (user, time, and GPS data) to the least reliable (user-provided textual descriptions). The second subtask focuses on the retrieval of social events using higher-level information such as the type of the event, the entities involved, and location information. We propose an approach that employs both the available metadata and external sources for the identification of relevant events in a provided dataset.

2. APPROACH

2.1 Social Events Clustering
We propose an unsupervised, three-stage approach for the clustering of images into social events. Initially, each image is assigned to a single-item cluster. At each stage we refine and merge previously detected events by considering a different aspect of the available image information, ranging from user and capture time information via location data to user-provided textual descriptions.

In the first stage, temporal clustering, we employ an adaptive approach to merge the initial single-item clusters. Since a user can only be present at a single event within a predefined time span, we examine the time difference between consecutive images captured by the same user. If it lies within a predefined threshold, the corresponding images are assigned to the same event cluster.
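For illustration, the first stage could be sketched as follows. The Photo record, the fixed 24-hour gap (the threshold reported in Section 3.1), and all function names are illustrative simplifications of the adaptive scheme, not the submitted implementation.

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

@dataclass
class Photo:
    photo_id: str
    user: str
    taken: datetime                                # capture time from the photo metadata
    latlon: Optional[Tuple[float, float]] = None   # (lat, lon) if geo-tagged
    text: str = ""                                 # title, description, and tags concatenated

def temporal_clusters(photos, max_gap=timedelta(hours=24)):
    """First stage (simplified): per user, consecutive photos whose capture
    times differ by at most max_gap end up in the same event cluster."""
    by_user = defaultdict(list)
    for p in photos:
        by_user[p.user].append(p)
    clusters = []
    for user_photos in by_user.values():
        user_photos.sort(key=lambda p: p.taken)
        current = [user_photos[0]]
        for prev, cur in zip(user_photos, user_photos[1:]):
            if cur.taken - prev.taken <= max_gap:
                current.append(cur)
            else:
                clusters.append(current)
                current = [cur]
        clusters.append(current)
    return clusters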
In the second stage we apply the same adaptive approach to location-based clustering: if both the minimum time distance and the minimum location distance between two event clusters are within the predefined thresholds, the clusters are merged. As a result, detected events can vary strongly in both duration and size. A different approach to location-based clustering uses a predefined fixed radius for the identification of social events. For every event cluster resulting from the first stage, a representative location is estimated by calculating, for each geo-tagged photo, the sum of its distances to all other geo-tagged photos in that cluster. The location of the photo with the minimum summed distance becomes the representative location of the event cluster. If the estimated representative locations of two event clusters lie within the predefined radius, the clusters are merged and the representative location is updated. Event clusters without location information remain unchanged in the second stage of our approach.
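A sketch of the radius-based variant, reusing the Photo records from the sketch above: the representative location is the geo-tagged photo minimizing the summed distance to the other geo-tagged photos (a medoid), and clusters whose representatives fall within a fixed radius are merged greedily. The haversine helper, the greedy merge order, and the 1 km default (the threshold from Section 3.1) are assumptions.

import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def representative_location(cluster):
    """Location of the geo-tagged photo with the minimum sum of distances
    to all other geo-tagged photos in the cluster (None if no geo-tags)."""
    coords = [p.latlon for p in cluster if p.latlon is not None]
    if not coords:
        return None
    return min(coords, key=lambda c: sum(haversine_km(c, o) for o in coords))

def merge_by_radius(clusters, radius_km=1.0):
    """Greedily merge clusters whose representative locations lie within
    radius_km; clusters without location information stay untouched."""
    merged = []  # list of (representative location, member photos)
    for cluster in clusters:
        loc = representative_location(cluster)
        if loc is None:
            merged.append((None, cluster))
            continue
        for i, (rep, members) in enumerate(merged):
            if rep is not None and haversine_km(rep, loc) <= radius_km:
                members.extend(cluster)
                merged[i] = (representative_location(members), members)
                break
        else:
            merged.append((loc, cluster))
    return [members for _, members in merged]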
The final stage of our approach is the text- and topic-based refinement of the previously detected clusters. We extract term dictionaries and topics using Latent Dirichlet Allocation (LDA) from the textual metadata of the images. Temporally and spatially similar clusters with similar textual descriptions are merged by a combined clustering scheme that takes both topic and term similarity into account. Cluster merging and updating is performed iteratively to successively grow clusters.
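The textual merge criterion in Section 3.1 (term-dictionary intersection above 0.4, or more than two shared topics) could be checked along the following lines. The temporal and spatial gating is omitted here, the normalization of the term overlap and the number of top topics per cluster are assumptions, and scikit-learn merely stands in for whatever LDA implementation was actually used.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def cluster_terms(cluster):
    """Term dictionary of a cluster: all tokens from its photos' text."""
    return {tok for p in cluster for tok in p.text.lower().split()}

def cluster_topics(cluster, vectorizer, lda, top_k=5):
    """Indices of the top_k most probable LDA topics for the concatenated
    textual metadata of a cluster."""
    doc = " ".join(p.text for p in cluster)
    dist = lda.transform(vectorizer.transform([doc]))[0]
    return set(dist.argsort()[::-1][:top_k])

def textually_similar(c1, c2, vectorizer, lda,
                      min_term_overlap=0.4, min_shared_topics=2):
    """Merge criterion from Section 3.1: term-dictionary intersection
    above 0.4, or more than two shared topics."""
    t1, t2 = cluster_terms(c1), cluster_terms(c2)
    overlap = len(t1 & t2) / max(1, min(len(t1), len(t2)))
    shared = cluster_topics(c1, vectorizer, lda) & cluster_topics(c2, vectorizer, lda)
    return overlap > min_term_overlap or len(shared) > min_shared_topics

# Fitting the shared topic model on all photo texts (illustrative settings):
# vectorizer = CountVectorizer(stop_words="english")
# lda = LatentDirichletAllocation(n_components=20).fit(
#     vectorizer.fit_transform([p.text for p in photos]))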
2.2 Social Events Retrieval
For each event cluster we build a TF-IDF representation from the user-generated textual descriptions of the corresponding images. Temporal information is extracted from the metadata provided directly by the photo camera. The location of a cluster in geo-coordinates is mined from the available coordinates and from the textual descriptions, using the GeoNames database (http://www.geonames.org) to convert location-specific strings into geo-coordinates.
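A sketch of these cluster representations: per-cluster TF-IDF vectors over the concatenated photo texts, and a lookup that resolves location strings to coordinates. The scikit-learn vectorizer and the GeoNames searchJSON web service (which requires a registered username) are assumptions about tooling, not details given in the paper.

import requests
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_tfidf(clusters):
    """One TF-IDF vector per event cluster, built from the concatenated
    user-generated text of its photos."""
    docs = [" ".join(p.text for p in cluster) for cluster in clusters]
    vectorizer = TfidfVectorizer(stop_words="english")
    return vectorizer, vectorizer.fit_transform(docs)

def geonames_lookup(place, username="your_geonames_username"):
    """Resolve a location-specific string to (lat, lon) via the GeoNames
    search web service; returns None if nothing is found."""
    resp = requests.get(
        "http://api.geonames.org/searchJSON",
        params={"q": place, "maxRows": 1, "username": username},
        timeout=10,
    )
    hits = resp.json().get("geonames", [])
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lng"])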
As an optional step, additional topic models for the different event types (e.g., music events) of the development queries are generated, and a one-class support vector machine (SVM) is trained for each event type.
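A sketch of these optional per-type models; the use of LDA topic distributions as features and the OneClassSVM parameters are assumptions, since the paper does not specify them.

from sklearn.svm import OneClassSVM

def train_event_type_models(clusters_by_type, vectorizer, lda, nu=0.1):
    """One one-class SVM per event type, trained on the topic distributions
    of the development clusters of that type (feature choice is assumed)."""
    models = {}
    for event_type, clusters in clusters_by_type.items():
        docs = [" ".join(p.text for p in c) for c in clusters]
        X = lda.transform(vectorizer.transform(docs))
        models[event_type] = OneClassSVM(kernel="rbf", nu=nu).fit(X)
    return models

# At retrieval time, models[event_type].decision_function(x) can serve as
# the event-type similarity term of the global weight described below.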
For event retrieval, a global weight (similarity) determines the relevance of a given cluster to the query. The global weight accounts for temporal, spatial (city, country, venue), and textual similarity (based on TF-IDF). Additionally, the similarity to an event type model is considered if one is available for the given test query. Prior to retrieval, the queries are expanded with WordNet (http://wordnet.princeton.edu) synsets. All events with an overall weight above 1% of the maximum weight observed over all clusters are returned as the result.
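The retrieval step could look roughly as follows. The WordNet expansion uses NLTK, the three similarity terms are combined with equal weights, and both of these choices, as well as the function names, are assumptions rather than the submitted system.

import numpy as np
from nltk.corpus import wordnet  # requires the WordNet corpus: nltk.download("wordnet")
from sklearn.metrics.pairwise import cosine_similarity

def expand_query(terms):
    """Expand query terms with the lemma names of their WordNet synsets."""
    expanded = set(terms)
    for term in terms:
        for syn in wordnet.synsets(term):
            expanded.update(l.name().replace("_", " ") for l in syn.lemmas())
    return expanded

def retrieve(query_vec, cluster_vecs, temporal_sim, spatial_sim, keep_ratio=0.01):
    """Combine textual, temporal, and spatial similarity into a global weight
    per cluster and keep every cluster whose weight exceeds 1% of the maximum."""
    text_sim = cosine_similarity(query_vec, cluster_vecs).ravel()
    weights = text_sim + np.asarray(temporal_sim) + np.asarray(spatial_sim)
    threshold = keep_ratio * weights.max()
    return [i for i, w in enumerate(weights) if w > threshold]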
3. EXPERIMENTS AND RESULTS

3.1 Social Events Clustering
We submitted five runs for the evaluation of our approach to social event clustering. Runs 1 and 2 are the result of the complete system considering temporal-, location-, and text-based clustering. The two runs differ only in their location-based clustering: run 1 uses the adaptive approach and run 2 the radius-based one. Runs 3 and 4 combine the temporal- and location-based approaches only. Finally, run 5 shows the potential of using user and time information alone. All runs employ the same parameter settings: a time threshold of 24 h, a location threshold of 1 km, and a textual similarity criterion of either a term dictionary intersection larger than 0.4 or more than two shared topics.

Table 1: Clustering results in terms of F1-score (F1) and Normalized Mutual Information (NMI).

            Development set          Test set
            F1       NMI             F1       NMI
    Run 1   0.9356   0.9873          0.9476   0.9886
    Run 2   0.9343   0.9872          0.9466   0.9884
    Run 3   0.9178   0.9840          0.9407   0.9872
    Run 4   0.9159   0.9836          0.9404   0.9871
    Run 5   0.9098   0.9822          0.9386   0.9866

Table 1 summarizes the results of the evaluation on both the development and test datasets. The achieved results show that the proposed approach generalizes well to the test data. The performance on both datasets is highly competitive given that we rely on existing metadata only. The differences between runs 1 and 2 and between runs 3 and 4, respectively, are negligible; thus, both location-based approaches deliver robust results for the employed datasets. Noteworthy is run 5, where solely time and user information is considered: its results are only slightly lower, at significantly lower computational cost compared to the text mining stage (runs 1 and 2).
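For reference, a predicted clustering can be checked against ground-truth event labels as sketched below. The pair-counting F1 and the scikit-learn NMI shown here are illustrative stand-ins for the official benchmark evaluation tool, not the scores reported in Table 1.

from itertools import combinations
from sklearn.metrics import normalized_mutual_info_score

def pairwise_f1(truth, pred):
    """Pair-counting F1: a photo pair is positive when a labeling places
    both photos in the same cluster."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_true, same_pred = truth[i] == truth[j], pred[i] == pred[j]
        tp += same_true and same_pred
        fp += (not same_true) and same_pred
        fn += same_true and (not same_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# NMI, as reported in Table 1, with truth and pred as per-photo label lists:
# normalized_mutual_info_score(truth, pred)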
3.2 Social Events Retrieval
We submitted three runs. Run 1 is the complete system without query expansion. In run 2 we add query expansion, and in run 3 we do not use the pre-trained event type models (unsupervised run).

Table 2: Retrieval results in terms of recall (R), precision (P), and F1-score (F1), averaged over all queries.

            Development queries                Test queries
            R        P        F1               R        P        F1
    Run 1   0.4656   0.8990   0.5367           0.2242   0.4570   0.2287
    Run 2   0.5052   0.8974   0.6192           0.2365   0.3268   0.2109
    Run 3   0.4770   0.4391   0.3838           0.4057   0.4203   0.2877

Table 2 shows that run 3 yields the highest performance and the best generalization ability over all test queries, with an average recall of 0.41 and an average precision of 0.42. This is remarkable, as this run is completely unsupervised. The best performance for a test query is an F1-score of 74% (query 8); the lowest is obtained for test query 4 (F1-score of 8%). The reason for these differences lies in the strongly varying complexity of the queries. Query 8 contains the name of the band "Mogwai", which is highly discriminative and thus facilitates the identification of relevant clusters. Query 4 asks for "community events", which is highly general (without a more specific definition of this category), and thus its performance is low.

4. CONCLUSION
In this paper we deal with two different aspects of social event mining in large media collections. We consider the first subtask, social event clustering, as an unsupervised data mining problem and additionally refrain from employing any external sources of information. The performed experiments demonstrate the strong generalization ability of the proposed approach and the potential of fundamental metadata such as location and capture time information. The second subtask, social event retrieval, highlights the challenge of mapping an arbitrary user query to predefined event clusters. Experiments with optional query expansion and trained event type models show that the unsupervised approach considering the available metadata only actually yields robust performance. The interpretation of abstract queries without any additional information remains an open issue.

Acknowledgments
This work has been partly funded by the Vienna Science and Technology Fund (WWTF) through project ICT12-010 and by the Carinthian Economic Promotion Fund (KWF) under grant KWF-20214/22573/33955.

5. REFERENCES
[1] G. Petkos, S. Papadopoulos, V. Mezaris, and Y. Kompatsiaris. Social event detection at MediaEval 2014: Challenges, datasets, and evaluation. In MediaEval 2014 Multimedia Benchmark Workshop, 2014.
[2] G. Petkos, S. Papadopoulos, V. Mezaris, R. Troncy, P. Cimiano, T. Reuter, and Y. Kompatsiaris. Social event detection at MediaEval: A three-year retrospect of tasks and results. In ACM ICMR 2014 Workshop on Social Events in Web Multimedia (SEWM), 2014.

Copyright is held by the author/owner(s).
MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain