Clustering and Retrieval of Social Events in Flickr

Maia Zaharieva (1,2), Daniel Schopfhauser (1), Manfred Del Fabro (3), Matthias Zeppelzauer (4)

(1) Interactive Media Systems Group, Vienna University of Technology, Austria
(2) Multimedia Information Systems Group, University of Vienna, Austria
(3) Distributed Multimedia Systems Group, Klagenfurt University, Austria
(4) Institute of Creative Media Technologies, St. Pölten Univ. of Applied Sciences, Austria

maia.zaharieva@tuwien.ac.at, schopfhauser@ims.tuwien.ac.at, manfred.delfabro@aau.at, matthias.zeppelzauer@fhstp.ac.at

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain

ABSTRACT
This paper describes our contributions to the Social Event Detection (SED) task of the MediaEval Benchmark 2014. We first present an unsupervised approach for clustering social events that relies solely on the provided metadata. Results show that the available time and location information alone already achieves high clustering precision. We then focus on retrieving previously clustered social events from queries using temporal, spatial, and textual cues.

1. INTRODUCTION
The immense daily growth of publicly available photos calls for approaches that can efficiently mine large photo collections. A significant part of the shared content depicts a variety of social event types. Hence, much recent research focuses on the detection, classification, and retrieval of social events. The Social Event Detection (SED) task of the MediaEval Benchmark provides a platform for developing and comparing such approaches [2].

In 2014 we participated in subtasks 1 and 2 of the SED task [1]. The goal of the first subtask is to build clusters of photos belonging to the same social event in a large collection of Flickr images. We consider this task an unsupervised data mining problem and propose a multi-stage approach that uses the available metadata only, proceeding from the most reliable information (user, time, and GPS data) to less reliable information (user-provided textual descriptions). The second subtask focuses on retrieving social events using higher-level information such as the event type, the entities involved, and location information. We propose an approach that employs both the available metadata and external sources to identify relevant events in a given dataset.

2. APPROACH

2.1 Social Events Clustering
We propose an unsupervised, three-stage approach for clustering images into social events. Initially, each image is assigned to its own single-item cluster. At each stage we refine and merge previously detected events by considering a different aspect of the available image information, ranging from user and capture time information via location data to user-provided textual descriptions.

In the first stage, temporal-based clustering, we employ an adaptive approach to merge the initial single-item clusters. Since a user can only be present at a single event within a predefined time span, we examine the time difference between consecutive images captured by the same user. If it is within a predefined threshold, the corresponding images are assigned to the same event cluster.

In the second stage, we apply the same adaptive approach to location-based clustering: if the minimum time and location distances between two event clusters are within the predefined thresholds, the clusters are merged. As a result, detected events can vary strongly in both duration and size. An alternative approach to location-based clustering uses a predefined fixed radius to identify social events. For every event cluster resulting from the first stage, a representative location is estimated by calculating, for each geo-tagged photo, the sum of its distances to all other geo-tagged photos in that cluster. The location of the photo with the minimum summed distance is the representative location of the event cluster. If the representative locations of two event clusters lie within the predefined radius, the clusters are merged and the representative location is updated. Event clusters without location information remain unchanged in this stage.

The final stage is the text- and topic-based refinement of the previously detected clusters. Using Latent Dirichlet Allocation (LDA), we extract term dictionaries and topics from the textual metadata of the images. Temporally and spatially similar clusters with similar textual descriptions are merged by a combined clustering scheme that takes both topic and term similarity into account. Cluster merging and updating are performed iteratively to successively grow the clusters.

2.2 Social Events Retrieval
For each event cluster we build a TF-IDF representation from the user-generated textual descriptions of the corresponding images. Temporal information is extracted from the metadata provided directly by the photo camera. The geo-coordinates of a cluster are derived from the available coordinates and from the textual descriptions, using the GeoNames database (http://www.geonames.org) to convert location-specific strings into geo-coordinates.

As an optional step, additional topic models are generated for the different event types of the development queries (e.g. music events), and a one-class support vector machine (SVM) is trained for each event type. For event retrieval, a global weight (similarity) determines the relevance of a given cluster to the query. The global weight accounts for temporal, spatial (city, country, venue), and textual (TF-IDF-based) similarity. Additionally, the similarity to an event type model is considered if one is available for the given test query. Prior to retrieval, the queries are expanded with WordNet (http://wordnet.princeton.edu) synsets. All events with an overall weight above 1% of the maximum weight observed over all clusters are returned as the result.

3. EXPERIMENTS AND RESULTS

3.1 Social Events Clustering
We submitted five runs for the evaluation of our social event clustering approach. Runs 1 and 2 are produced by the complete system, comprising temporal-, location-, and text-based clustering. The two runs differ only in their location-based clustering: run 1 uses the adaptive approach and run 2 the radius-based one. Runs 3 and 4 combine only the temporal- and location-based stages. Finally, run 5 shows the potential of using user and time information alone. All runs employ the same parameter settings: a time threshold of 24h, a location threshold of 1km, and a textual similarity criterion of either a term dictionary intersection larger than 0.4 or more than two shared topics.

Table 1 summarizes the evaluation results on both the development and test datasets. The results show that the proposed approach generalizes well to the test data. The performance on both datasets is highly competitive, given that we rely only on the existing metadata. The differences between runs 1 and 2 and between runs 3 and 4 are negligible; thus, both location-based approaches deliver robust results on the employed datasets. Run 5, which considers only time and user information, is noteworthy: its results are only slightly lower, at a significantly lower computational cost than the text mining stage of runs 1 and 2.

Table 1: Clustering results in terms of F1-score (F1) and Normalized Mutual Information (NMI).

          Development set      Test set
          F1       NMI         F1       NMI
Run 1     0.9356   0.9873      0.9476   0.9886
Run 2     0.9343   0.9872      0.9466   0.9884
Run 3     0.9178   0.9840      0.9407   0.9872
Run 4     0.9159   0.9836      0.9404   0.9871
Run 5     0.9098   0.9822      0.9386   0.9866

3.2 Social Events Retrieval
We submitted three runs. Run 1 is the complete system without query expansion. In run 2 we add query expansion, and in run 3 we do not use the pre-trained event type models (unsupervised run). Table 2 shows that run 3 yields the highest performance and the best generalization ability over all test queries, with an average recall of 0.41 and an average precision of 0.42. This is remarkable, as this run is completely unsupervised. The best test query (query 8) reaches an F1-score of 74%, while the lowest performance is obtained for test query 4 (F1-score of 8%). These differences stem from the strongly varying complexity of the queries. Query 8 contains the name of the band "Mogwai", which is highly discriminative and thus facilitates the identification of relevant clusters. Query 4 asks for "community events", which is highly general (no more specific definition of this category is given), and thus its performance is low.

Table 2: Retrieval results in terms of recall (R), precision (P), and F1-score (F1), averaged over all queries.

          Development queries          Test queries
          R        P        F1         R        P        F1
Run 1     0.4656   0.8990   0.5367     0.2242   0.4570   0.2287
Run 2     0.5052   0.8974   0.6192     0.2365   0.3268   0.2109
Run 3     0.4770   0.4391   0.3838     0.4057   0.4203   0.2877

4. CONCLUSION
In this paper we addressed two aspects of social event mining in large media collections. We consider the first subtask, social event clustering, an unsupervised data mining problem, and we additionally refrain from employing any external sources of information. The experiments demonstrate the strong generalization ability of the proposed approach and the potential of fundamental metadata such as location and capture time information. The second subtask, social event retrieval, highlights the challenge of mapping an arbitrary user query to predefined event clusters. Experiments with optional query expansion and trained event type models show that the unsupervised approach, which considers only the available metadata, actually yields robust performance. The interpretation of abstract queries without any additional information remains an open issue.

Acknowledgments
This work has been partly funded by the Vienna Science and Technology Fund (WWTF) through project ICT12-010 and by the Carinthian Economic Promotion Fund (KWF) under grant KWF-20214/22573/33955.

5. REFERENCES
[1] G. Petkos, S. Papadopoulos, V. Mezaris, and Y. Kompatsiaris. Social event detection at MediaEval 2014: Challenges, datasets, and evaluation. In MediaEval 2014 Multimedia Benchmark Workshop, 2014.
[2] G. Petkos, S. Papadopoulos, V. Mezaris, R. Troncy, P. Cimiano, T. Reuter, and Y. Kompatsiaris. Social event detection at MediaEval: a three-year retrospect of tasks and results. In ACM ICMR 2014 Workshop on Social Events in Web Multimedia (SEWM), 2014.
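The temporal-based clustering stage described in Section 2.1 can be illustrated with a short Python sketch. It is not the authors' implementation: the input format (hypothetical `(photo_id, user, timestamp)` tuples with timestamps in seconds) and the function name `temporal_clusters` are assumptions, while the 24h threshold is the setting reported in Section 3.1. Consecutive photos of the same user, ordered by capture time, stay in the same cluster as long as the gap to the previous photo does not exceed the threshold.

```python
from collections import defaultdict

# Time threshold reported in Section 3.1 (24h), expressed in seconds.
TIME_THRESHOLD_S = 24 * 3600


def temporal_clusters(photos, threshold=TIME_THRESHOLD_S):
    """Group photos into event clusters per user based on capture-time gaps.

    photos: iterable of (photo_id, user, timestamp) tuples (hypothetical
    input format; timestamps in seconds).
    Returns a list of clusters, each a list of photo ids.
    """
    by_user = defaultdict(list)
    for photo_id, user, ts in photos:
        by_user[user].append((ts, photo_id))

    clusters = []
    for user in sorted(by_user):
        items = sorted(by_user[user])  # this user's photos in capture-time order
        current = [items[0][1]]
        for (prev_ts, _), (ts, photo_id) in zip(items, items[1:]):
            if ts - prev_ts <= threshold:
                current.append(photo_id)  # small gap: same event
            else:
                clusters.append(current)  # large gap: start a new event
                current = [photo_id]
        clusters.append(current)
    return clusters
```

For example, `temporal_clusters([("a", "u1", 0), ("b", "u1", 3600), ("c", "u1", 200000), ("d", "u2", 10)])` groups photos a and b (a one-hour gap) and starts new clusters for c (a gap of more than two days) and for the photo of the second user, returning `[["a", "b"], ["c"], ["d"]]`.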
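The radius-based alternative for location-based clustering in Section 2.1 can be sketched in the same way. Again, this is an illustration under stated assumptions rather than the original code: photos are hypothetical `(photo_id, (lat, lon))` pairs with `None` for photos without geo-tags, distances are computed with the haversine formula (the paper does not name a distance measure), and clusters are merged in a single greedy pass. The 1km radius is the location threshold from Section 3.1; the representative location is the photo location with the minimum summed distance to all other geo-tagged photos of the cluster.

```python
import math

EARTH_RADIUS_KM = 6371.0
RADIUS_KM = 1.0  # location threshold reported in Section 3.1


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def representative_location(coords):
    """Coordinate with the minimum summed distance to all other coordinates."""
    return min(coords, key=lambda c: sum(haversine_km(c, o) for o in coords))


def merge_by_radius(clusters, radius_km=RADIUS_KM):
    """Greedily merge clusters whose representative locations lie within radius_km.

    clusters: list of clusters, each a list of (photo_id, (lat, lon) or None).
    Clusters without any geo-tagged photo are returned unchanged.
    """
    merged = []    # entries are [photos, representative_location]
    untagged = []
    for photos in clusters:
        coords = [c for _, c in photos if c is not None]
        if not coords:
            untagged.append(photos)  # no location: leave this cluster as is
            continue
        rep = representative_location(coords)
        for entry in merged:
            if haversine_km(entry[1], rep) <= radius_km:
                entry[0] = entry[0] + list(photos)
                # Update the representative location of the grown cluster.
                entry[1] = representative_location(
                    [c for _, c in entry[0] if c is not None])
                break
        else:
            merged.append([list(photos), rep])
    return [photos for photos, _ in merged] + untagged
```

Because this pass is greedy, a cluster is merged into the first earlier cluster within the radius; clusters that only come within range after a representative-location update are not revisited, which a fuller implementation might handle by repeating the pass until no further merge occurs.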
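The textual merging criterion of the final clustering stage, with the settings from Section 3.1 (a term dictionary intersection larger than 0.4 or more than two shared topics), can be expressed as a small predicate. The paper does not specify how the intersection is normalized; this sketch assumes the Jaccard index of the two term sets and represents each cluster's LDA topics as a plain set of topic ids. Both choices, and the function names, are hypothetical.

```python
def term_overlap(terms_a, terms_b):
    """Normalized term-dictionary intersection, assumed here to be the Jaccard index."""
    a, b = set(terms_a), set(terms_b)
    if not (a or b):
        return 0.0  # two empty dictionaries share nothing
    return len(a & b) / len(a | b)


def textually_similar(terms_a, terms_b, topics_a, topics_b,
                      term_threshold=0.4, topic_threshold=2):
    """True if term overlap exceeds 0.4 or the clusters share more than two topics."""
    if term_overlap(terms_a, terms_b) > term_threshold:
        return True
    return len(set(topics_a) & set(topics_b)) > topic_threshold
```

In the described approach this test is only applied to clusters that are already temporally and spatially similar; for instance, term sets {concert, mogwai, glasgow} and {concert, mogwai, live} have an assumed overlap of 2/4 = 0.5 and would therefore qualify for merging.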
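The final selection step of the retrieval approach in Section 2.2, which returns all events whose overall weight exceeds 1% of the maximum weight over all clusters, is sketched below. How the temporal, spatial, textual, and event type similarities are combined into the global weight is not detailed in the paper, so the sketch simply takes precomputed weights as input; the function name and the ordering of the output by weight are assumptions.

```python
def select_events(weights, fraction=0.01):
    """Return cluster ids whose global weight exceeds fraction * maximum weight.

    weights: dict mapping a cluster id to its precomputed global weight
    (hypothetical input; the weighting scheme itself is not modeled here).
    The result is ordered by decreasing weight.
    """
    if not weights:
        return []
    cutoff = fraction * max(weights.values())
    selected = [cid for cid, w in weights.items() if w > cutoff]
    return sorted(selected, key=lambda cid: weights[cid], reverse=True)
```

For example, `select_events({"e1": 5.0, "e2": 0.04, "e3": 0.06})` returns `["e1", "e3"]`, since the cutoff is about 0.05. Because the cutoff is relative to the best-matching cluster, the number of returned events adapts to each query instead of being fixed.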