=Paper=
{{Paper
|id=None
|storemode=property
|title=A Data-Driven Approach for Social Event Detection
|pdfUrl=https://ceur-ws.org/Vol-1043/mediaeval2013_submission_8.pdf
|volume=Vol-1043
|dblpUrl=https://dblp.org/rec/conf/mediaeval/RafailidisSLSD13
}}
==A Data-Driven Approach for Social Event Detection==
Dimitrios Rafailidis, Theodoros Semertzidis*, Michalis Lazaridis, Michael G. Strintzis*, Petros Daras
CERTH - Information Technologies Institute, 6th Km. Charilaou-Thermi, Thessaloniki, Greece
{drafail,theosem,lazar,daras}@iti.gr, strintzi@eng.auth.gr

*T. Semertzidis and M. G. Strintzis are also with the Information Processing Laboratory, Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki, Greece.

ABSTRACT

In this paper, we present a data-driven approach for challenge 1 of the MediaEval 2013 Social Event Detection Task. Our proposed approach consists of the following steps: (a) initialization based on the images' spatio-temporal information; (b) computation of the clusters' intercorrelations; and (c) generation of the final clusters. In the initialization step, the images that have both geolocation and time information are clustered accordingly, so that a few "anchored" clusters are generated, while the remaining images with no geolocation or time information are considered as singleton (one-image) clusters. In the second step, all pairwise intercorrelations between the "anchored" and the singleton clusters are calculated with the help of an aggregated similarity measure based on the user, title, description, tag, and visual information of the images. In the final step, the "anchored" and singleton clusters derived from the initialization step are merged based on the calculated intercorrelations of the second step, generating the final clusters. Our best run achieves scores of 0.5701, 0.8739, and 0.5592 for F1-Measure, NMI, and Divergence F1, respectively.

1. INTRODUCTION

We hereby present the data-driven approach followed by the Visual Computing Lab (http://vcl.iti.gr) of CERTH at the MediaEval 2013 Social Event Detection Task for challenge 1. Details of the task are provided in the paper by Reuter et al. [3]. Previous works in the field use techniques such as the LDA of Vavliakis et al. [5] or the spectral clustering of Petkos et al. [1], which perform well on small sets but have high preprocessing requirements.

Our initial motivation was to design an approach that exploits as much of the available information as possible while avoiding complex training algorithms and classification schemes that do not scale well. Towards this end, our goal was to process the dataset in the shortest possible time by parallelizing the processes and performing a hierarchical-like clustering with single-check merging of clusters and without iterative procedures. We therefore followed the proposed data-driven approach, in which photos of the same time and place correspond to the same event. Moreover, in case of missing time or geolocation information, the remaining image information is used.

2. PROPOSED APPROACH

In this Section, each step of our proposed approach is described in more detail.

Initialization Step: Given N images in the database, we retrieve the A ≤ N images that have both geolocation and time information, while the remaining N − A images are considered as singleton clusters C_sing. The geolocation information is provided as longitude and latitude coordinates, whereas in our approach we consider the date on which the photo was taken as the time information. The A images are then clustered in two steps. In the first step, two images I_1, I_2 ∈ A are clustered together on the condition that both |I_1^{long} − I_2^{long}| ≤ l and |I_1^{lat} − I_2^{lat}| ≤ l hold true for a predefined threshold l, where I_i^{long} and I_i^{lat} are the longitude and latitude coordinates of the i-th image. In doing so, clusters C_1, ..., C_r are generated. In the second step, the r generated clusters are split based on the time condition that the images of a cluster must lie within a predefined time window w. All clusters generated from these two steps over the A images are called "anchored" clusters C_anc, in the sense that these clusters must not be merged together, since the time condition is never satisfied between them. The final outcome of the initialization step is thus (a) the singleton clusters C_sing and (b) the "anchored" clusters C_anc of the A images, where each cluster C_i is associated with a minimum date T_min^{C_i} and a maximum date T_max^{C_i} respecting the time window w, i.e., T_max^{C_i} − T_min^{C_i} ≤ w, with T_max^{C_i}/T_min^{C_i} being the maximum/minimum date of an image within cluster C_i.
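As a concrete illustration, the following minimal Python sketch mirrors the initialization step under stated assumptions, not the authors' implementation: images are taken to be dicts with "lon", "lat", and "date" (datetime) fields, and the pairwise coordinate condition is applied greedily against the first image of each cluster, since the paper does not prescribe a concrete grouping procedure.

<pre>
from datetime import timedelta

def initialize_clusters(images, l=0.05, w=timedelta(hours=24)):
    """Sketch of the initialization step (assumed data layout)."""
    spatial_clusters, singletons = [], []
    for img in images:
        # Images missing geolocation or time information become singletons.
        if img.get("lon") is None or img.get("lat") is None or img.get("date") is None:
            singletons.append([img])
            continue
        placed = False
        for cluster in spatial_clusters:
            ref = cluster[0]
            # Pairwise condition |lon_1 - lon_2| <= l and |lat_1 - lat_2| <= l,
            # checked greedily against the cluster's first image.
            if abs(img["lon"] - ref["lon"]) <= l and abs(img["lat"] - ref["lat"]) <= l:
                cluster.append(img)
                placed = True
                break
        if not placed:
            spatial_clusters.append([img])
    # Second step: split each spatial cluster so that all dates of a
    # cluster fall within the time window w ("anchored" clusters).
    anchored = []
    for cluster in spatial_clusters:
        cluster.sort(key=lambda im: im["date"])
        current = [cluster[0]]
        for img in cluster[1:]:
            if img["date"] - current[0]["date"] <= w:
                current.append(img)
            else:
                anchored.append(current)
                current = [img]
        anchored.append(current)
    return anchored, singletons
</pre>

With this layout, each anchored cluster comes out sorted by date, so T_min^{C_i} and T_max^{C_i} are simply its first and last entries.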
Computation of Cluster Intercorrelations: In the second step of our approach, we calculate the intercorrelation of two examined clusters C_i and C_j only if they are (a) not both "anchored" clusters and (b) satisfy the time condition that at least one difference between the associated dates T_min^{C_i}, T_max^{C_i}, T_min^{C_j}, T_max^{C_j} is lower than or equal to the time window w. In case the time information of a singleton cluster C_sing is missing, the aforementioned time constraint is ignored. If the time condition is satisfied, the intercorrelation between the two clusters C_i and C_j is computed as follows. First, each cluster C_i is associated with three textual vocabularies of tags, titles, and descriptions, as well as with a list of users, who are the owners of the images of cluster C_i; the distinct tags, titles, descriptions, and users form the respective textual vocabularies and the user list. For the textual information of tags, titles, and descriptions we used a Jaccard similarity measure to calculate the textual similarity measures S^{tags}(C_i, C_j), S^{desc}(C_i, C_j), and S^{title}(C_i, C_j) for each pair C_i-C_j. In parallel, the cluster similarity S^{users}(C_i, C_j) based on the users is computed.

Using the visual information, we form for each cluster C_i the set I_vis(C_i) of the distinct visual neighbors of all images that belong to the cluster, by aggregating the k visual neighbors of each image. The visual similarity between two clusters C_i and C_j is then calculated as:

S^{visual}(C_i, C_j) = |I_vis(C_i) ∩ I_vis(C_j)| / |I_vis(C_i) ∪ I_vis(C_j)|

Finally, the intercorrelation between two clusters C_i and C_j is computed using the following aggregated similarity measure:

S^{agg}(C_i, C_j) = a_1 S^{users}(C_i, C_j) + a_2 S^{tags}(C_i, C_j) + a_3 S^{desc}(C_i, C_j) + a_4 S^{title}(C_i, C_j) + a_5 S^{visual}(C_i, C_j)

where each coefficient a_1, ..., a_5 expresses the weight of the respective similarity measure.
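The similarity computation can be sketched as follows, assuming each cluster is summarized as a dict of precomputed sets (users, tags, descriptions, titles, visual neighbors); this dict-of-sets layout is our assumption. The paper specifies Jaccard for the three textual measures and the set-overlap form of S^{visual}, but gives no formula for S^{users}, so Jaccard is assumed there as well; the default weights are the grid-selected values reported in Section 3.

<pre>
def jaccard(a, b):
    """Jaccard similarity of two sets; 0 when both are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def aggregated_similarity(ci, cj, a=(0.5, 0.125, 0.125, 0.125, 0.125)):
    """Sketch of S^{agg}: weighted sum of the five similarity measures.
    Clusters are assumed to be dicts of sets; Jaccard for S^{users} is
    our assumption, since the paper leaves its form unspecified."""
    a1, a2, a3, a4, a5 = a
    return (a1 * jaccard(ci["users"], cj["users"])
            + a2 * jaccard(ci["tags"], cj["tags"])
            + a3 * jaccard(ci["descriptions"], cj["descriptions"])
            + a4 * jaccard(ci["titles"], cj["titles"])
            + a5 * jaccard(ci["visual"], cj["visual"]))
</pre>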
Final Cluster Generation: The final clusters are generated based on the calculated intercorrelations. For each singleton cluster C_sing, the maximum intercorrelation with an "anchored" cluster C_anc is computed. If the condition S^{agg}(C_sing, C_anc) ≥ M_thres is satisfied, where M_thres is a merging threshold, the two clusters are merged. After all pairwise comparisons between the singleton clusters and the "anchored" ones, the non-merged singleton clusters are compared with each other in the same way and merged analogously, which generates the final clusters.
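The merging logic can be sketched as below, under the same assumed dict-of-sets cluster representation; sim stands for the aggregated similarity sketched above, the merge helper that unions cluster fields is our own device rather than the authors', and the time-window gating of the second step is omitted for brevity.

<pre>
def merge(ci, cj):
    """Union the corresponding field sets of two clusters (assumed layout)."""
    return {key: ci[key] | cj[key] for key in ci}

def generate_final_clusters(singletons, anchored, sim, m_thres):
    """Sketch of the final step: merge each singleton into its most similar
    anchored cluster when S^{agg} >= M_thres, then merge the leftover
    singletons with each other analogously."""
    clusters = list(anchored)
    remaining = []
    for s in singletons:
        # Find the cluster with the maximum intercorrelation to s.
        best_idx, best_score = None, 0.0
        for idx, c in enumerate(clusters):
            score = sim(s, c)
            if best_idx is None or score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None and best_score >= m_thres:
            clusters[best_idx] = merge(clusters[best_idx], s)
        else:
            remaining.append(s)
    # Non-merged singletons are compared with each other and merged analogously.
    while remaining:
        base = remaining.pop(0)
        keep = []
        for other in remaining:
            if sim(base, other) >= m_thres:
                base = merge(base, other)
            else:
                keep.append(other)
        remaining = keep
        clusters.append(base)
    return clusters
</pre>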
3. EXPERIMENTS

A grid selection strategy was used to compute the optimal values l = 0.05, w = 24 hours, a_1 = 0.5, and a_2 = a_3 = a_4 = a_5 = 0.125. To retrieve the visual neighbors, we used Opponent SIFT [4] with a codebook of 1004 dimensions; the number of visual neighbors was set to k = 20. By varying the merging threshold M_thres, our best run on the training set, with M_thres = 0.4, achieves an F1-Measure of 0.8889, an NMI of 0.9771, and a DIV-F1 of 0.8076.

The experimental results for the test set are presented in Table 1. For the required run, the merging threshold M_thres is set to 0.003; for the general runs 1-4 it is set to 0.01, 0.005, 0.01, and 0.005, respectively. The remark "visual" denotes that visual information was used in addition. The reason for using low values of M_thres in the test-set runs is that the majority of the calculated intercorrelations were lower than 0.01, whereas the respective intercorrelations in the training set were lower than 0.4; higher values of M_thres on the test set generated many singleton clusters. The unknown number of clusters and the extremely low cluster intercorrelations can explain the large difference between the results on the training and test sets. All experiments were conducted on our distributed environment in the context of the EC-funded project CUBRIK (see Section 5).

Table 1: Results for challenge 1.

Run                      F1-Score   NMI      DIV-F1
Required run             0.5698     0.8743   0.5049
General run 1 (plain)    0.5631     0.8696   0.4921
General run 2 (plain)    0.5665     0.8727   0.5006
General run 3 (visual)   0.5662     0.8688   0.4917
General run 4 (visual)   0.5701     0.8739   0.5025

4. DISCUSSION

Based on the experimental results of Table 1, we observe that the visual information slightly improves the performance of the algorithm. However, it does not necessarily solve the sparsity problem, which is revealed by the many zero and low values of the cluster intercorrelations. This happens because the k visual neighbors of each image are not necessarily conceptually similar and thus add noise to the cluster intercorrelations. This is a very important challenge for many content-based tag propagation methods that try to solve the sparsity problem between sparsely annotated images. The issue has been termed "learning tag relevance": the semantic connection between an assigned tag (or any other textual information) and the content it represents must be revealed in order to perform accurate tag propagation. In the future, we plan to evaluate the proposed data-driven approach using our personalized content-based tag propagation method [2], in order to address the extreme sparsity that may occur between the cluster intercorrelations.

5. ACKNOWLEDGEMENTS

This work was partially supported by the EC FP7 funded project CUBRIK, ICT-287704 (www.cubrikproject.eu).

6. REFERENCES

[1] G. Petkos, S. Papadopoulos, and Y. Kompatsiaris. Social event detection using multimodal clustering and integrating supervisory signals. In ICMR. ACM, 2012.
[2] D. Rafailidis, A. Axenopoulos, J. Etzold, S. Manolopoulou, and P. Daras. Content-based tag propagation and tensor factorization for personalized item recommendation based on social tagging. ACM Trans. Interact. Intell. Syst., to appear.
[3] T. Reuter, S. Papadopoulos, V. Mezaris, P. Cimiano, C. de Vries, and S. Geva. Social Event Detection at MediaEval 2013: Challenges, datasets, and evaluation. In MediaEval 2013 Workshop, Barcelona, Spain, October 18-19, 2013.
[4] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and scene recognition. IEEE Trans. on PAMI, 32(9):1582-1596, 2010.
[5] K. N. Vavliakis, F. A. Tzima, and P. A. Mitkas. Event detection via LDA for the MediaEval 2012 SED task. In MediaEval Workshop, 2012.