=Paper=
{{Paper
|id=None
|storemode=property
|title=Event Clustering and Classification from Social Media: Watershed-based and Kernel Methods
|pdfUrl=https://ceur-ws.org/Vol-1043/mediaeval2013_submission_36.pdf
|volume=Vol-1043
|dblpUrl=https://dblp.org/rec/conf/mediaeval/NguyenDMSNB13
}}
==Event Clustering and Classification from Social Media: Watershed-based and Kernel Methods==
Truc-Vien T. Nguyen (University of Lugano, 6900 Lugano, Switzerland; thi.truc.vien.nguyen@usi.ch), Minh-Son Dao (University of Information Technology, Viet-Nam National University HCMC; sondm@uit.edu.vn), Riccardo Mattivi, Emanuele Sansone, Francesco G.B. De Natale, Giulia Boato (mmLab, University of Trento, 38123 Povo (TN), Italy; {rmattivi, sansone, denatale, boato}@disi.unitn.it)

ABSTRACT

In this paper, we present methods for the event clustering and event classification tasks defined by MediaEval 2013. For event clustering, a watershed-based method with external data sources is used. Based on two main observations, the whole metadata set is turned into a user-time (UT) image, so that each row of the image contains all records belonging to one user, with the records sorted by time. For event classification, we use supervised machine learning and experiment with Support Vector Machines, presenting a composite kernel that jointly learns from text and visual features. The methods prove robust, with F-measure up to 98% in challenge 1, and the composite kernel yields competitive performance across different event types in challenge 2.

Categories and Subject Descriptors: H.3 [Information Storage and Retrieval]: Information Search and Retrieval

1. INTRODUCTION

This paper describes the social event detection methods built to address challenges 1 and 2 of MediaEval 2013 [4]. We also report and discuss the advantages and effectiveness of the methods based on the results of the MediaEval 2013 evaluation.

2. CHALLENGE 1

The method proposed for tackling challenge 1 of SED 2013 is mostly inherited from the method introduced in [1]. The idea is based on the typical way an event is propagated on social networks: (1) the user takes pictures or records videos at the time the event happens; (2) next, the user uploads, annotates, and shares his/her media on a social network; (3) then, his/her friends start commenting, tagging, and sharing that event over the network. From this process, two crucial clues are deduced: (1) people cannot be involved in more than one event at the same time, and (2) people tend to introduce similar annotations for all images associated with the same event. From these two observations, a user-centric data structure, the UT-image, is introduced to store the data in a form that makes both clues easy to exploit.

The UT-image (user-time image) is a 2D data structure in which the i-th row contains all time-ordered data related to the i-th user (i.e. the time taken of UT-image(i, j) is smaller than or equal to the time taken of UT-image(i, j+1)); for more details please refer to [1]. The user-centric split-and-merge procedure is described in Algorithm 1.

Algorithm 1: User-centric split-and-merge
Data: data that need to be clustered
Result: set of clusters
1. Translate the original data into UT-image format.
2. For each row i of the UT-image do (splitting stage)
   2.1 repeat
       2.1.1 Split the data at column j if |time-taken-of-UT-image(i, j) - time-taken-of-UT-image(i, j+1)| exceeds the time threshold.
   until no further split is possible;
3. repeat (merging stage)
   3.1 For each cluster, create a time-taken boundary (e.g. [time-start, time-end]) and a union set of the non-null (longitude, latitude) pairs, tags, titles, and descriptions, respectively.
   3.2 For any pair of clusters, do MERGING if the following conditions hold:
       - the time-taken boundaries intersect, or differ by no more than the time threshold;
       - the distance between the two sets of non-null (longitude, latitude) coordinates is smaller than the distance threshold;
       - the Jaccard index of the two sets of tags/titles/descriptions is larger than the tag threshold.
   until no further merge is possible;
4. End
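To make the data flow of Algorithm 1 concrete, the following Python sketch implements a simplified version under assumptions that are not in the paper: records carry only a user id, a timestamp, and a tag set; the merging test is reduced to time proximity and tag overlap (the geo-distance and title/description checks of step 3.2 are omitted); and the thresholds are left as parameters. It illustrates the idea, not the authors' implementation.

```python
# Simplified sketch of the user-centric split-and-merge (Algorithm 1).
# Record fields, thresholds, and the reduced merge test are assumptions.
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Record:
    user: str
    time_taken: float     # e.g. seconds since the epoch
    tags: Set[str]


@dataclass
class Cluster:
    records: List[Record]

    @property
    def time_bounds(self):
        times = [r.time_taken for r in self.records]
        return min(times), max(times)

    @property
    def tags(self):
        return set().union(*(r.tags for r in self.records))


def build_ut_image(records):
    """Group records per user and time-sort each row (the UT-image)."""
    rows = {}
    for r in records:
        rows.setdefault(r.user, []).append(r)
    return [sorted(row, key=lambda rec: rec.time_taken) for row in rows.values()]


def split_rows(ut_image, time_threshold):
    """Splitting stage: cut each user's row wherever the time gap is large."""
    clusters = []
    for row in ut_image:
        current = [row[0]]
        for prev, rec in zip(row, row[1:]):
            if rec.time_taken - prev.time_taken > time_threshold:
                clusters.append(Cluster(current))
                current = []
            current.append(rec)
        clusters.append(Cluster(current))
    return clusters


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def should_merge(c1, c2, time_threshold, tag_threshold):
    """Merge test (simplified): time proximity and tag overlap only."""
    (s1, e1), (s2, e2) = c1.time_bounds, c2.time_bounds
    close_in_time = s2 - e1 <= time_threshold and s1 - e2 <= time_threshold
    return close_in_time and jaccard(c1.tags, c2.tags) > tag_threshold


def merge_clusters(clusters, time_threshold, tag_threshold):
    """Merging stage: repeat pairwise merges until no pair qualifies."""
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_merge(clusters[i], clusters[j],
                                time_threshold, tag_threshold):
                    clusters[i] = Cluster(clusters[i].records + clusters[j].records)
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters
```

For example, `merge_clusters(split_rows(build_ut_image(records), time_threshold=3600), time_threshold=3600, tag_threshold=0.3)` would cluster a list of `Record` objects with a one-hour time threshold and a 0.3 tag-overlap threshold; both values are arbitrary examples, not the thresholds used in the runs.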
In order to increase the accuracy of the merging stage, "common sense" is taken into account to find the most "common pattern" in the tags field (i.e. the words users most commonly use to tag the same event). Here, the TF-IDF method is applied to the tags of each cluster to extract the most common keywords, and these keywords are used as the main clue for merging clusters. The common sense merging procedure is described in Algorithm 2.

Algorithm 2: Common sense merging
Data: set of clusters generated by Algorithm 1
Result: set of new clusters
1. For each cluster, apply TF-IDF to its tag set and select the most common keywords to create a "new common sense tags" set.
2. repeat
   2.1 For any two clusters, do MERGING if the Jaccard index of their "new common sense tags" sets is larger than the tag threshold.
   2.2 Apply TF-IDF to the merged "new common sense tags" set, select the most common keywords, and update the set.
   until no further merge is possible;
3. End

We used Algorithm 1 for run 1, Algorithm 1 with different parameters for run 2, and both Algorithms 1 and 2 for run 3.

3. CHALLENGE 2

To tackle the event classification task of challenge 2, we use supervised machine learning. We experiment with Support Vector Machines and design a composite kernel to jointly learn from text and visual features.

3.1 Text features

The data is processed with the GATE platform (http://gate.ac.uk/) for tokenization, POS tagging, and basic word features. We use Support Vector Machines to train and test binary classifiers: event classification is formulated as a multiclass classification problem, and the one-vs.-rest strategy is employed, selecting the class with the largest margin as the final answer. For experimentation, we use 5-fold cross-validation with the svm-light tool (http://svmlight.joachims.org/). The feature set for our learning framework is described as follows:

1. w_i is the text of the title, description, or tag in each event;
2. l_i is the word w_i in lower case;
3. p1_i, p2_i, p3_i, p4_i are the four prefixes of w_i;
4. s1_i, s2_i, s3_i, s4_i are the four suffixes of w_i;
5. f_i is the part-of-speech of w_i;
6. g_i is the orthographic feature, which tests whether a word is all upper-cased, initial-letter upper-cased, or all lower-cased;
7. k_i is the word-form feature, which tests whether a token is a word, a number, a symbol, or a punctuation mark;
8. o_i is the ontological feature. We used the ontology and knowledge base developed in [3], which contains 355 classes, 99 properties, and more than 100,000 entities. Given the full ontology, w_i is matched to the deepest subsumed child class.

Run 1 was done without external resources, i.e. without the ontological features, whereas all the features were used in run 2.

3.2 Visual features

For run 3, the image feature extraction was performed in a similar manner as in [2], and the SVMs, with the same settings as in [2], were trained with the data available in the SED training set. Since the training set was unbalanced in the number of samples per class, mainly towards a higher number of samples of the 'non-event' type, we balanced the training set used to train our SVM by reducing the number of 'non-event' samples. Run 4 used the same approach, but followed a two-step classification procedure: first, a classifier was learnt with only the 'event' and 'non-event' classes; second, another classifier was trained with the remaining eight classes corresponding to the different event types. Runs 3 and 4 did not use the time information metadata associated with the images.

3.3 Combined features

In run 5, we used a composite kernel to combine text and visual features, CK = α · K_T + (1 − α) · K_V, where α is a coefficient and K_T and K_V are the kernels applied to the text and visual features, respectively. We experimented with α = 0.5.
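As an illustration of how such a composite kernel can be used, the sketch below precomputes CK = α · K_T + (1 − α) · K_V from toy text and visual feature matrices and feeds it to a precomputed-kernel SVM. The RBF base kernels, the toy data, and the use of scikit-learn's SVC instead of the svm-light tool are assumptions made only for this example; the paper does not specify them.

```python
# Illustrative sketch of the composite kernel CK = alpha * K_T + (1 - alpha) * K_V.
# RBF base kernels, toy feature matrices, and scikit-learn's SVC are assumptions.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def composite_kernel(X_text, X_vis, Y_text, Y_vis, alpha=0.5):
    """Convex combination of a text kernel and a visual kernel."""
    K_text = rbf_kernel(X_text, Y_text)   # kernel over text features
    K_vis = rbf_kernel(X_vis, Y_vis)      # kernel over visual features
    return alpha * K_text + (1.0 - alpha) * K_vis


# Toy data: 20 training items with 50-dim text and 30-dim visual features.
rng = np.random.default_rng(0)
Xtr_text, Xtr_vis = rng.normal(size=(20, 50)), rng.normal(size=(20, 30))
Xte_text, Xte_vis = rng.normal(size=(5, 50)), rng.normal(size=(5, 30))
y_train = np.array([0, 1] * 10)           # dummy binary labels

clf = SVC(kernel="precomputed")
clf.fit(composite_kernel(Xtr_text, Xtr_vis, Xtr_text, Xtr_vis), y_train)
pred = clf.predict(composite_kernel(Xte_text, Xte_vis, Xtr_text, Xtr_vis))
```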
4. RESULTS AND CONCLUSIONS

The results are reported in Tables 1 and 2. In general, the proposed methods prove very competitive, although there is still room for improvement. In challenge 1, the watershed-based method works well, with results of up to 98%. In challenge 2, the event vs. non-event classification is acceptable in almost every run, as is the detection of some classes; in the last run, with the composite kernel combining text and visual features, 5 of the 9 classes score above 55%.

Since we followed a supervised machine learning approach for challenge 2, the class "fashion" could not be learnt efficiently with only 36 positive instances; a rule-based approach might have been better. Moreover, it is not trivial to provide good detection of the class "other events", which is a rather ill-defined class. The combination performed best on the class "theater dance". Meanwhile, we also observe that the class "exhibition" has 272 positive instances but could not be learnt with any kind of features and should be studied in more detail.

Table 1: Results of Challenge 1
Run | F1    | NMI   | Div F1
1   | 92.34 | 98.29 | 87.05
2   | 93.16 | 98.48 | 87.88
3   | 93.20 | 98.49 | 87.93

Table 2: Results of Challenge 2
Run | F1    | Div F1 | Event/Non-event F1 | Event/Non-event Div F1
1   | 44.84 | 33.96  | 71.30              | 21.96
2   | 44.95 | 34.08  | 71.32              | 21.96
3   | 36.31 | 25.31  | 88.54              | 39.08
4   | 24.30 | 13.53  | 87.10              | 37.61
5   | 42.20 | 31.45  | 72.18              | 22.42

5. REFERENCES

[1] M.-S. Dao, G. Boato, F. G. De Natale, and T.-V. T. Nguyen. Jointly exploiting visual and non-visual information for event-related social media retrieval. In Proceedings of the 3rd ACM International Conference on Multimedia Retrieval (ICMR), pages 159-166. ACM, 2013.

[2] R. Mattivi, J. Uijlings, F. G. De Natale, and N. Sebe. Exploitation of time constraints for (sub-)event recognition. In Proceedings of the 2011 Joint ACM Workshop on Modeling and Representing Events, pages 7-12, New York, NY, USA, 2011. ACM.

[3] B. Popov, A. Kiryakov, D. Ognyanoff, D. Manov, and A. Kirilov. KIM - a semantic platform for information extraction and retrieval. Natural Language Engineering, 10(3-4):375-392, Sept. 2004.

[4] T. Reuter, S. Papadopoulos, V. Mezaris, P. Cimiano, C. de Vries, and S. Geva. Social event detection at MediaEval 2013: Challenges, datasets, and evaluation. In Proceedings of MediaEval 2013, Barcelona, Spain, October 2013.