Event Clustering and Classification from Social Media:
Watershed-based and Kernel Methods

Truc-Vien T. Nguyen
University of Lugano
6900 Lugano, Switzerland
thi.truc.vien.nguyen@usi.ch

Minh-Son Dao
University of Information Technology
Viet-Nam National University HCMC
sondm@uit.edu.vn

Riccardo Mattivi, Emanuele Sansone, Francesco G.B. De Natale, Giulia Boato
mmLab - University of Trento
38123 Povo (TN), Italy
{rmattivi, sansone, denatale, boato}@disi.unitn.it

ABSTRACT
In this paper, we present our methods for the event clustering and event classification tasks defined by MediaEval 2013. For event clustering, a watershed-based method with external data sources is used. Based on two main observations, the whole metadata collection is turned into a user-time (UT) image, so that each row of the image contains all records that belong to one user, with the records sorted by time. For event classification, we use supervised machine learning and experiment with Support Vector Machines. We present a composite kernel to learn jointly from text and visual features. The methods prove robust, with F-measure up to 98% in challenge 1, and the composite kernel yields competitive performance across different event types in challenge 2.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: Information Search and Retrieval

1. INTRODUCTION
This paper describes the social event detection methods built specifically for challenges 1 and 2 of MediaEval 2013 [4]. We also report and discuss the advantages and the effectiveness of the methods based on the MediaEval 2013 evaluation results.

2. CHALLENGE 1
The method proposed for tackling challenge 1 of SED 2013 is mostly inherited from the method introduced in [1]. The idea is based on the typical process by which an event is populated on social networks: (1) the user takes pictures or records videos at the time the event happens; (2) next, the user uploads, annotates, and shares his/her media on a social network; (3) then, his/her friends start commenting on, tagging, and sharing that event over the network. From this process, two crucial clues are deduced: (1) people cannot be involved in more than one event at the same time, and (2) people tend to introduce similar annotations for all images associated with the same event. From these two ideas, a user-centric data structure, namely the UT-image, is introduced for storing the data in a form that helps to exploit and explore the observations mentioned above.

The UT-image (user-time image) is a 2D data structure in which the i-th row contains all time-ordered data related to the i-th user, i.e., the time taken of UT-image(i, j) is smaller than or equal to the time taken of UT-image(i, j+1); for more details, please refer to [1]. The user-centric split-and-merge procedure is described in Algorithm 1.

Data: data that need to be clustered
Result: set of clusters
1. Translate the original data into UT-image format.
2. For each row i of the UT-image do (splitting stage)
   2.1 repeat
       2.1.1 Split the row at column j if |time-taken-of-UT-image(i, j) - time-taken-of-UT-image(i, j+1)| > time threshold.
   until no further split is possible;
3. repeat (merging stage)
   3.1 For each cluster, create a time-taken boundary (i.e., [time-start, time-end]) and a union set of the non-null (longitude, latitude) pairs, tags, titles, and descriptions, respectively.
   3.2 For any pair of clusters, MERGE them if the following conditions hold:
       - the time-taken boundaries intersect, or differ by no more than the time threshold;
       - the distance between the two sets of non-null (longitude, latitude) pairs is SMALLER than the distance threshold;
       - the Jaccard index of the two sets of tags/titles/descriptions is LARGER than the tag threshold.
   until no further merge is possible;
4. End
Algorithm 1: User-centric split-and-merge
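As an illustration of the splitting stage of Algorithm 1, the following is a minimal Python sketch of UT-image construction and time-based splitting; the record field names and the threshold value are illustrative, not taken from the paper.

from collections import defaultdict

def build_ut_image(records):
    # Group records by user and sort each row by time taken: the UT-image.
    rows = defaultdict(list)
    for rec in records:
        rows[rec["user"]].append(rec)
    for row in rows.values():
        row.sort(key=lambda r: r["time_taken"])
    return rows

def split_rows(ut_image, time_threshold):
    # Splitting stage: cut a row wherever the gap between consecutive
    # records exceeds the time threshold; each segment is a candidate cluster.
    clusters = []
    for row in ut_image.values():
        segment = [row[0]]
        for prev, cur in zip(row, row[1:]):
            if cur["time_taken"] - prev["time_taken"] > time_threshold:
                clusters.append(segment)
                segment = []
            segment.append(cur)
        clusters.append(segment)
    return clusters

# Example with POSIX timestamps and a 6-hour threshold (illustrative):
records = [{"user": "u1", "time_taken": 0}, {"user": "u1", "time_taken": 3600},
           {"user": "u1", "time_taken": 100000}, {"user": "u2", "time_taken": 50}]
print(split_rows(build_ut_image(records), 6 * 3600))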
In order to increase the accuracy of the merging stage, "common sense" is taken into account to find the most "common pattern" in the tags field (e.g., the most common words users tend to use when tagging the same event). Here, the TF-IDF method is applied to the tags of each cluster to extract the most common keywords, which are then used as the main clue for merging clusters. The common sense merging procedure is described in Algorithm 2.
Data: set of clusters generated by Algorithm 1
Result: set of new clusters
1. For each cluster, apply TF-IDF to its tag set and select the most common keywords to create a "new common sense tags" set.
2. repeat
   2.1 For any two clusters, MERGE them if the Jaccard index of their "new common sense tags" sets is LARGER than the tag threshold.
   2.2 Apply TF-IDF to the merged "new common sense tags" set, select the most common keywords, and update the set.
until no further merge is possible;
3. End
Algorithm 2: Common sense merging

We used Algorithm 1 for run 1, Algorithm 1 with different parameters for run 2, and both Algorithms 1 and 2 for run 3.
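A minimal sketch of the keyword extraction and merging test behind Algorithm 2, treating each cluster's pooled tags as one document for TF-IDF; the top-k cutoff and the merging threshold are illustrative assumptions.

import math
from collections import Counter

def common_sense_tags(clusters_tags, k=5):
    # Score each cluster's tags by TF-IDF and keep the top-k
    # as its "new common sense tags" set.
    n = len(clusters_tags)
    df = Counter()
    for tags in clusters_tags:
        df.update(set(tags))
    keyword_sets = []
    for tags in clusters_tags:
        tf = Counter(tags)
        scores = {t: (tf[t] / len(tags)) * math.log(n / df[t]) for t in tf}
        keyword_sets.append(set(sorted(scores, key=scores.get, reverse=True)[:k]))
    return keyword_sets

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Two clusters are merge candidates if their keyword sets overlap strongly.
sets = common_sense_tags([["rock", "concert", "live"],
                          ["concert", "rock", "stage"],
                          ["museum", "art"]], k=3)
print(jaccard(sets[0], sets[1]))  # merge if larger than the tag threshold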
3. CHALLENGE 2
To tackle the event classification task of challenge 2, we use supervised machine learning. We experiment with Support Vector Machines and design a composite kernel to learn jointly from text and visual features.
3.1 Text features
The data is processed with the GATE platform¹ for tokenization, POS tagging, and basic word features. We used Support Vector Machines to train and test binary classifiers. Event classification is formulated as a multiclass classification problem, and the One-vs-Rest strategy is employed, selecting the class with the largest margin as the final answer. For experimentation, we use 5-fold cross-validation with the svm-light tool². The feature set for our learning framework is described as follows.
1. w_i is the text of the title, description, or tag in each event
2. l_i is the word w_i in lower case
3. p1_i, p2_i, p3_i, p4_i are the four prefixes of w_i
4. s1_i, s2_i, s3_i, s4_i are the four suffixes of w_i
5. f_i is the part-of-speech of w_i
6. g_i is the orthographic feature, which tests whether a word is all upper-cased, initial-letter upper-cased, or all lower-cased
7. k_i is the word-form feature, which tests whether a token is a word, a number, a symbol, or a punctuation mark
8. o_i is the ontological feature. We used the ontology and knowledge base developed in [3], which contains 355 classes, 99 properties, and more than 100,000 entities. Given the full ontology, w_i is matched to the deepest subsumed child class.

Run 1 was done without external resources, i.e., without the ontological feature, whereas all the features were used in run 2.
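To make features 2-4, 6, and 7 concrete, here is a minimal sketch of per-word feature extraction; the exact encoding is our assumption, since the paper does not specify it.

import string

def word_features(w):
    # Lower-cased form plus the four prefixes and four suffixes.
    feats = {"l": w.lower()}
    for n in range(1, 5):
        feats["p%d" % n] = w[:n]
        feats["s%d" % n] = w[-n:]
    # Orthographic feature g: capitalization pattern of the token.
    feats["g"] = ("all_upper" if w.isupper() else
                  "init_upper" if w[:1].isupper() else
                  "all_lower" if w.islower() else "mixed")
    # Word-form feature k: word, number, punctuation mark, or symbol.
    if w.isalpha():
        feats["k"] = "word"
    elif w.replace(".", "", 1).isdigit():
        feats["k"] = "number"
    elif all(c in string.punctuation for c in w):
        feats["k"] = "punctuation"
    else:
        feats["k"] = "symbol"
    return feats

print(word_features("Barcelona")["g"], word_features("2013")["k"])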
3.2 Visual features
For run 3, image feature extraction was performed in a similar manner as in [2], and the SVMs, with the same settings as in [2], were trained on the data available in the SED training set. Since the training set was unbalanced in the number of samples per class, mainly towards a higher number of samples of the 'non-event' type, we balanced the training set used to train our SVMs by reducing the number of samples from the 'non-event' class.

Run 4 used the same approach, but followed a two-step classification procedure: first, a classifier was learnt with only the 'event' and 'non-event' classes; second, another classifier was trained on the remaining eight classes corresponding to the different event types. Runs 3 and 4 did not use the time metadata associated with the images.
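A minimal sketch of run 4's two-step procedure, with scikit-learn's LinearSVC standing in for the SVM setup of [2]; the library choice, label names, and parameters are our assumptions.

import numpy as np
from sklearn.svm import LinearSVC

def train_two_step(X, y, non_event="non-event"):
    # Step 1: binary event vs. non-event classifier on the full training set
    # (optionally balanced by subsampling the non-event class, as in run 3).
    is_event = np.array([label != non_event for label in y])
    step1 = LinearSVC().fit(X, is_event)
    # Step 2: eight-way event-type classifier trained on event samples only.
    step2 = LinearSVC().fit(X[is_event], np.asarray(y)[is_event])
    return step1, step2

def predict_two_step(step1, step2, X, non_event="non-event"):
    pred = np.array([non_event] * len(X), dtype=object)
    event_mask = step1.predict(X).astype(bool)
    if event_mask.any():
        pred[event_mask] = step2.predict(X[event_mask])
    return pred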
3.3 Combine features
In run 5, we used a composite kernel to combine text and visual features: CK = α · K_T + (1 − α) · K_V, where α is a weighting coefficient and K_T and K_V are the kernels applied to the text and visual features, respectively. We experimented with α = 0.5.
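A minimal sketch of the composite kernel CK = α · K_T + (1 − α) · K_V on precomputed Gram matrices; the linear base kernels and the use of a precomputed-kernel SVM are our assumptions, as the paper does not state them.

import numpy as np
from sklearn.svm import SVC

def composite_kernel(X_text, X_visual, alpha=0.5):
    # CK = alpha * K_T + (1 - alpha) * K_V, here with linear base kernels
    # over the text and visual feature matrices (one row per sample).
    return alpha * (X_text @ X_text.T) + (1 - alpha) * (X_visual @ X_visual.T)

# Usage with synthetic data: any kernel machine accepting a precomputed
# Gram matrix can train on CK.
rng = np.random.default_rng(0)
X_text, X_vis = rng.random((20, 50)), rng.random((20, 128))
y = np.arange(20) % 2
clf = SVC(kernel="precomputed").fit(composite_kernel(X_text, X_vis), y)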
4. RESULTS AND CONCLUSIONS
The results are reported in Tables 1 and 2. In general, the proposed methods prove very competitive, although there is still room for improvement. For challenge 1, the watershed-based method works well, with results up to 98%. For challenge 2, the event vs. non-event classification is acceptable in almost every run, as is the detection of some classes. In the last run, with the composite kernel combining text and visual features, 5 of the 9 classes are above 55% F-measure.

Since we followed a supervised machine learning approach for challenge 2, the class "fashion" could not be learnt efficiently with only 36 positive instances; a rule-based approach might have worked better. Moreover, it is not trivial to provide good detection for the class "other events", which is a rather ill-defined class. The feature combination did best on the class "theater dance". Meanwhile, we also observe that the class "exhibition" has 272 positive instances but could not be learnt with any kind of features, and should be studied in more detail.

Table 1: Results of Challenge 1
Run    F1      NMI     Div F1
1      92.34   98.29   87.05
2      93.16   98.48   87.88
3      93.20   98.49   87.93

Table 2: Results of Challenge 2
Run    F1      Div F1   Event/Non-event F1   Event/Non-event Div F1
1      44.84   33.96    71.30                21.96
2      44.95   34.08    71.32                21.96
3      36.31   25.31    88.54                39.08
4      24.30   13.53    87.10                37.61
5      42.20   31.45    72.18                22.42

5. REFERENCES
[1] M.-S. Dao, G. Boato, F. G. De Natale, and T.-V. T. Nguyen. Jointly exploiting visual and non-visual information for event-related social media retrieval. In Proceedings of the 3rd ACM International Conference on Multimedia Retrieval (ICMR), pages 159-166. ACM, 2013.
[2] R. Mattivi, J. Uijlings, F. G. De Natale, and N. Sebe. Exploitation of time constraints for (sub-)event recognition. In Proceedings of the 2011 Joint ACM Workshop on Modeling and Representing Events, pages 7-12, New York, NY, USA, 2011. ACM.
[3] B. Popov, A. Kiryakov, D. Ognyanoff, D. Manov, and A. Kirilov. KIM - a semantic platform for information extraction and retrieval. Natural Language Engineering, 10(3-4):375-392, Sept. 2004.
[4] T. Reuter, S. Papadopoulos, V. Mezaris, P. Cimiano, C. de Vries, and S. Geva. Social event detection at MediaEval 2013: Challenges, datasets, and evaluation. In Proceedings of MediaEval 2013, Barcelona, Spain, October 2013.

¹ http://gate.ac.uk/
² http://svmlight.joachims.org/

Copyright is held by the author/owner(s).
MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain