<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantics-driven Event Clustering in Twitter Feeds</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cedric De Boom</string-name>
          <email>cedric.deboom@intec.ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven Van Canneyt</string-name>
          <email>steven.vancanneyt@intec.ugent.be</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bart Dhoedt</string-name>
          <email>bart.dhoedt@intec.ugent.be</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ghent University - iMinds</institution>
          ,
          <addr-line>Gaston Crommenlaan 8-201, 9050 Ghent</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ghent University - iMinds</institution>
          ,
          <addr-line>Gaston Crommenlaan 8-201, 9050 Ghent</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Ghent University - iMinds</institution>
          ,
          <addr-line>Gaston Crommenlaan 8-201, 9050 Ghent</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>1395</volume>
      <fpage>2</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms, each using different information sources (textual, temporal, geographic or community features), have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic information can also be used to drive the actual event detection, which is less covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over baseline. We find that assigning semantic information to every individual tweet results in worse performance in F1 measure compared to baseline. If, however, semantics are assigned on a coarser, hashtag level, the improvement over baseline is substantial and significant in both precision and recall.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic information</kwd>
        <kwd>event detection</kwd>
        <kwd>clustering</kwd>
        <kwd>social media</kwd>
        <kwd>Twitter</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>
        H.3.3 [Information Storage and Retrieval]: Information
Search and Retrieval
      </p>
      <p>
In the past, researchers mostly used textual features as their
main source of information to perform event detection tasks
in social media posts. Next to the text itself, other
characteristic features such as the timestamp of the post, user
behavioural patterns and geolocation have been successfully
taken into account [
        <xref ref-type="bibr" rid="ref1 ref15 ref17 ref18 ref22 ref4">1, 4, 15, 17, 18, 22</xref>
        ]. Less used are
so-called semantic features, in which higher-level categories or
semantic topics are captured for every tweet and used as
input for the clustering algorithm. These semantic topics
can either be very specific (such as sports, politics or
disasters) or can be latent abstract categories not known
beforehand; such an abstract topic is usually a collection of
semantically related words. In most applications semantics
are determined on event level after the actual event detection
process [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We however propose to use semantic
information on tweet level to drive the event detection algorithm.
After all, events belonging to different semantic categories,
and thus also their associated tweets, are likely to be
discerned more easily than semantically related events. For
example, it is relatively easy to distinguish the tweets
of a sports game from those of a concurrent political debate.
The use case we address in this paper consists of dividing a
collection of tweets into separate events. In this collection
every tweet belongs to a certain event and it is our task to
cluster all tweets in such a way that the underlying event
structure is reflected through these clusters of tweets. For
this purpose we adopt a single pass clustering mechanism.
As a baseline we use a clustering approach which closely
resembles the algorithm proposed by Becker et al. to
cluster Flickr photo collections into events [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], and in which
we only use plain textual features. We then augment this
baseline algorithm, now incorporating semantic information
about the tweets as a second feature next to the text of
the tweet. As it turns out, solely using a semantic topic
per tweet only marginally improves baseline performance;
the attribution of semantic labels on tweet level seems to
be too fine-grained to be of any predictive value. We
therefore employ an online dynamic algorithm to assign semantic
topics on hashtag level instead of tweet level, which results
in a coarser attribution of topic labels. As will be shown in
this paper, the latter approach turns out to be significantly
better than baseline performance.
      </p>
      <p>The remainder of this paper is structured as follows. In
Section 2 we briefly discuss the most relevant related work
in recent literature, after which we describe the
methodology to extract events from a collection of Twitter posts in
Section 3. The collection of data and the construction of a
ground truth are treated in Section 4. Finally we analyse the
results of the developed algorithms in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        Since the emergence of large-scale social networks such as
Twitter and their growing user base, the detection of events
using social information has attracted the attention of the
scientific community. In a first category of techniques,
Twitter posts are clustered using similarity measures. These can
be either based on textual, temporal, geographical or other
features. Becker et al. were among the first to implement
this idea by clustering a Flickr photo collection [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. They
developed a single pass unsupervised clustering mechanism
in which every cluster represented a single event. Their
approach however scaled exponentially in the number of
detected events, leading to Reuter et al. improving their
algorithm by using a prior candidate retrieval step [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], thereby
reducing the execution time to linear scaling. Petrović et
al. used a different technique based on Locality Sensitive
Hashing, which can also be seen as a clustering mechanism
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In this work, tweets are clustered into buckets by means
of a hashing function. Related tweets are more likely to
fall into the same bucket, which allows for a rapid
comparison between tweets to drive the event detection process.
The techniques in a second category of event detection
algorithms mainly use temporal and volumetric information
about the tweets being sent. Yin et al. for example use a
peak detection strategy in the volume of tweets to detect
fire outbreaks [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and Nichols et al. detect volume spikes
to identify events in sporting games [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. By analysing
communication patterns between Twitter users, such as peaks in
original tweets, retweets and replies, Chierichetti et al. were
able to extract the major events from a World Cup
football game or the Academy Awards ceremony [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Sakaki
et al. regarded tweets as individual sensor points to detect
earthquakes in Japan [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. They used a temporal model to
detect spikes in tweet volume to identify individual events,
after which a spatial tracking model, such as a Kalman
filter or a particle filter, was applied to follow the earthquake
events as they advanced through the country. Bursts of
words in time or in geographic location can also be
calculated by using signal processing techniques, e.g. a wavelet
transformation. Such a technique was successfully used by
Weng et al. in their EDCoW algorithm to detect Twitter
events [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], and by Chen and Roy to detect events in Flickr
photo collections on a geographic scale [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Semantic information is often extracted after the events are
detected to classify them into high level categories [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This
can be done in a supervised way, using a classifier such as
Naive Bayes or a Support Vector Machine, but most of the
time unsupervised methods are preferred, since they do not
require labelled data to train models and are able to discover
semantic categories without having to specify these
categories beforehand. Popular unsupervised techniques are
Latent Dirichlet Allocation (LDA), clustering, Principal
Component Analysis (PCA) or a neural auto-encoder. LDA was
introduced by Blei et al. in 2003 as a generative model to
extract latent topics from a large collection of documents
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Since then many variants of LDA have emerged tailored
to specific contexts. Zhao et al. created the TwitterLDA
algorithm to extract topics from microposts, such as tweets,
assuming a tweet can only have one topic. Using
community information next to purely textual information, Liu et
al. developed their own version of LDA as well, called
TopicLinkLDA [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. A temporal version of LDA, called TM-LDA,
was developed by Wang et al. to be able to extract topics
from text streams, such as a Twitter feed [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. By batch
grouping tweets in hashtag pools, Mehrotra et al. were able
to improve standard LDA topic assignments to individual
tweets [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. EVENT CLUSTERING</title>
      <p>In this section we describe the mechanics to discover
events in a collection of tweets. In the dataset we use, every
tweet t is assigned a set of event labels E<sub>t</sub>. This set contains
more than one event label if the tweet belongs to multiple
events. The dataset itself consists of a training set T<sub>train</sub>
and a test set T<sub>test</sub>. The details on the construction of the
dataset are found in Section 4. We will now try to recover
the events in the test set by adopting a clustering approach.
First the mechanisms of an existing baseline algorithm will
be explained. Next we will extend this algorithm using
semantic information calculated from the tweets.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1 Baseline: Single Pass Clustering</title>
      <p>
        Our baseline algorithm will use single pass clustering to
extract events from the dataset. Becker et al. developed such
an algorithm to identify events in Flickr photo collections [
        <xref ref-type="bibr" rid="ref2 ref3">2,
3</xref>
        ]; their approach was criticized and improved by Reuter et
al. for the algorithm to function on larger datasets [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In
this paper we adopt single-pass clustering as a baseline
that closely resembles the algorithm used by Becker et al.
As a preprocessing step, every tweet in the dataset is
represented by a plain tf-idf vector, and the tweets are sorted by
timestamp. In the following we use the same symbol
t for the tweet itself and for its tf-idf vector. As the
algorithm proceeds, it creates clusters of tweets, which are
the retrieved events. We denote the cluster to which tweet t
belongs as S<sub>t</sub>; this cluster is also characterized by a cluster
center point s<sub>t</sub>. We refer to a general cluster and its
corresponding cluster center point as S and s, respectively. The set A
contains all clusters which are currently active, i.e. being
considered in the clustering procedure. During execution of
the algorithm, a cluster is added to A when it is newly created.
After some time a cluster can become inactive, in which case it is
removed from the set A. In Section 5 we specify how
a cluster becomes inactive.
      </p>
      <p>The baseline algorithm works as follows. When the current
tweet t is processed, the cosine similarity cos(t, s) between t
and cluster center s is calculated for all S in A. A candidate
cluster S′<sub>t</sub> (with cluster center s′<sub>t</sub>) to which t could be added,
and the corresponding cosine similarity cos(t, s′<sub>t</sub>), are then
calculated as
        <disp-formula><label>(1)</label><tex-math>S'_t = \arg\max_{S \in A} \cos(t, s),</tex-math></disp-formula>
        <disp-formula><label>(2)</label><tex-math>\cos(t, s'_t) = \max_{S \in A} \cos(t, s).</tex-math></disp-formula>
      </p>
      <p>
If S′<sub>t</sub> does not exist (this occurs when A is empty), we assign
t to a new empty cluster S<sub>t</sub>, we set s<sub>t</sub> = t and S<sub>t</sub> is added
to A. If S′<sub>t</sub> does exist, we need to decide whether t belongs
to this candidate cluster or not. For this purpose we train
a logistic regression classifier from LIBLINEAR [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] with a
binary output. It takes cos(t, s′<sub>t</sub>) as a single feature and
decides whether t belongs to S′<sub>t</sub>. If it does, then we set S<sub>t</sub>
to S′<sub>t</sub> and we update its cluster center s<sub>t</sub> as follows:
        <disp-formula><label>(3)</label><tex-math>s_t = \frac{\sum_{t \in S_t} t}{|S_t|}.</tex-math></disp-formula>
If t does not belong to S′<sub>t</sub> according to the classifier, then
as before we assign t to a new empty cluster S<sub>t</sub> and we set
s<sub>t</sub> = t.</p>
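      <p>The single-pass procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: tf-idf vectors are stored as plain dictionaries, and the hypothetical callable <monospace>belongs</monospace> stands in for the trained logistic regression classifier (here it receives only the cosine similarity to the candidate cluster). The names <monospace>cosine</monospace>, <monospace>centroid</monospace> and <monospace>single_pass</monospace> are illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Mean of a list of sparse vectors (the cluster center update)."""
    total = {}
    for vec in vectors:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(vectors) for term, w in total.items()}


def single_pass(tweets, belongs, active=None):
    """Cluster tweets (already sorted by timestamp) in a single pass.

    `belongs(similarity)` is an assumed stand-in for the binary classifier.
    `active` plays the role of the set A of active clusters.
    """
    active = [] if active is None else active
    for t in tweets:
        # Candidate cluster: the active cluster with the highest cosine similarity.
        best, best_sim = None, -1.0
        for cluster in active:
            sim = cosine(t, cluster["center"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and belongs(best_sim):
            best["members"].append(t)
            best["center"] = centroid(best["members"])
        else:
            # No candidate, or the classifier rejects it: start a new cluster.
            active.append({"members": [t], "center": dict(t)})
    return active


tweets = [
    {"goal": 1.0, "match": 1.0},
    {"goal": 1.0, "match": 0.5},
    {"vote": 1.0},
]
clusters = single_pass(tweets, belongs=lambda sim: sim >= 0.5)
print([len(c["members"]) for c in clusters])
```
]]></preformat>
      <p>In this toy run the first two tweets share most of their terms and end up in the same cluster, while the third starts a cluster of its own. The fixed threshold of 0.5 is arbitrary and only replaces the learned decision boundary for the sake of the example.</p>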
      <p>In the training routine we assign the tweets one by one to
the cluster corresponding to their event label. At every step
we calculate the candidate cluster S′<sub>t</sub> for the current tweet t in
T<sub>train</sub> and verify whether this cluster corresponds to one of
the event labels of t in the ground truth. If it does, we
have a positive training example, otherwise a negative example.
The numbers of positive and negative examples are balanced
by randomly removing examples from either the positive or the
negative set, after which the examples are used to train the
classifier.</p>
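      <p>The labelling and balancing of training examples can be illustrated as follows. This is a hedged sketch: the function names <monospace>label_example</monospace> and <monospace>balance</monospace> are hypothetical, the random seed is an assumption added for reproducibility, and the actual classifier training with LIBLINEAR is deliberately left out.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def label_example(candidate_event, tweet_events):
    """Positive example if the candidate cluster's event is one of the tweet's labels."""
    return candidate_event in tweet_events


def balance(positives, negatives, seed=0):
    """Randomly drop examples from the larger set until both sets have equal size."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


# Each example is the feature value (cosine similarity to the candidate cluster).
positives = [0.9, 0.8, 0.85, 0.7, 0.95]
negatives = [0.1, 0.3]
balanced_pos, balanced_neg = balance(positives, negatives)
print(len(balanced_pos), len(balanced_neg))
```
]]></preformat>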
      <p>In the original implementation by Becker et al. the
processing of a tweet is far from efficient since every event cluster
has to be tested. After a certain time period, the number
of clusters becomes very large. The adjustments by Reuter
et al. chiefly aim at improving this efficiency issue. We do
not consider these improvements here, since in Equation (1)
we only test currently active clusters, which is already a
performance gain.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2 Semantics-driven Clustering</title>
      <p>
        To improve the baseline single pass clustering algorithm we
propose a clustering algorithm driven by the semantics of
the tweets. For example, tweets that belong to the same
semantic topic (e.g. sports or disasters) are more likely to
belong to the same event than tweets about different topics.
Discerning two events can become easier as well if the two
events belong to different categories.
To calculate a semantic topic for each of the tweets in the
dataset, we make use of the TwitterLDA algorithm [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. It
is an adjustment of the original LDA (Latent Dirichlet
Allocation) algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for short documents such as tweets, in
which every tweet is assigned a single topic (instead
of a probability distribution over all topics) and
single-user topic models are taken into account. After running
the TwitterLDA algorithm, every tweet t is assigned a
semantic topic γ<sub>t</sub>.
      </p>
      <p>The actual clustering algorithm has the same structure as
the baseline algorithm, but it uses the semantic topic of the
tweets as an extra semantic feature during clustering. We
define the semantic fraction σ(t, S) between a tweet and an
event cluster as the fraction of tweets in S that have the
same semantic topic as t:
        <disp-formula><label>(4)</label><tex-math>\sigma(t, S) = \frac{|\{t' : t' \in S \wedge \gamma_{t'} = \gamma_t\}|}{|S|}.</tex-math></disp-formula>
      </p>
      <p>To select a candidate cluster S′<sub>t</sub> (with cluster center s′<sub>t</sub>) to
which t can be added, we use the cosine similarity, as before,
as well as this semantic fraction:
        <disp-formula><label>(5)</label><tex-math>S'_t = \arg\max_{S \in A} \cos(t, s) \cdot \sigma(t, S).</tex-math></disp-formula>
      </p>
      <p>We choose to multiply cosine similarity and semantic
fraction to select a candidate cluster since both have to be as
large as possible, and if one of the two factors provides
serious evidence against the candidate cluster, we want this to
be reflected. Now we use both cos(t, s′<sub>t</sub>) and σ(t, S′<sub>t</sub>) as features
to train a logistic regression classifier with a binary output.
The rest of the algorithm continues in the way the baseline
algorithm does.</p>
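      <p>The semantic fraction and the product-based candidate selection can be sketched as follows. The example assumes that the cosine similarities to all active clusters have already been computed and that each cluster is represented only by the list of topics of its tweets; the function names and this data layout are illustrative choices, not taken from the paper.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def semantic_fraction(topic, cluster_topics):
    """Fraction of tweets in the cluster that share the given semantic topic."""
    if not cluster_topics:
        return 0.0
    return sum(1 for g in cluster_topics if g == topic) / len(cluster_topics)


def select_candidate(topic, similarities, clusters):
    """Pick the cluster maximising cosine similarity times semantic fraction.

    `similarities[i]` is the cosine similarity to cluster i and `clusters[i]`
    lists the topics of the tweets in cluster i. Returns a tuple
    (index, cosine, fraction), or None when there is no active cluster.
    """
    best = None
    for i, (sim, topics) in enumerate(zip(similarities, clusters)):
        frac = semantic_fraction(topic, topics)
        if best is None or sim * frac > best[1] * best[2]:
            best = (i, sim, frac)
    return best


# The tweet has topic 3. Cluster 0 is textually closer, but cluster 1 agrees on topic.
candidate = select_candidate(3, [0.9, 0.6], [[1, 1, 3, 1], [3, 3, 3]])
print(candidate)
```
]]></preformat>
      <p>The two returned values, cosine similarity and semantic fraction, are the two features that would be passed on to the binary classifier. In the toy input, the product favours the second cluster (0.6 · 1.0) over the textually closer first cluster (0.9 · 0.25).</p>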
    </sec>
    <sec id="sec-6">
      <title>3.3 Hashtag-level Semantics</title>
      <p>
        As pointed out by Mehrotra et al. the quality of topic
models on Twitter data can be improved by assigning topics to
tweets on hashtag level instead of on tweet level [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. To
further improve the semantics-driven clustering, we
therefore use a semantic majority voting scheme on hashtag level,
which differs from the approach by Mehrotra et al. in that
it can be used in an online fashion and that we consider
multiple semantic topics per tweet.
      </p>
      <p>In the training set we assign the same topic to all tweets
sharing the same event label by performing a majority vote:
        <disp-formula><label>(6)</label><tex-math>\forall t \in T_{\mathrm{train}} : \gamma_t = \arg\max_{\gamma} |\{t' : \gamma_{t'} = \gamma \wedge E_{t'} \cap E_t \neq \emptyset\}|.</tex-math></disp-formula>
This way every tweet in the training set is represented by a
semantic topic that is determined at the level of the events
instead of at tweet level, resulting in a much coarser
attribution of semantic labels. We cannot do this for the test set,
since we do not know the event labels for the test set while
executing the algorithm. We can however try to emulate
such a majority voting at runtime. For this purpose, every
tweet t is associated with a set of semantic topics Γ<sub>t</sub>. We
initialize this set as follows:
        <disp-formula><label>(7)</label><tex-math>\forall t \in T_{\mathrm{test}} : \Gamma_t = \{\gamma_t\}.</tex-math></disp-formula>
      </p>
      <p>
        Next to a set of topics for every tweet, we consider a
dedicated hashtag pool H<sub>h</sub> for every hashtag h, by analogy with
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. With every pool H we associate a single semantic topic
γ<sub>H</sub>. As the algorithm proceeds, more and more hashtag
pools will be created and filled with tweets.
      </p>
      <p>When a tweet t is processed in the clustering algorithm, it
is first added to some hashtag pools, depending on the
hashtags in t: for every hashtag h in t, t is
added to H<sub>h</sub>. When a tweet t is added to a hashtag pool H,
a majority vote inside this pool is performed:
        <disp-formula><label>(8)</label><tex-math>\gamma_{\mathrm{new},H} = \arg\max_{\gamma} |\{t' : t' \in H \wedge \gamma_{t'} = \gamma\}|.</tex-math></disp-formula>
      </p>
      <sec id="sec-6-1">
        <title>We then update Γ<sub>t</sub> for every tweet t in H:</title>
        <p>
          <disp-formula><label>(9)</label><tex-math>\forall t \in H : \Gamma_{\mathrm{new},t} = (\Gamma_{\mathrm{old},t} \setminus \{\gamma_H\}) \cup \{\gamma_{\mathrm{new},H}\}.</tex-math></disp-formula>
        </p>
        <p>Finally γ<sub>new,H</sub> becomes the new semantic topic of H. Note
that every tweet t keeps its original semantic topic γ<sub>t</sub>.
What still needs adjustment in order for the clustering
algorithm to use this new information is the definition of the
semantic fraction from Equation (4). We alter the
definition as follows:
          <disp-formula><label>(10)</label><tex-math>\sigma'(t, S) = \max_{g \in \Gamma_t} \frac{|\{t' : t' \in S \wedge g \in \Gamma_{t'}\}|}{|S|}.</tex-math></disp-formula>
        </p>
        <p>Since Equation (10) reduces to Equation (4) if Γ<sub>t</sub> contains only
one element for every tweet t, this is a justifiable
generalization.</p>
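        <p>The online majority vote over hashtag pools and the generalized semantic fraction can be sketched as below. This is an illustration under assumptions rather than the authors' code: the class <monospace>HashtagPools</monospace> and its attribute names are hypothetical, the update of the topic sets follows the set-difference-and-union rule literally, and ties in the majority vote are broken in favour of the topic that entered the pool first, which the text does not specify.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


class HashtagPools:
    """Online majority voting over hashtag pools."""

    def __init__(self):
        self.members = {}     # hashtag -> list of tweet ids in the pool
        self.pool_topic = {}  # hashtag -> current majority topic of the pool
        self.topic = {}       # tweet id -> original topic of the tweet
        self.topic_set = {}   # tweet id -> set of topics attached to the tweet

    def add(self, tweet_id, topic, hashtags):
        self.topic[tweet_id] = topic
        self.topic_set[tweet_id] = {topic}  # initial set holds only the own topic
        for h in hashtags:
            pool = self.members.setdefault(h, [])
            pool.append(tweet_id)
            # Majority vote over the original topics of all tweets in the pool.
            votes = Counter(self.topic[m] for m in pool)
            new_topic = votes.most_common(1)[0][0]
            old_topic = self.pool_topic.get(h)
            # Replace the old pool topic by the new one in every member's set.
            for m in pool:
                self.topic_set[m].discard(old_topic)
                self.topic_set[m].add(new_topic)
            self.pool_topic[h] = new_topic


def generalized_fraction(topic_set, cluster_topic_sets):
    """Best fraction of cluster tweets sharing any one topic of the tweet."""
    if not topic_set or not cluster_topic_sets:
        return 0.0
    size = len(cluster_topic_sets)
    return max(
        sum(1 for s in cluster_topic_sets if g in s) / size for g in topic_set
    )


pools = HashtagPools()
pools.add("a", 1, ["x"])
pools.add("b", 2, ["x"])
pools.add("c", 2, ["x"])
print(pools.pool_topic["x"], pools.topic_set["a"])
print(generalized_fraction({1, 2}, [{1}, {2}, {2}, {3}]))
```
]]></preformat>
        <p>After the third tweet arrives, topic 2 holds the majority in the pool, so the first tweet's topic set switches from topic 1 to topic 2 while its original topic stays stored separately. When every topic set is a singleton, <monospace>generalized_fraction</monospace> gives the same value as the per-tweet semantic fraction.</p>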
      </sec>
    </sec>
    <sec id="sec-7">
      <title>4. DATA COLLECTION AND PROCESSING</title>
      <p>In the past many datasets have been assembled to
perform event clustering on social media. Unfortunately many
of these datasets are not publicly available; this is
especially true for Twitter datasets. We therefore choose to
build our own dataset, available at
http://users.ugent.be/~cdboom/events/dataset.txt. To speed up this task
we follow a semi-manual approach, in which we first collect
candidate events based on a hashtag clustering procedure,
after which we manually verify which of these correspond to
real-world events.</p>
    </sec>
    <sec id="sec-8">
      <title>4.1 Event Definition</title>
      <p>
        To identify events in a dataset consisting of thousands of
tweets, we state the following event definition, which
consists of three assumptions. Assumption 1: a real-world
event is characterized by one or multiple hashtags. For
example, tweets on the past FIFA World Cup football matches
were often accompanied by hashtags such as
#USAvsBelgium and #WorldCup. Assumption 2: the timespan of
an event cannot cross the boundaries of a day. This
means that if a certain real-world event spans several
days (such as a music festival), this real-world event will be
represented by multiple event labels. The assumption
allows us to discern events that share the same hashtag but
occur on a different day of the week, and speeds up the
eventual event detection process. The hashtag #GoT, for
example, will spike in volume whenever a new episode of Game
of Thrones is aired; these spikes are thus different events according
to our definition. Assumption 3: there is only one event
that corresponds to a certain hashtag on a given day.
Assumption 3 is not restrictive and can easily be relaxed.
For example, if we were to relax this assumption and allow
multiple events with the same hashtags to happen on the
same day, we would need a feature in the event detection
process to incorporate time differences, which is easily done.
Alternatively we could represent our tweets using df-idf<sub>t</sub>
vectors instead of tf-idf vectors, which also consider time
aspects of the tweets [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-9">
      <title>4.2 Collecting Data</title>
      <p>We assembled a dataset by querying the Twitter Streaming
API for two weeks, between September 29 and October 13
of the year 2014. We used a geolocation query and required
that the tweets originated from within the Flanders region
in Belgium, at least by approximation. Since only very few
tweets are geotagged, our dataset was far from a
representative sample of the tweets sent during this fortnight.
We therefore augment our dataset to make it more
representative for an event detection task. If a real-world event is
represented by one or more hashtags (Assumption 1), then
we assume that at least one tweet with these hashtags is
geotagged and that these hashtags are therefore already present
in the original dataset. We thus consider every hashtag in
the original dataset and use them one by one to query the
Twitter REST API.</p>
      <p>A query to the REST API returns an ordered batch of tweets
<inline-formula><tex-math>(t_i)_{i=1}^{m}</tex-math></inline-formula>, where m is at most 100. By adjusting the query
parameters (e.g. the maximum ID of the tweets) one can
use multiple requests to gather tweets up to one week in the
past. To make sure we only gather tweets from within
Flanders, the tokens in the user location text field of every tweet
in the current batch are compared to a list of regions, cities,
towns and villages in Flanders, assembled using Wikipedia
and manually adjusted for multilingual support. If the user
location field is empty, the tweet is not considered further.
We define a batch <inline-formula><tex-math>(t_i)_{i=1}^{m}</tex-math></inline-formula> to be valid if and only if
        <disp-formula><label>(11)</label><tex-math>\frac{|\{t_i : t_i \text{ in Flanders}\}|}{\mathrm{timestamp}(t_m) - \mathrm{timestamp}(t_1)} &gt; \tau_1,</tex-math></disp-formula>
where τ<sub>1</sub> is a predefined threshold. If there are τ<sub>2</sub>
consecutive invalid batches, all batches for the hashtag currently
under consideration are discarded. If there are τ<sub>3</sub> batches in total for
which fewer than τ<sub>4</sub> tweets were sent in Flanders, all batches
for the current hashtag are discarded as well. If
neither of these rules applies, all batches for the current hashtag
are added to the dataset. With the timestamp(·) function
expressed in minutes, we set τ<sub>1</sub> = 1, τ<sub>2</sub> = 12, τ<sub>3</sub> = 25 and
τ<sub>4</sub> = 10, as this yielded a good trade-off between execution
time and quality of the data.</p>
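      <p>The batch validity test of Equation (11) and the two discard rules can be sketched as follows. Several details are assumptions made for this illustration: each batch is summarised by a hypothetical tuple (number of Flanders tweets, first timestamp, last timestamp) with timestamps in minutes, the time span is taken as an absolute difference, a batch with zero time span counts as valid when it holds at least one Flanders tweet, and "subsequent" invalid batches are read as consecutive ones.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def batch_is_valid(in_flanders, first_minute, last_minute, tau1=1.0):
    """Flanders tweets per minute of batch time span must exceed tau1."""
    span = abs(last_minute - first_minute)
    if span == 0:
        # Assumption: all tweets in the same minute counts as a dense batch.
        return in_flanders > 0
    return in_flanders / span > tau1


def keep_hashtag(batches, tau1=1.0, tau2=12, tau3=25, tau4=10):
    """Apply both discard rules to all batches gathered for one hashtag.

    Each batch is a tuple (in_flanders, first_minute, last_minute).
    Returns False as soon as one of the rules fires.
    """
    invalid_run = 0  # length of the current run of invalid batches
    sparse = 0       # batches with fewer than tau4 Flanders tweets
    for in_flanders, first, last in batches:
        if batch_is_valid(in_flanders, first, last, tau1):
            invalid_run = 0
        else:
            invalid_run += 1
        if in_flanders < tau4:
            sparse += 1
        if invalid_run >= tau2 or sparse >= tau3:
            return False
    return True


dense = (50, 0, 10)   # 50 Flanders tweets in 10 minutes
sparse = (5, 0, 60)   # 5 Flanders tweets in one hour
print(keep_hashtag([dense] * 5), keep_hashtag([sparse] * 12))
```
]]></preformat>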
    </sec>
    <sec id="sec-10">
      <title>4.3 Collecting Events</title>
      <p>Using the assembled data and the event definition of Section
4.1 we can assemble a ground truth for event detection in
three steps. Since events are represented by one or more
hashtags according to Assumption 1, we first cluster the
hashtags in the tweets using a co-occurrence measure. Next
we determine whether such a cluster represents an event, and
finally we label the tweets corresponding with this cluster
with an appropriate event label.</p>
      <p>To assemble frequently co-occurring hashtags into clusters,
a so-called co-occurrence matrix is constructed. It is a
three-dimensional matrix Q that holds information on how many
times two hashtags co-occur in a tweet. Since events can
only take place on one day (Assumption 2), we calculate
co-occurrence on a daily basis. If hashtag k and hashtag ℓ
co-occur a<sub>k,ℓ,d</sub> times on day d, then
        <disp-formula><label>(12)</label><tex-math>\forall k, \ell, d : Q_{k,\ell,d} = \frac{a_{k,\ell,d}}{\sum_i a_{k,i,d}}.</tex-math></disp-formula>
      </p>
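      <p>A possible way to compute the normalised daily co-occurrence values is sketched below. The input layout (a list of pairs of day and hashtag list) and the function name <monospace>cooccurrence</monospace> are assumptions for the sake of the example, as is the choice to leave out the co-occurrence of a hashtag with itself.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(tweets):
    """Return a dict mapping (k, l, d) to the normalised co-occurrence value.

    `tweets` is an iterable of (day, hashtags) pairs. The count of tweets on
    day d containing both k and l is divided by the sum of the counts of k
    with every other hashtag on that day.
    """
    counts = defaultdict(int)    # (k, l, d) -> number of co-occurrences
    row_sums = defaultdict(int)  # (k, d) -> sum over all partners of k
    for day, hashtags in tweets:
        for k, l in permutations(sorted(set(hashtags)), 2):
            counts[(k, l, day)] += 1
            row_sums[(k, day)] += 1
    return {(k, l, d): a / row_sums[(k, d)] for (k, l, d), a in counts.items()}


tweets = [
    (1, ["a", "b"]),
    (1, ["a", "b"]),
    (1, ["a", "c"]),
    (2, ["a", "c"]),
]
Q = cooccurrence(tweets)
print(Q[("a", "b", 1)], Q[("b", "a", 1)])
```
]]></preformat>
      <p>Note that the resulting values are not symmetric: on day 1, hashtag b only ever occurs together with a, whereas a also occurs with c.</p>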
      <p>A lot of these clusters however do not represent a real-world
event. Hashtags such as #love or #followme do not exhibit
event-specific characteristics, such as an isolated,
statistically significant peak in tweet volume per minute, but can
rather be seen as near-constant noise in the Twitter feed. In
order to identify the hashtags that do represent events and
to filter out the noise, we follow a peak detection strategy.
For this purpose we treat each cluster of hashtags separately,
and we refer to the hashtags in these clusters as ‘event
hashtags’. With each cluster C we associate all the tweets that
were sent on the same day and that contain one or more
of the event hashtags in this cluster. We gather them in a
set T<sub>C</sub>. After sorting the tweets in T<sub>C</sub> according to their
timestamp, we calculate how many tweets are sent in
every time slot of five minutes, which yields a sequence
<inline-formula><tex-math>(v_{C,i})_{i=1}^{n}</tex-math></inline-formula> of tweet volumes, with n the number of time slots.
We define that some <inline-formula><tex-math>v_{C,i^*}</tex-math></inline-formula> is an isolated peak in the sequence
(v<sub>C,i</sub>) if and only if
        <disp-formula><label>(14)</label><tex-math>v_{C,i^*} \geq \theta_1 \wedge \forall i \neq i^* : v_{C,i^*} \geq v_{C,i} + \theta_2,</tex-math></disp-formula>
with θ<sub>1</sub> and θ<sub>2</sub> predefined thresholds. Only if one such
isolated peak exists (Assumption 3), we label all tweets t in T<sub>C</sub>
with the same unique event label e<sub>t</sub> and add them to the
ground truth. Since we used the event hashtags from C to
construct this event, we have to remove all event hashtags
in C from the tweets in T<sub>C</sub>; otherwise the tweets themselves
would already reflect the nature of the events in the ground
truth.</p>
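      <p>The peak detection step can be sketched as follows. The sketch assumes that tweet timestamps are given as minutes since midnight of the day under consideration and that both threshold comparisons are non-strict; the function names <monospace>tweet_volumes</monospace> and <monospace>isolated_peaks</monospace> are illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def tweet_volumes(minutes, slot=5):
    """Count tweets per time slot; `minutes` are minutes since midnight."""
    n_slots = (24 * 60 + slot - 1) // slot
    volumes = [0] * n_slots
    for m in minutes:
        volumes[int(m) // slot] += 1
    return volumes


def isolated_peaks(volumes, theta1=10, theta2=5):
    """Indices whose volume reaches theta1 and exceeds every other slot by theta2."""
    peaks = []
    for i, v in enumerate(volumes):
        others = (w for j, w in enumerate(volumes) if j != i)
        if v >= theta1 and all(v >= w + theta2 for w in others):
            peaks.append(i)
    return peaks


# 15 tweets around 10:00 and 4 tweets around 00:30.
minutes = [600] * 12 + [601] * 3 + [30] * 4
volumes = tweet_volumes(minutes)
print(isolated_peaks(volumes))
```
]]></preformat>
      <p>With a positive second threshold, at most one slot per day can satisfy the condition, so the returned list is either empty or holds a single index. A hashtag cluster would be kept as an event only in the latter case.</p>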
      <p>With this procedure it is however likely that some tweets
will belong to multiple events, but only get one event label.
This is possible if a tweet contains multiple event hashtags
that belong to different event hashtag clusters. We therefore
alter the ground truth so that every tweet t corresponding
to an event is associated with a set of event labels E<sub>t</sub> instead
of only one label. Of course, for the majority of these tweets,
this set will only contain one event label.</p>
      <p>In our final implementation we set min<sub>h</sub> = 1, ε = 0.3,
θ<sub>1</sub> = 10 and θ<sub>2</sub> = 5. These values were chosen empirically,
such that, with these parameters, clusters of co-occurring
hashtags are rarely bigger than three elements. After
manual inspection and filtering, the final dataset contains 322</p>
      <sec id="sec-10-1">
        <p>[Figure 1: tweet volume v<sub>C,i</sub> per time slot for two events in the dataset. Event 1: #wearelosc. Event 2: #wearelosc and #ollosc.]</p>
        <p>Figure 1 shows a plot of the tweet volume as a function of time
slot for two events in the dataset. The plot only covers the
first week in the dataset. The events are two football games
of the French team LOSC Lille; Lille is a city very near
Flanders, and the team therefore shows up in our dataset. The first
event is characterised by the single hashtag #wearelosc, and
the second event by two hashtags: #wearelosc and #ollosc.
Our algorithm detects the peaks in tweet volume during the
games, and since only one significant peak exists per day, we
assign the same event label to all tweets with the associated
hashtags sent during that day.</p>
        <p>The final dataset is made available at the earlier mentioned
URL. We provide for every tweet its tweet ID, timestamp,
corresponding event labels and event hashtags, and whether
it belongs to either the training or test set. Due to Twitter’s
restrictions, we cannot directly provide the text of all tweets.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>5. RESULTS</title>
    </sec>
    <sec id="sec-12">
      <title>5.1 Performance Measures</title>
      <p>
        To assess the performance of the clustering algorithms, we
report our results in terms of precision P , recall R and F1
measure, as defined in [
        <xref ref-type="bibr" rid="ref15 ref3">3, 15</xref>
        ], and restated here:
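        <disp-formula><tex-math>P = \frac{1}{|T|} \sum_{t \in T} \frac{|S_t \cap \{t' : e_{t'} = e_t\}|}{|S_t|},</tex-math></disp-formula>
        <disp-formula><tex-math>R = \frac{1}{|T|} \sum_{t \in T} \frac{|S_t \cap \{t' : e_{t'} = e_t\}|}{|\{t' : e_{t'} = e_t\}|},</tex-math></disp-formula>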
in which T stands for the total dataset of tweets. When
tweets can have multiple event labels, these definitions
however do not apply any more. We therefore alter them as
      </p>
    </sec>
    <sec id="sec-13">
      <title>5.2 Results</title>
      <p>We now discuss the results of the algorithms explained in
Section 3 with the use of the dataset constructed in
Section 4. In the algorithms we make use of a set A of active
event clusters, which become inactive after some time
period. We could for example use an exponential decay
function to model the time after which a cluster becomes inactive
since the last tweet was added. Using Assumption 2
however we can use a much simpler method: when a new day
begins, all event clusters are removed from A and thus
become inactive. This way we start with an empty set A of
active clusters every midnight.</p>
      <p>
        For the semantics-driven clustering algorithm we assign the
tweets to 10 TwitterLDA topics using the standard
parameters proposed in [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and 500 iterations of Gibbs sampling.
Table 1 shows the results of the baseline algorithm, the
semantics-driven algorithm and the hashtag-level semantics
approach, both for one event label and multiple event labels
per tweet. Note that, since we have removed the event
hashtags from the tweets in the ground truth, the hashtag-level
semantics approach does not use any implicit or explicit
information about the nature of the events.
      </p>
      <p>We note that the hashtag-level semantics approach
outperforms the baseline clustering algorithm, with an increase of
1.6 percentage points in F1-measure for single event labels.
In the single-label case, hashtag-level semantics
performs better than baseline in both precision and recall
(significant improvement, p &lt; 0.001 in t-test). When
using multiple event labels per tweet, hashtag-level semantics lowers
precision by 0.9 percentage points but raises recall by 1.4
percentage points, leading to an increase in F1-measure of 0.9
percentage points.</p>
      <p>Compared to the standard semantics-driven algorithm, the
hashtag-level approach does 6 percentage points better in recall, but
4 percentage points worse in precision for single event labels.
Hashtag-level semantic clustering thus seems to make up for
the substantial loss in recall that occurs when using the
basic semantics-driven method, at the cost of some precision; its
precision is however still 1.5 percentage points better than that of the
baseline algorithm. The plain semantics-driven approach is
1.7 percentage points worse than baseline in terms of
F1-measure, but provides much more precision by sacrificing
recall. For multiple event labels the differences between the
standard semantics approach and the other algorithms are even
more pronounced. The former performs 3.3
percentage points worse in F1-measure compared to baseline, and
4.2 percentage points worse compared to hashtag semantics.
Using multiple event labels, the plain semantics-driven
algorithm however has a much higher precision than baseline
and hashtag semantics.</p>
      <p>
        To assess the significance of the differences in F1 measure
between our three systems, we used a Bayesian technique
suggested by Goutte et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. First we estimated the true
positive, false positive and false negative numbers for the three
systems. Next we sampled 10,000 gamma variates from the
proposed distribution for F1 for these systems and calculated
the probability of one system being better than another
system. We repeated this process 10,000 times. Hashtag
semantics resulted in a higher F1 measure in 99.99% of the
cases; our results are thus a significant improvement over
baseline. By contrast, the plain semantics-driven approach
is significantly worse than baseline, also in 99.99% of the
cases. Concerning multiple event labels, the hashtag
semantics approach is better in 98.5% of the cases than baseline,
which is also a significant improvement—although less than
in the single event label case.
      </p>
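      <p>The sampling step can be sketched as follows. The sketch assumes the gamma-variate form of the F1 posterior described by Goutte et al. [9], in which F1 = U / (U + V) for two independent gamma variates; the prior pseudo-count, the function names and the confusion counts in the example are hypothetical, not the values used in our experiments.</p>
      <preformat>
```python
import random


def sample_f1(tp, fp, fn, rng, lam=0.5):
    """Draw one F1 value from its posterior given the confusion counts.

    Uses F1 = U / (U + V) with U ~ Gamma(tp + lam, scale 2) and
    V ~ Gamma(fp + fn + 2 * lam, scale 1); lam is an assumed prior
    pseudo-count.
    """
    u = rng.gammavariate(tp + lam, 2.0)
    v = rng.gammavariate(fp + fn + 2 * lam, 1.0)
    return u / (u + v)


def prob_better(counts_a, counts_b, samples=10000, seed=0):
    """Estimate P(F1 of system A > F1 of system B) by Monte Carlo sampling."""
    rng = random.Random(seed)
    wins = sum(
        sample_f1(*counts_a, rng) > sample_f1(*counts_b, rng)
        for _ in range(samples)
    )
    return wins / samples


# Invented (TP, FP, FN) counts for two systems.
system_a = (900, 100, 100)
system_b = (500, 500, 500)
print(prob_better(system_a, system_b))
```
      </preformat>
      <p>A probability close to 1 indicates that the first system is almost certainly better in F1; a value near 0.5 means the two systems cannot be told apart.</p>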
      <p>We also compare our three approaches in terms of cluster
purity and the number of detected event clusters. These
numbers are shown in Table 2. We see that the purity of
the clusters in the plain semantics-driven approach is higher
than in baseline and hashtag semantics, but the number of
detected event clusters is also substantially larger. This
explains the high precision and low recall of the
semantics-driven algorithm. The purity of baseline and hashtag
semantics is almost equal, but the latter approach discerns more
events than baseline, thereby explaining the slight increase
in precision and recall for the hashtag semantics approach
compared to baseline. Concerning multiple event labels, the
purity increases significantly compared to single event labels.
Since the number of detected events remains the same, this
explains the substantial increase in precision for the
multi-label procedure.</p>
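      <p>The interplay between purity and the number of clusters can be made concrete with a small sketch. It assumes the common definition of purity as the fraction of items that carry the majority label of their cluster; the function name and the toy assignments are hypothetical.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def purity(cluster_of, event_of):
    """Fraction of tweets that carry the majority event label of their cluster."""
    labels_per_cluster = defaultdict(Counter)
    for tweet, cluster in cluster_of.items():
        labels_per_cluster[cluster][event_of[tweet]] += 1
    majority_total = sum(max(c.values()) for c in labels_per_cluster.values())
    return majority_total / len(cluster_of)


# Invented ground truth: two events with two tweets each.
event_of = {"t1": "e1", "t2": "e1", "t3": "e2", "t4": "e2"}

# One coarse cluster versus one cluster per tweet.
coarse = {"t1": "c1", "t2": "c1", "t3": "c1", "t4": "c1"}
fine = {"t1": "c1", "t2": "c2", "t3": "c3", "t4": "c4"}

purity_coarse = purity(coarse, event_of)
purity_fine = purity(fine, event_of)
print(purity_coarse, purity_fine)
```
      </preformat>
      <p>Splitting the data into more clusters can only keep purity equal or raise it, which is why purity has to be read together with the number of detected clusters.</p>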
    </sec>
    <sec id="sec-14">
      <title>5.3 An Illustrative Example</title>
      <p>As an example, consider the tweet “we are ready
#belgianreddevils via @sporza”. This tweet was sent on the
occasion of a football game between Belgium and Andorra:
the Belgian players are called Red Devils and the game was
broadcast by the television channel Sporza. Since most tweets on this
football game were sent in Dutch or French, the baseline
clustering approach is not able to put this tweet in the correct
cluster, and instead places it in a cluster in which most tweets are in
English. This tweet is, however, related to a sports-specific
topic, so that both semantics approaches assign it
to a correct cluster. It is clear that the
hashtag #belgianreddevils has something to do with sports,
and in particular with a football game of the Belgian national
team, but there exist tweets that contain this hashtag and
that have not been categorized into the sports category by
the TwitterLDA algorithm. For example the tweet “met 11
man staan verdedigen, geweldig! #belgiumreddevils” (which
translates to “defending with 11 men, fantastic!”) belongs
to a more general category. This shows that calculating
semantic topics on tweet level results in a fine-grained, but also
more noisy assignment of these topics, which is reflected in
the number of detected events shown in Table 2. By
assigning the semantic topics on hashtag level, however, all tweets
with the hashtag #belgianreddevils will eventually belong
to the sports category. This results in a coarser, less
detailed assignment of the topics, which in turn leads to more accurate
event detection and fewer detected events.</p>
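      <p>The effect of moving from tweet-level to hashtag-level topics can be sketched as below. The majority vote over tweet-level topics is an assumed aggregation rule chosen for illustration, and the topic names and the second and third tweets are invented; the sketch is not a restatement of the procedure in Section 3.</p>
      <preformat>
```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#\w+")


def hashtag_level_topics(tweets):
    """Relabel tweets with the majority topic of their hashtags.

    tweets: list of (text, tweet_level_topic) pairs.
    Returns one topic per tweet, in input order. Tweets without a
    hashtag keep their tweet-level topic.
    """
    votes = defaultdict(Counter)
    for text, topic in tweets:
        for tag in HASHTAG.findall(text.lower()):
            votes[tag][topic] += 1
    hashtag_topic = {tag: c.most_common(1)[0][0] for tag, c in votes.items()}

    relabelled = []
    for text, topic in tweets:
        tags = HASHTAG.findall(text.lower())
        if tags:
            tag_topics = Counter(hashtag_topic[tag] for tag in tags)
            topic = tag_topics.most_common(1)[0][0]
        relabelled.append(topic)
    return relabelled


# Tweet-level topics are hypothetical; one tweet was labelled "general".
tweets = [
    ("we are ready #belgianreddevils via @sporza", "sports"),
    ("what a goal #belgianreddevils", "sports"),
    ("defending with 11 men, fantastic! #belgianreddevils", "general"),
]
topics = hashtag_level_topics(tweets)
print(topics)
```
      </preformat>
      <p>After aggregation the outlier tweet inherits the sports topic of its hashtag, which is the coarsening effect described above.</p>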
    </sec>
    <sec id="sec-15">
      <title>6. CONCLUSION</title>
      <p>We developed two semantics-based extensions to the
single-pass baseline clustering algorithm as used by Becker et al. to
detect events in Twitter streams. These extensions use semantic
information about the tweets to drive the event detection.
For this purpose we assigned a topic label to every tweet
using the TwitterLDA algorithm. To evaluate the performance
of the algorithms we semi-automatically developed a ground
truth using a hashtag clustering and peak detection
strategy, to aid the manual labelling of tweets with events. When
using the topic labels at the level of individual tweets, the
algorithm performs significantly worse than baseline. When
aggregating the semantic labels of the tweets at a
coarser, hashtag level, however, we obtain a significant gain over
baseline. We conclude that high-level semantic information
can indeed improve new and existing event detection and
clustering algorithms.</p>
    </sec>
    <sec id="sec-16">
      <title>7. ACKNOWLEDGMENTS</title>
      <p>Cedric De Boom is funded by a Ph.D. grant of Ghent
University, Special Research Fund (BOF).</p>
      <p>Steven Van Canneyt is funded by a Ph.D. grant of the
Agency for Innovation by Science and Technology in
Flanders (IWT).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Aiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Petkos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Corney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Skraba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaimes</surname>
          </string-name>
          .
          <article-title>Sensing Trending Topics in Twitter</article-title>
          .
          <source>Multimedia, IEEE Transactions on</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naaman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Gravano</surname>
          </string-name>
          .
          <article-title>Event Identification in Social Media</article-title>
          .
          <source>In WebDB 2009: Twelfth International Workshop on the Web and Databases</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naaman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Gravano</surname>
          </string-name>
          .
          <article-title>Learning similarity metrics for event identification in social media</article-title>
          .
          <source>In WSDM '10: Third ACM international conference on Web search and data mining</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naaman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Gravano</surname>
          </string-name>
          .
          <article-title>Beyond Trending Topics: Real-World Event Identification on Twitter</article-title>
          .
          <source>In ICWSM 2011: International AAAI Conference on Weblogs and Social Media</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Roy</surname>
          </string-name>
          .
          <article-title>Event detection from flickr data through wavelet-based spatial analysis</article-title>
          .
          <source>In CIKM '09: Proceeding of the 18th ACM conference on Information and knowledge management</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chierichetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mahdian</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Pandey</surname>
          </string-name>
          .
          <article-title>Event Detection via Communication Pattern Analysis</article-title>
          .
          <source>In ICWSM '14: International Conference on Weblogs and Social Media</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.-E.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-R.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>LIBLINEAR: A Library for Large Linear Classification</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Goutte</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Gaussier</surname>
          </string-name>
          .
          <article-title>A probabilistic interpretation of precision, recall and F-score, with implication for evaluation</article-title>
          .
          <source>In ECIR'05: Proceedings of the 27th European conference on Advances in Information Retrieval Research</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Niculescu-Mizil</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Gryc</surname>
          </string-name>
          .
          <article-title>Topic-link LDA: joint models of topic and author community</article-title>
          .
          <source>In ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <article-title>An Introduction to Information Retrieval</article-title>
          . Cambridge University Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Buntine</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Xie</surname>
          </string-name>
          .
          <article-title>Improving lda topic models for microblogs via tweet pooling and automatic labeling</article-title>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Nichols</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mahmud</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Drews</surname>
          </string-name>
          .
          <article-title>Summarizing sporting events using twitter</article-title>
          .
          <source>In IUI '12: Proceedings of the 2012 ACM international conference on Intelligent User Interfaces</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Petrović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Osborne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          .
          <article-title>Streaming first story detection with application to Twitter</article-title>
          .
          <source>In HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Reuter</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          .
          <article-title>Event-based classification of social media streams</article-title>
          .
          <source>In ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ritter</surname>
          </string-name>
          ,
          <string-name>
            <surname>Mausam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Clark</surname>
          </string-name>
          .
          <article-title>Open domain event extraction from twitter</article-title>
          .
          <source>In KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sakaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Okazaki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          .
          <article-title>Earthquake shakes Twitter users: real-time event detection by social sensors</article-title>
          .
          <source>In WWW '10: Proceedings of the 19th international conference on World wide web</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Stilo</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Velardi</surname>
          </string-name>
          .
          <article-title>Time Makes Sense: Event Discovery in Twitter Using Temporal Similarity</article-title>
          .
          <source>In Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Van Canneyt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schockaert</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Dhoedt</surname>
          </string-name>
          .
          <article-title>Estimating the Semantic Type of Events Using Location Features from Flickr</article-title>
          .
          <source>In SIGSPATIAL '14</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Agichtein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Benzi</surname>
          </string-name>
          .
          <article-title>TM-LDA: efficient online modeling of latent topic transitions in social media</article-title>
          .
          <source>In KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Leonardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.-S.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Event Detection in Twitter</article-title>
          .
          <source>In ICWSM '11: International Conference on Weblogs and Social Media</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lampert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cameron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Robinson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Power</surname>
          </string-name>
          .
          <article-title>Using social media to enhance emergency situation awareness</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Achananuparp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.-P.</given-names>
            <surname>Lim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Topical keyphrase extraction from Twitter</article-title>
          .
          <source>In HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>