<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Finding Short Lived Events on Social Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David Kilroy</string-name>
          <email>david.kilroy1@ucdconnect.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simon Caton</string-name>
          <email>simon.caton@ucd.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Graham Healy</string-name>
          <email>graham.healy@dcu.ie</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, University College Dublin</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computing, Dublin City University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Companies have been looking to Social Media for event and trend detection for a number of years to assist in business decision making. Contemporary approaches typically consider speci c time windows within which to search for evidence of emerging topics, events or trends. Yet, in doing so, short(er) term events can be crowded out. In this paper, we show how subdividing time windows and recombining their results can signi cantly improve detection. To do this, we use three historical Twitter datasets that are prevalent in the literature. We evaluate 6 di erent methods common in the literature, and in most cases cases, observe a signi cant improvement through our approach. However, we also note that picking the correct subdivision of time is key to this improvement.</p>
      </abstract>
      <kwd-group>
        <kwd>Twitter</kwd>
        <kwd>Event Detection</kwd>
        <kwd>Temporal Windowing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Social media services such as Twitter, Reddit, Sina Weibo and Facebook have
gained a lot of traction in recent years allowing users to spread information at the
click of a nger. Users of these services can therefore be seen as social sensors [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
reporting real-world experiences and opinions about hot topics in real time. In
order to make sense of this massive amount of data, event detection techniques
can be used in order to identify the latent events in the continuous stream of
posts. Similarly to de ning a topic in topic modelling, an event can be de ned
as a \number of keywords" [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] adhering to the same subject.
      </p>
      <p>In the literature, it is common to run an event detection algorithm over a
time window of data and then extract events [3{5] e.g. extract 10 events every
hour. A problem with this approach is that if the time window of interest is too
large or has too high of an arrival rate of data, many event detection algorithms
cannot capture the \short lived" events if they are not prevalent enough. This is
due to the fact that many event detection algorithms do not have any intrinsic
notion of time in them, thus are not able to identify a group of posts occurring
in close proximity. Fig. 1 details this problem. If an algorithm is run over a single
time window (as in Fig. 1) it will not be able to di erentiate the burst of posts
(in orange) compared to the steady stream of posts (in blue).</p>
      <p>In order to deal with this problem we propose to subdivide the data into
shorter windows (in grey) from the original time window, thus allowing an event
detection algorithm to distinguish \short lived" events in shorter time windows.
At the end of the nal subdivided time window (when a subset of events have
been collected) we propose an ending event clustering and ranking mechanism.
This is used to group similar events between the short time windows, so that
detected events are not repeated in the nal ranking by an importance score.</p>
      <p>
        In our evaluation we benchmark various algorithms used in topic modelling
to more classical families of topic detection algorithms with and without the
presence of short time windows and an ending event clustering and ranking
mechanism. We run these algorithms on three di erent datasets [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] which
contain ground truth event labels that allows the calculation of speci c key event
detection metrics and comparison with key approaches in the literature.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        We rst discuss the three categories of event detection loosely grouped into
(a) type of event (speci ed or unspeci ed ) (b) detection task (retrospective or
new ) and (c) detection algorithms [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. We then brie y discuss dynamic topic
modelling (DTM), with our approach being closely related to it. Finally we
discuss our approach and which categories it aligns to.
      </p>
      <p>
        Speci ed event detection di ers from unspeci ed event detection in that some
prior information is known about the event before performing detection. This
may be in the form of only searching for posts with speci c content/terms [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
or in a speci c geolocation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In contrast, Unspeci ed event detection uses no
prior information and instead nds events from unlabelled posts.
      </p>
      <p>
        Retrospective (or o ine) event detection (RED ) tries to nd events that have
already happened from historical collections of posts. New (or online) event
detection (NED ) tries to nd events from a continuous stream of posts. As
pointed out in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], RED provides better results than NED. This is due to the
fact that RED techniques can make better decisions than NED by having a view
of the corpus in its entirety. However, when dealing with a continuous stream of
documents in real-time only NED techniques can be performed.
      </p>
      <p>
        Generally speaking \the detection of unknown events" [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] can be solved in
three ways: the documents themselves are clustered (document-pivot) [
        <xref ref-type="bibr" rid="ref11 ref12 ref4">4, 11, 12</xref>
        ],
the terms from the documents are selected then clustered (feature-pivot) [
        <xref ref-type="bibr" rid="ref13 ref3 ref5">3,5,13</xref>
        ]
or events are seen as a probability distribution over documents/terms (topic
modelling) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In document-pivot approaches, the documents themselves are
represented as a bag-of-words which may be weighted by term frequency-inverse
document frequency. These documents are then grouped together using
clustering techniques with some similarity metric e.g. cosine. If an incoming document
is considered dissimilar to any existing cluster by a prede ned threshold, it is
considered a new event. Early approaches of document-pivot techniques were
proposed in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], where the objective was to nd the rst story (i.e. rst report
of an event) within an incoming stream of stories. However, this approach did
not scale to the large volume of data on twitter. In response to this problem [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
proposed an approach using locality sensitive hashing, allowing for a
considerable speed-up. In feature-pivot techniques there are two main stages in the event
detection process: (a) have some way to relate terms to each other, and (b) group
terms together to form events. Early approaches related terms based o their
signal cross correlation [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], with other approaches using co-occurrence [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or
pairwise similarity from word embeddings [
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ]. Grouping terms is usually
followed by clustering, with works such as [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and BnGrams in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] using
hierarchical clustering. Other approaches group terms by using modularity-based
graph partitioning [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. It is also common for feature-pivot techniques to
identify bursty terms [
        <xref ref-type="bibr" rid="ref13 ref3 ref4">3, 4, 13</xref>
        ] (i.e. terms occurring at an unusually high rate) and
then only consider terms which meet this criteria. Topic models can be seen as
a dimensionality reduction technique by splitting a document-term matrix into
a document-topic matrix and a term-topic matrix. Some of the most well known
topic models include Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], Non-Negative
Matrix Factorization [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and Latent Semantic Analysis (LSA) [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Much work on
topic modelling in social media focuses on variants of popular algorithms such
as LDA for short text mining [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], while other works focus purely on real-time
over online (NED) topic models [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] with the subtle di erence that \in real-time
detection, time is crucial, so much so that no xed time window for detection
should be assumed" [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Our work is closely related to DTM in the sense that topics are obtained
by dividing the data into shorter time windows of equal length. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] proposed a
DTM based on two layers of NMF. The rst layer extracts the document-topic
matrix for each window topic model and stacks it onto a matrix B. The second
layer of NMF is then applied to B to obtain dynamic topics.
      </p>
      <p>Our approach uses all three families of detection algorithms mentioned, is
of unspeci ed detection type (although the datasets in our evaluation search
for posts with the presence of speci c lists of terms and hence is speci ed) and
only uses techniques that deal with an incoming stream of posts (NED). Many
algorithms miss out on many \short lived" events due to having no intrinsic
notion of time in them. To overcome this problem, our approach uses ideas from
DTM by subdividing the data into short(er) time windows and thus being able
to nd \short lived" events. Our approach di ers from DTM in the fact that
we recombine the events found at short(er) time windows in a way that focuses
on the e ectively ranking events rather than trying to combine events found at
short time windows to make dynamic topics.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Implementation</title>
      <p>
        The data pre-processing stage is split into three main stages: (i) general
preprocessing, (ii) term importance and (iii) aggressively ltering documents.
General Pre-processing: For each post, the raw text is tokenized and then each
token is reduced to its base form using lemmatization, so to avoid the poor
readability of events as can happen with stemming [
        <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
        ]. When tokenizing the text
we only consider unigrams, which are then converted to lowercase. Terms with
certain part-of-speech (POS) tags such as determiners, conjunctions,
subordinating conjunctions, particles and punctuation are removed due to providing little
meaning into the documents. Twitter speci c #hashtags and @usermentions are
removed due to producing more \interesting" events [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. All words which are in
the Terrier list of stopwords as well as all non-English words are also removed.1
The python library spaCy was used to carry out the data pre-processing.
Term Importance: A term importance score is generated for each term following
on from [
        <xref ref-type="bibr" rid="ref13 ref2 ref26">2, 13, 26</xref>
        ] and many others as we only consider a subset of bursty terms.
The term importance score is calculated based on two criteria (a) how bursty
the term is in a time window and (b) term POS tag which acts as a scaling
factor. As used in [
        <xref ref-type="bibr" rid="ref14 ref25">14, 25</xref>
        ] to evaluate the signi cance of a term, we also make
use of z-scoring, however in our case we use it to de ne how bursty a term is in a
particular time window. We do this by comparing an individual term's frequency
to its frequency at previous time windows. We also normalize each term's window
frequency by the total term frequency for a window to arrive at the \relative
popularity" for each term [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], ensuring seasonal patterns don't a ect the term's
score. Similarly to how BnGram de nes how bursty a term is [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we also do
not compare a term's frequency in a particular window to its frequency in all
other windows but only to its frequency in the window's behind it. Failing to
comply with this would not make our approach compatible for NED tasks. To
increase the likelihood that more \event worthy" POS tags appear in events, we
also scale each z-score by an additional boost factor based on its POS tag, like
in [
        <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
        ]. We assign this boost factor a value of 2.5 as in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and apply it to
the z-scores of all terms with nouns and proper noun POS tags to arrival at the
nal term importance score for each term. In order to comply with our original
      </p>
      <sec id="sec-3-1">
        <title>1 https://github.com/terrier-org/terrier-desktop/blob/master/share/stopword</title>
        <p>
          list.txt - last accessed 14/11/2020
objective of only considering a subset of bursty terms, we remove all terms from
the documents which don't surpass a term importance score of 0 i.e. a term must
be occurring more than usual for it to be considered. Similarly to [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], we also
set a minimum threshold frequency in order for a term to be considered in a
particular time window. In our tests a minimum frequency of 0.005 times the
number of total posts for nouns and pronouns and a minimum frequency of 0.015
times the number of total posts for all other POS tags worked well.
Aggressively Filtering Documents: [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] seeks to remove documents with a low
term count to improve performance. A minimum document length of 4 was used
in the experiments of [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and was also used in our experiments.
        </p>
        <sec id="sec-3-1-1">
          <title>3.2 Event Detection Algorithms</title>
          <p>With each of the following algorithms producing events, we can quantify how
important an event is (event importance) by nding its summed term importance
score divided by the total number of terms in the event. This event importance
score is used in the event clustering and ranking mechanism when the nal
ranking of events is calculated.</p>
          <p>
            In event detection, one important hyper-parameter is the number of events
to detect. There are many approaches for automatically nding this value [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ],
however in our evaluation the number of events for each full time window is
speci ed in advance [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. With the following algorithms all using topic models,
another hyper-parameter to be considered is the number of terms to extract for
each event. We choose this number to be 10, as in preliminary experimentation
this yielded the best performance; further optimisation is left for future work.
Non-Negative Matrix Factorisation (NMF): NMF [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ] is used in topic modelling
to reduce the dimensions of a non-negative matrix X (document-term matrix)
by splitting it into a document-topic matrix W and a term-topic matrix H
whose product approximates the original non-negative matrix X. When applying
NMF, [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ] shows that the use of tf-idf vectors over a raw document-term matrix
can provide more coherent topics and thus was included in our implementation.
Latent Dirichlet Allocation (LDA): LDA is a \generative probabilistic model for
collections of discrete data such as text corpora" [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ]. Similarly to NMF, it can
be seen as a dimensionality reduction technique. When applying LDA, [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ] also
shows the use of a raw document-term matrix providing more coherent topics
than its tf-idf weighted counterpart and hence was used in our implementation.
Latent Semantic (LSA): LSA uses a singular value decomposition (SVD) to
approximate a matrix [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ] (dimensionality reduction). In our implementation
the document-term matrix inputted into LSA is tf-idf weighted.
          </p>
          <p>NMF Applied to a Co-occurrence Matrix (COC): In this feature-pivot/topic
modelling approach we apply NMF to a co-occurrence matrix. A co-occurrence
matrix is a term-by-term matrix which signi es how many times two terms
cooccurred. In our experiments, co-occurrence is de ned as two terms co-occurring
in the same post rather than within some window length. It was de ned like this
due to the short character limit on Twitter (280), with words in the same post
likely being related to one another. So long as the number of documents is larger
than the number of terms in the corpus of interest, the rst step of this approach
can be seen as an additional dimensionality reduction step before running NMF
and therefore may provide the following bene ts: (a) more stable results as
approximating a small matrix leads to less varied results b) computational
speedup as approximating a smaller matrix takes less time.</p>
          <p>
            NMF Applied to a Term Embeddings Matrix (EMBED): Word embeddings try
to relate words based on their semantic similarity to one another. It would be
useful to know if word embeddings could capture the relationships between similar
events in order to provide accurate event detection as in [
            <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
            ]. We implement
this by constructing a term embeddings matrix by rst training a word2vec
model [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ] on the time window corpus. We then match each word to one another
using the cosine similarity between the vectors to arrive at the nal term
embeddings matrix. NMF is then applied to this matrix in order to extract events.
Document-pivot Approach Using NMF (DOCP): This topic
modelling/documentpivot approach uses the document-topic matrix (tf-idf weighted) returned by
NMF to construct events. It does this by retrieving the most activated
document for a particular given topic and then retrieves the terms in that document.
It does this until 10 terms are found.
3.3
          </p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Event Clustering and Ranking Mechanism</title>
          <p>Given that overlapping and similar events may be detected in di erent short time
windows, a way to e ectively rank and combine these events is needed. A simple
ranking of events by the de ned event importance score would lead to similar
events with similar terms and thus similar event importance scores being grouped
together in the nal ranking of events (lacking diversity). In order to overcome
this problem we rst cluster similar events together that - along with some other
processing - eliminates them being grouped together (event clustering). This is
followed by a nal ranking of events (event ranking), with the aim of achieving
a diverse set of ranked events.</p>
          <p>Event Clustering: The event clustering stage is an incremental process which
groups together events emitted from each short time window at increasingly
higher distance thresholds (cosine) in order to group the most similar events
rst, followed by less similar events. In each iteration of the event clustering
stage, the events are rst represented as a document-term matrix. The events
(now considered documents) are then clustered using agglomerative clustering
using an incrementing distance threshold. The events which cluster below a
speci c distance threshold are grouped into event clusters. The events in each of
the grouped event clusters are also ranked between themselves by their event
importance score (de ned in Section 3.2). Each of the grouped event clusters
are also given an event cluster importance score which can be de ned as the
mean event importance scores in the event cluster. The events which aren't
clustered are passed onto the next stage of clustering with an incrementing distance
threshold. Events which never cluster are considered single event clusters. The
values used for the distance thresholds are 0.2 (most similar), 0.4 (less similar)
and 0.7 (just about similar). These values work well in our experiments however
other values which mimic a similar upwards progression could also be tried.
Event Ranking: The rst stage of the event ranking phase is to rank the grouped
event clusters between themselves. The event clusters are ranked by a)
whatever distance threshold they were found at (ascending), b) the number of events
in each event cluster (descending) and c) event cluster importance score
(descending) in that order. Thus, events which are highly similar to other events
are themselves important (ranking by distance threshold) and large numbers of
similar events represent an overall important event theme (ranking by number
of events). In order to obtain a nal ranking of events, the ranked event
clusters are iterated through one by one popping o the top event in each of them
into the nal ranking of events. This aims to achieve a diverse (obtaining events
at di erent event clusters) and popular (sorting event clusters by popularity)
ranking of events.
4</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>In order to demonstrate the e ectiveness of our approach, we rst compare the
algorithms discussed in Section 3.2 with and without the presence of subdividing
the data into short time windows followed by an ending event clustering and
ranking mechanism. This is followed by an experiment showing the importance
of picking a good value for the number of short time windows variable.</p>
      <sec id="sec-4-1">
        <title>4.1 Datasets</title>
        <p>
          While other proposed approaches evaluating event detection algorithms make
use of human evaluators [
          <xref ref-type="bibr" rid="ref16 ref30">16, 30</xref>
          ], this approach leverages an existing automated
evaluation solution [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] that compares lists of submitted events to a subset of
ground truth labels for a prede ned full time window, allowing speci c event
detection metrics to be calculated. The datasets proposed in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] each consist of a
collection of tweets from three major events in 2012; a) FA Cup Final, b) Super
Tuesday Primaries and c) US Elections. The fact that the three datasets
themselves have di erent characteristics, it makes it hard to achieve high accuracy
across all three as seen in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. These characteristics include a) length of full time
window to submit events (1 minute for FA Cup, 1 hour for Super Tuesdays and
10 minutes for US Elections), b) number of events to submit at each full time
window (20 for FA Cup and 100 for both Super Tuesdays and US Elections)
and c) dataset size/arrival rate of data at each full time window (148,652 for
FA Cup, 474,109 for Super Tuesdays and 1,247,483 for US Elections). These
three datasets are also used in the evaluation of many popular event detection
algorithms [3{5] (useful for benchmarking purposes).
        </p>
        <p>As there is a number of events to be submitted at each full time window for
these datasets, it is common practice to see the performance of algorithms when
using a reduced number of submitted events (assuming the submitted events are
ranked) [3{5] e.g. Super Tuesdays on 10, 20, 50 and 100 rather than 100 events.</p>
        <p>
          As described in the paper of [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and their evaluation script,2 in each ground
truth label there are three sets of terms: a) mandatory, b) optional and c)
prohibited. From these sets of terms it is possible to calculate \event recall", which
can be de ned as the number of ground truths \detected" by the algorithm
where in this case \detected" refers to all mandatory terms being matched by
the algorithms submitted terms (so long as no prohibited terms are matched). In
our evaluation we don't record term precision and term recall [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] instead
focusing on event recall. Also term precision and term recall can also be considered
misrepresentation metrics due to the fact that they're only recorded after an
event is considered \detected".
4.2
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Assessing the General Impact of Short Time Windows</title>
        <p>In order to assess the general impact of short time windows we de ne two
approaches: approach A which subdivides data into short(er) time windows
followed by the event clustering and ranking mechanism ; and approach B (i.e.
baseline approach) which runs an algorithm over a full time window with no
post-clustering or ranking of events. Both of these approaches use the same data
pre-processing as described in Section 3. With an important hyper-parameter
for approach A being the number of short time windows to use when extracting
events for a full time window, it includes a range of values for the number of short
time windows variable, speci cally 2, 3, 4, 5, 6, 10, 12 and 15. These values were
chosen as they result in equal time windows (all factors of 60) and the amount of
data at any short time window is not too small e.g with their being an average
of 492 posts in each full time window for the FA Cups dataset, dividing by more
than 15 would result in too little data at each short time window.</p>
        <p>We test the hypothesis H, that the mean ranks of pooled event recall
results generated by approach A are greater than the mean ranks of event recall
results generated by approach B. In order to do this, for approach A, 72 event
recall scores were calculated for each di erent value of the number of short time
windows variable (totalling 576 event recall scores). Similarly, 576 event recall
scores were calculated for approach B. For each event recall score which was
calculated a di erent random seed was used.3 A Mann-U-Whitney Rank Test
was run comparing approach A and approach B for each baseline algorithm at
a range of di erent submitted events across three di erent datasets. Speci cally,
the range of submitted events was 2, 5, 10 and 20 for FA Cup and 10, 20, 50 and
100 for Super Tuesdays and US Elections. The reason this test was used instead
of a t-test was because the results data was not normally distributed. Table 1
shows the p-value (rounded to 3 decimal places) of this test for the speci ed
number of events (in columns). If the test was in favour of approach B, a + was
post- xed to the result.</p>
        <sec id="sec-4-2-1">
          <title>2 http://socialsensor.iti.gr/use-cases/evaluation - last accessed 14/11/2020</title>
          <p>3 Multiple runs of the same algorithm were required due to the random element present
in each of the algorithms</p>
          <p>One possible reason for the embeddings model showing evidence of a
statistically signi cant di erence in favour of approach B for the FA Cups dataset
might be that the word2vec model wasn't able to capture accurate relationships
between words when run on a small subset of the data. The FA Cups dataset
has the lowest tweet count out of the three datasets, furthering this point. A
possible reason for the results being varied between models in the US Elections
dataset might be due to the fact that the ground truth events picked span the
full time period of interest, and not being considered \short lived" (something
which the short time windows in approach A try to capture). On the other hand,
our hypothesis was almost always supported by statistically signi cant results
for the FA Cups and Super Tuesdays datasets, perhaps due to the fact that they
contain many \short lived" events as per their ground truths.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Picking a \Reasonably" Good Value for the Number of Short</title>
      </sec>
      <sec id="sec-4-4">
        <title>Time Windows Variable and Comparisons with Other Methods</title>
        <p>Within the range of values for the number of short time windows used in testing
approach A (described in Section 4.2) it is important to pick a value for this
variable which yields \reasonably" high results (due to high variance of results
with di erent values of the number of short time windows variable). Table 2
shows the high variance of approach A's results compared to approach B 's results
(described in section 4.2) in a (maximum - minimum, median) tuple using the
same range of submitted events used in Table 1. A \ b" was post- xed to the
model name where approach B was used, otherwise approach A was used.</p>
        <p>Interestingly, if a value for the number of short time windows for approach
A is picked within a reduced consecutive range of values for each dataset, it
can yield better and less varied results (when compared to approach A). Table
3 shows the same information as Table 2 however only showing the results for
the number of short time windows within a reduced consecutive range for each
dataset. Note that these values were derived by looking at the best results for
each dataset. The range of values used for each dataset were: 2, 3 and 4 for the
FA Cup; 10, 12 and 15 for Super Tuesdays; and 3, 4 and 5 for the US Elections.</p>
        <p>It would be useful to automatically pick a value in a \reasonably" good range
for the number of short time windows variable (Table 3). One factor to consider
might be the granularity of the events required. The more or less short time
windows there are the more chance there is to capture \short" or \broad" lived
events retrospectively. Another factor to consider is the size of the full time
window, with more \short lived" events occurring in large full time windows
(e.g. a day). Similarly, if there is a high arrival rate of data at a particular full
time window, the more chance there is for \short lived" events.</p>
        <p>As seen in Table 3, it is clear to see the advantages short time windows. Using
short time windows seems to work especially well when the respective ground
truths contain \short lived" events, as in the Super Tuesdays dataset which has
a large full time windows (1 hour) allowing for more \short lived" events.</p>
        <p>
          The algorithms exhibit very competitive results in comparison to other
methods [3{5] in the presence of subdividing data into short time windows (when
picked in an optimal range of the number of short time windows variable)
followed by an ending event clustering and ranking mechanism as show in the
median \event recall" results in 3. However, the algorithms show poor results
when run over a full time window with no post-ranking of events, as seen in the
median \event recall" results in 2. This may allude to the fact that presence of
the short time windows along with the event clustering and ranking mechanism
contributes to the improved results rather than the algorithms themselves. With
this being said, it would be interesting to see how algorithms which already
perform well when run on full time windows [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] perform in the presence of short
time windows along with an ending event clustering and ranking mechanism.
Many \short lived" events on social media go undetected due to the nature of
many popular algorithms not considering their full time window(s) when
returning ranked lists of events. In this paper, we proposed how subdividing time into
shorter time windows, extracting events and then recombining the event results
using an event clustering and ranking mechanism can provide statistically
signi cant improvements compared to when no time-based subdivisions are used.
        </p>
        <p>We note, however, that to nd a \reasonable" value for this number of short
time windows variable in order to yield improved results is not always
straightforward. Picking this variable could be a function of a) granularity of the events
required, b) size of full time window and c) arrival rate of data. More research
is needed to identify robust methodologies for how to determine this value.</p>
        <p>In our approach, we proposed ranking events based on a \event clustering
and ranking mechansim". However in reality, any other way of e ectively ranking
events could be explored and thus would constitute an area of future work.</p>
        <p>In this paper, we have explored a selection of event detection methods. It
would be interesting to see how some algorithms perform (which already achieve
high results when not considering short(er) time windows perform) when run on
multiple subdivisions of the data before having their events recombined together.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work was funded by Science Foundation Ireland through the SFI Centre
for Research Training in Machine Learning (18/CRT/6183).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Takeshi</given-names>
            <surname>Sakaki</surname>
          </string-name>
          , Makoto Okazaki, and
          <string-name>
            <given-names>Yutaka</given-names>
            <surname>Matsuo</surname>
          </string-name>
          .
          <article-title>Earthquake shakes Twitter users: real-time event detection by social sensors</article-title>
          .
          <source>In Proceedings of the 19th international conference on World wide web</source>
          , pages
          <volume>851</volume>
          {
          <fpage>860</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Jon</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          .
          <article-title>Bursty and hierarchical structure in streams</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          ,
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <volume>373</volume>
          {
          <fpage>397</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Luca</given-names>
            <surname>Maria</surname>
          </string-name>
          <string-name>
            <surname>Aiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Georgios</given-names>
            <surname>Petkos</surname>
          </string-name>
          , et al.
          <article-title>Sensing trending topics in Twitter</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          ,
          <volume>15</volume>
          (
          <issue>6</issue>
          ):
          <volume>1268</volume>
          {
          <fpage>1282</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Carmela</given-names>
            <surname>Comito</surname>
          </string-name>
          et al.
          <article-title>Bursty event detection in Twitter streams</article-title>
          .
          <source>ACM Transactions on Knowledge Discovery from Data (TKDD)</source>
          ,
          <volume>13</volume>
          (
          <issue>4</issue>
          ):1{
          <fpage>28</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Zafar</given-names>
            <surname>Saeed</surname>
          </string-name>
          , Rabeeh Ayaz Abbasi, Imran Razzak, Onaiza Maqbool,
          <string-name>
            <given-names>Abida</given-names>
            <surname>Sadaf</surname>
          </string-name>
          , et al.
          <article-title>Enhanced heartbeat graph for emerging event detection on Twitter using time series networks</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>136</volume>
          :
          <fpage>115</fpage>
          {
          <fpage>132</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Farzindar</given-names>
            <surname>Atefeh</surname>
          </string-name>
          and
          <string-name>
            <given-names>Wael</given-names>
            <surname>Khreich</surname>
          </string-name>
          .
          <article-title>A survey of techniques for event detection in twitter</article-title>
          .
          <source>Computational Intelligence</source>
          ,
          <volume>31</volume>
          (
          <issue>1</issue>
          ):
          <volume>132</volume>
          {
          <fpage>164</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Mario</given-names>
            <surname>Cordeiro</surname>
          </string-name>
          et al.
          <article-title>Online social networks event detection: a survey</article-title>
          .
          <source>In Solving Large Scale Learning Tasks. Challenges and Algorithms</source>
          , pages
          <fpage>1</fpage>
          <lpage>{</lpage>
          41. Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Rui</given-names>
            <surname>Li</surname>
          </string-name>
          , Kin Hou Lei, Ravi Khadiwala, and
          <string-name>
            <surname>Kevin</surname>
            <given-names>Chen-Chuan</given-names>
          </string-name>
          <string-name>
            <surname>Chang</surname>
          </string-name>
          .
          <article-title>Tedas: A twitter-based event detection and analysis system</article-title>
          .
          <source>In 2012 IEEE 28th International Conference on Data Engineering</source>
          , pages
          <volume>1273</volume>
          {
          <fpage>1276</fpage>
          . IEEE,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Yiming</surname>
            <given-names>Yang</given-names>
          </string-name>
          , Tom
          <string-name>
            <surname>Pierce</surname>
          </string-name>
          , et al.
          <article-title>A study of retrospective and on-line event detection</article-title>
          .
          <source>In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <volume>28</volume>
          {
          <fpage>36</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>James</surname>
          </string-name>
          Allan et al.
          <source>Topic detection and tracking pilot study nal report</source>
          .
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Georgiana</surname>
            <given-names>Ifrim</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Bichen</given-names>
            <surname>Shi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Igor</given-names>
            <surname>Brigadir</surname>
          </string-name>
          .
          <article-title>Event detection in twitter using aggressive ltering and hierarchical tweet clustering</article-title>
          .
          <source>In Second Workshop on Social News on the Web (SNOW)</source>
          , Seoul, Korea, 8
          <article-title>April 2014</article-title>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sasa</surname>
            <given-names>Petrovic</given-names>
          </string-name>
          , Miles Osborne, and
          <string-name>
            <given-names>Victor</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          .
          <article-title>Streaming rst story detection with application to twitter. In Human language technologies: The 2010 annual conference of the north american chapter of the association for computational linguistics</article-title>
          , pages
          <volume>181</volume>
          {
          <fpage>189</fpage>
          . Association for Computational Linguistics,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Yu</surname>
            Zhang and
            <given-names>Zhiyi</given-names>
          </string-name>
          <string-name>
            <surname>Qu</surname>
          </string-name>
          .
          <article-title>A novel method for online bursty event detection on Twitter</article-title>
          .
          <source>In 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS)</source>
          , pages
          <fpage>284</fpage>
          {
          <fpage>288</fpage>
          . IEEE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wei</surname>
          </string-name>
          Xie et al.
          <article-title>Topicsketch: Real-time bursty topic detection from twitter</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>28</volume>
          (
          <issue>8</issue>
          ):
          <volume>2216</volume>
          {
          <fpage>2229</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>James</surname>
            <given-names>Allan</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Ron</given-names>
            <surname>Papka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Victor</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          .
          <article-title>On-line new event detection and tracking</article-title>
          .
          <source>In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <volume>37</volume>
          {
          <fpage>45</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Jianshu</given-names>
            <surname>Weng</surname>
          </string-name>
          and
          <string-name>
            <surname>Bu-Sung Lee</surname>
          </string-name>
          .
          <article-title>Event detection in twitter</article-title>
          .
          <source>In Fifth international AAAI conference on weblogs and social media</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Michael</given-names>
            <surname>Mathioudakis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nick</given-names>
            <surname>Koudas</surname>
          </string-name>
          .
          <article-title>TwitterMonitor: trend detection over the twitter stream</article-title>
          .
          <source>In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data</source>
          , pages
          <volume>1155</volume>
          {
          <fpage>1158</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Carmela</surname>
            <given-names>Comito</given-names>
          </string-name>
          , Agostino Forestiero, and
          <string-name>
            <given-names>Clara</given-names>
            <surname>Pizzuti</surname>
          </string-name>
          .
          <article-title>Word Embedding based Clustering to Detect Topics in Social Media</article-title>
          .
          <source>In 2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)</source>
          , pages
          <fpage>192</fpage>
          {
          <fpage>199</fpage>
          . IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19. Ali Mert Ertugrul, Burak Velioglu, and
          <string-name>
            <given-names>Pinar</given-names>
            <surname>Karagoz</surname>
          </string-name>
          .
          <article-title>Word embedding based event detection on social media</article-title>
          .
          <source>In International Conference on Hybrid Arti cial Intelligence Systems</source>
          , pages
          <fpage>3</fpage>
          <lpage>{</lpage>
          14. Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>David M Blei</surname>
            , Andrew Y Ng, and
            <given-names>Michael I</given-names>
          </string-name>
          <string-name>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of machine Learning research</source>
          ,
          <volume>3</volume>
          (Jan):
          <volume>993</volume>
          {
          <fpage>1022</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Daniel</surname>
            <given-names>D</given-names>
          </string-name>
          <string-name>
            <surname>Lee</surname>
            and
            <given-names>H Sebastian</given-names>
          </string-name>
          <string-name>
            <surname>Seung</surname>
          </string-name>
          .
          <article-title>Learning the parts of objects by non-negative matrix factorization</article-title>
          .
          <source>Nature</source>
          ,
          <volume>401</volume>
          (
          <issue>6755</issue>
          ):
          <volume>788</volume>
          {
          <fpage>791</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Scott</given-names>
            <surname>Deerwester</surname>
          </string-name>
          ,
          <string-name>
            <surname>Susan T Dumais</surname>
          </string-name>
          , et al.
          <article-title>Indexing by latent semantic analysis</article-title>
          .
          <source>Journal of the American society for information science</source>
          ,
          <volume>41</volume>
          (
          <issue>6</issue>
          ):
          <volume>391</volume>
          {
          <fpage>407</fpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23. Xueqi Cheng, Xiaohui Yan, et al.
          <article-title>Btm: Topic modeling over short texts</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>26</volume>
          (
          <issue>12</issue>
          ):
          <volume>2928</volume>
          {
          <fpage>2941</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Derek</surname>
          </string-name>
          Greene et al.
          <article-title>Exploring the political agenda of the european parliament using a dynamic topic modeling approach</article-title>
          .
          <source>arXiv preprint arXiv:1607.03055</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Erich</surname>
            <given-names>Schubert</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Weiler</surname>
          </string-name>
          , and
          <string-name>
            <surname>Hans-Peter Kriegel</surname>
          </string-name>
          .
          <article-title>Signitrend: scalable detection of emerging topics in textual streams by hashed signi cance thresholds</article-title>
          .
          <source>In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <volume>871</volume>
          {
          <fpage>880</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26. David A Shamma,
          <string-name>
            <given-names>Lyndon</given-names>
            <surname>Kennedy</surname>
          </string-name>
          , and Elizabeth F Churchill.
          <article-title>Peaks and persistence: modeling the shape of microblog conversations</article-title>
          .
          <source>In Proceedings of the ACM 2011 conference on Computer supported cooperative work</source>
          , pages
          <volume>355</volume>
          {
          <fpage>358</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Jonathan</surname>
          </string-name>
          Chang et al.
          <article-title>Reading tea leaves: How humans interpret topic models</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <volume>288</volume>
          {
          <fpage>296</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Derek O'Callaghan</surname>
          </string-name>
          et al.
          <article-title>An analysis of the coherence of descriptors in topic modeling</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>42</volume>
          (
          <issue>13</issue>
          ):
          <volume>5645</volume>
          {
          <fpage>5657</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Tomas</surname>
            <given-names>Mikolov</given-names>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Je</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <volume>3111</volume>
          {
          <fpage>3119</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Symeon</surname>
            <given-names>Papadopoulos</given-names>
          </string-name>
          , David Corney, and
          <article-title>Luca Maria Aiello</article-title>
          .
          <article-title>SNOW 2014 Data Challenge: Assessing the Performance of News Topic Detection Methods in Social Media</article-title>
          .
          <source>In SNOW-DC@ WWW</source>
          , pages
          <volume>1</volume>
          {
          <issue>8</issue>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>