=Paper=
{{Paper
|id=Vol-1696/paper2
|storemode=property
|title=Catching Events in the Twitter Stream: A Showcase of Student Projects
|pdfUrl=https://ceur-ws.org/Vol-1696/paper2.pdf
|volume=Vol-1696
|authors=Tim Kreutz,Malvina Nissim
|dblpUrl=https://dblp.org/rec/conf/lrec/KreutzN16
}}
==Catching Events in the Twitter Stream: A Showcase of Student Projects==
<pdf width="1500px">https://ceur-ws.org/Vol-1696/paper2.pdf</pdf>
<pre>
                                  Catching Events in the Twitter Stream:
                                      A showcase of student projects
                                             Tim Kreutz and Malvina Nissim
                            Center for Language and Cognition Groningen – Language Technology
                                         Rijksuniversiteit Groningen, The Netherlands
                                t.j.kreutz@student.rug.nl, m.nissim@rug.nl
                                                               Abstract
A group of bachelor students in information science at the University of Groningen applied off-the-shelf tools to the detection of events
on Twitter, focusing on Dutch. Systems were built in four socially relevant areas: sports, emergencies, local life, and news. We show that
(i) real-time event detection is a feasible and suitable way for students to learn and employ data mining and analysis techniques, while
building end-to-end potentially useful applications; and (ii) even just using off-the-shelf resources for such applications can yield very
promising results.


1. Introduction
    The availability of a constant flow of information in the form of short texts makes it in theory
possible to collect real-time information about virtually all sorts of events that are being written
about. The attractiveness of this is evident, and so is its potential social utility. ReDites, an
event detection and visualisation system (Osborne et al. 2014), is a prime example of this, as it was
developed in order to help information analysts identify security-related events.
    However, how the Twitter stream can be successfully exploited to this end is not straightforward,
either conceptually or practically. First, specific events must be detected, and related tweets
clustered. This step must rely on a definition of what an event is, which is often application
dependent. Second, the retrieved tweets must be filtered and processed to minimise noise, both in
terms of pertinence and in terms of the noise typical to the nature of tweets. Third, in order to
produce a meaningful tool, the output must be evaluated at a development stage, and made as usable as
possible for the end user in its final form, for example by providing customisation and visualisation
features.
    In this short paper, we report a series of efforts within a bachelor programme in information
science, where a group of nine students developed different systems with different specific aims, all
exploiting real-time tweet-derived information, and all socially relevant. The systems work with
Dutch tweets, but their architecture is virtually language-independent, as long as tweets and basic
language processing tools are available. Suggesting novel methodologies or applications for event
detection was not our primary concern when developing the systems and when writing this contribution.
Instead, by describing this collection of different student projects in the area of real-time event
detection towards social utility, we have a twofold aim. First, we show that even leveraging
off-the-shelf tools and basic preprocessing can yield interesting, promising, and even unexpected
results, with a variety of applications. Second, as we have observed that event detection on Twitter
has been a useful and suitable task for students, helping them gain familiarity with data mining and
data analysis while putting together end-to-end systems, we hope to inspire others to embark on
similar exercises.

2. Applications
    The University of Groningen extracts Dutch tweets from the Twitter Firehose and provides access
to the stream for students and employees (Tjong Kim Sang 2011). Students used the backlog of
available tweets to select a specific snapshot or demonstrated their systems using the most recent
tweets.
    The definition of an event was very application dependent, but always informed by some secondary
information, be that peaks in Twitter usage, shared time and location of tweets or overlap with news
headlines. Practically and generally speaking, tweets about a single event overlap in some way, and
finding sophisticated ways to detect this overlap was a core challenge for all projects.
    Typical noise in tweets was not a particular concern for any of the developed applications,
although hashtags and URLs were often removed or replaced by placeholders in preprocessing.
    Visualization for end users was provided by a few of the projects (Kreutz 2015; de Kleer 2015;
Pool 2015), where live demonstrations of their results consisted, for example, of a website listing
relevant tweets per news headline, or maps where categorised events are clustered by location.
    The students’ systems that we describe showcase four different application areas: sports,
emergencies, local events, and news.

2.1. Sports
    Sports fans like to stay up to date with real-time scores by accessing websites such as
livescore.com, which provides live overviews of football matches for major leagues.[1] The overview
consists of tables including major events in the game, like goals and yellow/red cards. It takes a
lot of time to manually input in-game events, which is why obtaining reliable real-time updates can
be expensive and automating this process by leveraging real-time Twitter data becomes attractive.
Three of the projects were concerned with automatically reproducing such tables by using the stream
of Twitter data to automatically detect and classify in-game events. We describe here one of the
developed systems, where matches of the Dutch national team in the 2014 World Cup of soccer are used
as a case to demonstrate the approach (Kuiper 2015).

[1] Companies such as Livescore.com buy real-time information for prices that vary according to the
prestige of the league.

Table 1: Features for Kuiper (2015)’s sports event detection system.
    The selected five matches featured a total of 19 goals, 15 yellow cards and no red cards, so only
the first two types of events were predicted. For each match, two hours of Dutch Twitter data
starting from the first minute of the match was collected, resulting in a total of 4,376 relevant
tweets.
    Kuiper (2015) makes use of past sports events and automatic annotation to save on the effort that
would go into annotating such a large set of tweets. Firstly, only tweets that contain hashtags
referring to a certain match are considered. The convention for such a hashtag is to use the first
three characters of the names of the involved teams (#SpaNed for Spain versus the Netherlands).
Secondly, the timestamp of each tweet is compared to the timeline of the actual match. As such, all
tweets posted up to three minutes after an actual event took place are annotated as being about the
event. In the training data, the distribution of the classes no event, goal and yellow card was 55%,
37% and 8%, respectively.
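The automatic annotation described above can be sketched as follows. This is a minimal illustration
and not Kuiper (2015)’s implementation: the function names, the example timeline and tweets, the
case-insensitive hashtag match and the choice to treat the three-minute window as inclusive are all
assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumption: a tweet posted within three minutes after an event is taken to
# be about that event (window treated as inclusive at both ends).
WINDOW = timedelta(minutes=3)


def match_hashtag(home, away):
    """Build the match hashtag from the first three letters of each team name."""
    return "#" + home[:3].capitalize() + away[:3].capitalize()


def label_tweet(text, posted_at, hashtag, timeline):
    """Label one tweet, or return None when it lacks the match hashtag.

    timeline is a list of (event_time, label) pairs from the match report;
    the first event whose window contains the tweet's timestamp wins.
    """
    if hashtag.lower() not in text.lower():
        return None
    for event_time, label in timeline:
        if event_time <= posted_at <= event_time + WINDOW:
            return label
    return "no event"


# Hypothetical timeline and tweets, invented for illustration only.
tag = match_hashtag("Spanje", "Nederland")
timeline = [
    (datetime(2014, 1, 1, 21, 44), "goal"),
    (datetime(2014, 1, 1, 21, 50), "yellow card"),
]
tweets = [
    ("Wat een goal! #SpaNed", datetime(2014, 1, 1, 21, 45)),
    ("Spannend dit #spaned", datetime(2014, 1, 1, 21, 30)),
    ("Lekker weer vandaag", datetime(2014, 1, 1, 21, 45)),
]
labels = [label_tweet(text, ts, tag, timeline) for text, ts in tweets]
print(tag, labels)
```

Because labels come from the match timeline rather than from reading each tweet, a tweet that happens
to fall inside an event window is labelled as that event even if it is about something else, so the
resulting training labels are noisy by construction.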
    The data was then modified to allow for a more general application of the system. This involved
replacing specific scores and players with placeholders. Occurrences of specific keywords that denote
an event were used as features, along with the length of the tweet and the tf-idf term vector.
    Beyond detecting events in single tweets, detecting events has to do with grouping relevant
tweets in the right way. Detection of peaks in Twitter activity has been used to detect events
(Corney, Martin, and Goker 2014; Van Oorschot, Van Erp, and Dijkshoorn 2012). Specifically,
Chakrabarti and Punera (2011) demonstrate how tweet volume signifies important events in sport
matches. However, in soccer it is more likely that two important events occur in close proximity,
which is problematic for peak detection, since the events will be grouped as one event. To more
accurately distinguish between events, Kuiper (2015) implements a rule-based system that looks at
tweet content during peaks. If at least 15% of tweets are classified as goal-tweets, the rule-based
system determines whether the mentioned score is logically probable (a match with score 1-1
logically progresses to either 1-2 or 2-1) and updates the score. This way, the score is updated
before a potential second goal, allowing consecutive score updates to be detected.
    Using the Multinomial Naive Bayes implementation in scikit-learn (Pedregosa et al. 2011), with
the set of features in Table 1, classification of individual tweets yields an f-score of .843. For
the matches of the Dutch national team, sixteen out of seventeen goals were detected in the right
minute (f-score .940), but detection of yellow cards was harder (f-score .500).
    Beyond achieving good results for a case of very specific matches in a very specific sport,
Kuiper (2015) demonstrates the feasibility of automatic subevent detection in sports in general,
specifically with regard to grouping and distinction of isolated events. However, since no two sports
are the same, the volatility of events has to be taken into account. Detecting subevents in the
stream of Twitter data may be a lot harder for the faster-paced basketball, for example. More
importantly, a substantial number of tweets that discuss specific events is needed to reach the
accuracies presented in Kuiper (2015).

2.2. Emergencies
    Twitter allows for detection of real-time subevents in sports because relevant tweets follow
these events almost instantly. The delay between a real-time occurrence and its social resonance is
thus minimal. This adds to the social relevance of detection of events that are particularly
time-sensitive, such as emergency situations. This section will look at two different emergency
scenarios: earthquakes in the Dutch province of Groningen, and detection of context for events
reported by Dutch emergency services.

2.2.1. Earthquakes
    The detection of earthquakes on Twitter has been extensively documented (Sakaki, Okazaki, and
Matsuo 2010) for Japan, where the tweet density is high and earthquakes occur relatively frequently.
The research focuses on detection of earthquakes and extraction of the time at which they occurred,
along with the location. Earthquakes with a magnitude of 3.0 or higher on the Richter scale were
successfully detected in 96% of the cases, and real-time detection led to notifying civilians faster
than the Japan Meteorological Agency could, in most cases.
    Detection of earthquakes in Groningen has only recently become relevant since gas extraction in
the province led to a 200% increase in earthquakes over the past ten years (Kuipers 2015). This has
lately sparked debate in politics and media and increased public involvement. Detection of
earthquakes using Twitter can thus contribute to timely updates, but it may also map public
sentiments.[2]

[2] In the context of earthquakes in the Groningen area, this is also interesting given NAM’s
compensation duties for earthquake-caused damage to local properties.

    To develop his system for detecting earthquake events via Twitter, Kuipers (2015) used data from
the Dutch Meteorological Institute (KNMI) from January 2014 until April 2015. The data contained 60
earthquakes with a magnitude of 1.2 or higher on the Richter scale, their timestamp and location of
the epicenter. Weaker earthquakes are generally considered imperceptible to humans, and hence not
useful for
the research as there would be no tweets about them. A pre-selection of Twitter data was made by
selecting tweets containing the words ‘beving’ or ‘aardschok’ that were tweeted up to four hours
after the occurrence of an earthquake.
    Using Weka (Hall et al. 2009), a Naive Bayes classifier was trained on the annotated tweets and
tested via cross-validation. Results show that tweets are correctly classified as relevant or
irrelevant to a given earthquake in over 91% of the cases. Among the most distinguishing features are
the mention of a location in the Groningen or Drenthe province (boolean), which usually signals an
actual earthquake, and the mention of political terms (boolean), which usually signals no actual
earthquake. Further features used as potential indicators of relevant tweets are mentions of numbers,
which can signal a specific time or magnitude, and certain signal words that are used to convey the
sensation of experiencing an earthquake (‘voel’, ‘tril’, ‘knal’).

2.2.2. Emergency services
    P2000[3] is a live repository of all emergency services active in a given area. All reports are
publicly available, as are communications of and between Dutch police, ambulance and fire department
services, which are updated in real time. Examples are the first reports of a fire, the cars inbound
to the location of the reports and the way the distress is handled.

[3] http://www.p2000-online.net/groningenf.html

    The work described in (Louwaars 2015) is concerned with matching user tweets to reports from
emergency services in the same area. The rationale behind this is that such matches could be used to
diminish delay in notifying stakeholders, or to add context to official, quantitative reports. The
real-time nature of Twitter makes it particularly suitable for detecting time-sensitive events like
emergencies.
    One month of emergency reports and tweets was downloaded from the P2000 website in April 2015 for
the larger Groningen area. This resulted in 700 ‘matches’ of reports to one or more tweets with a
similar location and time. A Naive Bayes model was trained on 80% of the annotated data, using simply
word occurrences as features, to classify tweets as ‘relevant’ or ‘irrelevant’. Testing on the
remaining 20% resulted in a final accuracy of 91%, a clear improvement over the baseline (75% of
tweets were annotated as irrelevant). The limited amount of data and the skew between relevant and
irrelevant tweets do not make this result generalizable to a real-time application for detecting
emergency situations, but successful matches do add some context to otherwise formal reports.
    The approach in which words are used to predict relevancy allows for an overview of the most
indicative words (Table 2).

Table 2: Most indicative words for relevancy.
    Token        Proportion (relevant to irrelevant)
    Gaat         13.8 : 1
    1            11.3 : 1
    Weer         1 : 8.2
    @            1 : 7.8
    !            1 : 7.5
    Niet         1 : 7.0
    Brandweer    6.9 : 1
    /            6.9 : 1
    Maar         1 : 6.4
    (            6.0 : 1

    On a critical note, Louwaars (2015) observes that for some events the number of tweets is too low
to draw any solid conclusion. He further indicates that few of the relevant tweets come from ‘real’
Twitter users, with substantial data coming from automated emergency service accounts. This is also
reflected in the list of most indicative words, which for a large part contains random tokens (apart
from ‘Brandweer’). As a solution, the system could be trained on emergencies that can draw from a
larger pool of tweets, indicating more severe cases or emergencies that occur in more densely
populated locations.

2.3. Local events
    The meta-data attached to tweets can be useful for certain instances of event detection. Pool
(2015) and de Kleer (2015) show that even with the relatively low frequency of geo-tagged tweets, it
is possible to cluster various sorts of events on the local scene, classify them and map where they
occur in real time.
    Detecting events using geo-locations from Twitter has previously been done by Walther and Kaisser
(2013), and applying a similar approach to Dutch tweets is plausible because the Netherlands has one
of the highest ratios of Twitter accounts to population (Pool 2015).
    All geo-tagged tweets from a month of Dutch Twitter data were used for training. This resulted in
a total of 566,549 geo-tagged tweets. The geo-information was translated into a geoHash that denotes
a specific area, and tweets with a similar geoHash and comparable timestamp were grouped and added to
a list of event candidates. To handle the hard borders of the geoHash area, candidates with matching
timestamps in adjacent areas were then merged (Figure 1).

Figure 1: A border case in (de Kleer 2015)

    Two judges annotated the event candidates in the training data and the test data with the
following labels: No event, Meeting, Entertainment, Incident, Sport and Other, with the most frequent
category being No event (triggering a 46% baseline). Inter-annotator agreement was measured via
Cohen’s Kappa (Cohen 1960). Features that were found
to be most useful for this task were the most frequent words, the location using only the first five
characters of the geoHash, the average word overlap between tweets in an event candidate and the
average word overlap between different users in an event candidate.
    Several models were built and tested on development data, with the final system being a Naive
Bayes model which yielded an accuracy of 84% on test data. Especially considering the inter-annotator
agreement was measured at K = 0.87, this is a very good result. This research also shows that
meta-information from tweets can successfully be used to detect events. The end-user output of the
system is a map with classified events (Figure 2).

Figure 2: Visualisation of automatically classified events in The Netherlands, in April 2015 (de
Kleer 2015).

2.4. News
    Twitter data has also been used to detect real-time commentary on news events. News media
websites often feature their own social plugins which allow readers to discuss news items. These
discussions are relatively structured and easy to relate to the news article. However, when people
post their commentary to Twitter, it becomes problematic to link the tweet back to the article and to
group all relevant discussion together.
    In (Kreutz 2015), RSS feeds of the three most popular Dutch news websites are used to detect
similar content on Twitter. The RSS feeds give access to 41 headlines and related abstracts at the
same time. Each of the news items is then compared to the last hour of Dutch Twitter data to extract
reactions, opinions and other meta-commentary that users posted.
    To deal with the computational effort involved in making this many comparisons (an hour of Dutch
Twitter data often contains more than 30,000 tweets), a first module of the system makes a
pre-selection of candidates to be considered. The candidates are made up of the 25 tweets with the
highest cosine similarity compared to the title of the news items. Generally, these 25 tweets include
tweets that are too similar to the title (a retweet, for example), tweets that are too dissimilar
(not about the article) and actually relevant tweets that add meta-commentary. It is the latter type
that one would want to detect and use, while discarding the former two as non-relevant.

Figure 3: Visualisation of relevant tweet commentary on news events on nieuwstwiets.nl (Kreutz 2015).

    To distinguish between relevant and non-relevant tweets, four non-linear machine learning
algorithms were trained and tested. Four news articles from the 21st of May 2015 were selected to be
compared to 24 hours of Dutch Twitter data from the same day. For training and testing, 250 candidate
tweets were selected using the approach mentioned above. This cut-off point was chosen because after
annotation it became clear that for each of the articles, the number of relevant tweets that could be
found after rank 250 was negligible.
    In the 1,000 candidates, 593 were annotated as relevant to the articles, which results in a
baseline of 59.3%. The system was trained on six features: (1) the difference in timestamp between
the publication of the article and the publication of the tweet, (2) the cosine similarity between
the title of the article and the tweet, (3) the difference in length between the title of the article
and the tweet, (4) the cosine similarity between the abstract of the article and the tweet, (5) the
number of overlapping named entities, (6) the cosine similarity between bigrams in the abstract of
the article and the tweet.
    Named entities were extracted by means of a Named Entity Recognizer trained on the CoNLL-2002
Dutch corpus using NLTK (Bird, Klein, and Loper 2009). A Random Forest classifier performed the best
on the test data with an F-score of 0.874. It also helped to determine the most important features in
the task. The list of features mentioned above adheres to this order, timestamp difference being the
most predictive feature.
    The selection of viable candidates before automatic classification proves successful in reducing
computational effort, while still keeping the detection of relevant commentary possible. This
approach is therefore suitable for a real-time application of the system. Kreutz (2015) demonstrates
this by applying the system to a specifically dedicated website that updates its news articles and
tweets hourly (http://www.nieuwstwiets.nl, Figure 3). Rather than representing the results of the
research with an F-score, it provides insight by showing users the resulting tweets.

3. Discussion and Conclusions
    In this overview we have reported efforts of bachelor students in the field of automatic event
detection exploiting the (Dutch) Twitter stream.
    Besides the differences in fields of application, this overview gives insight into the
considerations that were made in dealing with the inherent challenges in event detection. For the
emergency detection and the detection of local events, geo-information of tweets was used. Since this
information is not always available, this sometimes resulted in very little data to work with. For
the detection of subevents in soccer, peaks of tweets with certain key patterns were used. This
worked well for important subevents (goals) and worse for minor subevents (yellow cards).
    The students used similar ways to remove noise from tweets, by removing or replacing URLs and
hashtags. Even when hashtags were crucially used to detect events, such as in (Kuiper 2015), they
were then normalised at a second stage in order to make the approach general and portable. The local
event detection and news event detection sought to extract as much information as possible from
tweets by normalizing the hashtags. In the data selection for the soccer events, hashtags had a
leading role: tweets were selected only when they contained a hashtag that referred to a certain
match.
    Finally, evaluation showed good results in all the theses. Some students chose to apply their
findings in a real-time setting by visualizing them. This led to demonstration of the systems by Pool
(2015) and de Kleer (2015) on EventDetective and Kreutz (2015) on nieuwstwiets.nl.
    With this exercise we observed that real-time event detection on social media is a field that
students can successfully experiment with. Although the aim was not to build the next generation of
event detection applications, the choices that the students made in the course of such research
reflect some of the core challenges and considerations central to this task and, we believe, are
useful lessons for future endeavors, both from a research and a teaching perspective. It also shows
that there are no ready-made best practices when it comes to defining events and selecting data, and
that each socially useful task will have its own needs and therefore strategies.

References
Bird, Steven, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O’Reilly
  Media.
Chakrabarti, Deepayan and Kunal Punera (2011). “Event Summarization Using Tweets”. In: ICWSM 11,
  pp. 66–73.
Cohen, J. (1960). “A Coefficient of Agreement for Nominal Scales”. In: Educational and Psychological
  Measurement 20.1, p. 37.
Corney, David, Carlos Martin, and Ayse Goker (2014). “Spot the Ball: Detecting Sports Events on
  Twitter”. In: Proceedings of Advances in Information Retrieval: 36th European Conference on IR
  Research, ECIR 2014, Amsterdam, The Netherlands. Ed. by Maarten de Rijke et al. Springer,
  pp. 449–454.
de Kleer, David (2015). “EventDetective: detectie, verrijking en visualisatie van Twitter events”.
  Bachelor Thesis in Information Science. University of Groningen.
Hall, Mark et al. (2009). “The WEKA Data Mining Software: An Update”. In: SIGKDD Explor. Newsl. 11.1,
  pp. 10–18. ISSN: 1931-0145.
Kreutz, Tim (2015). “Detecting news event commentary on Twitter”. Bachelor Thesis in Information
  Science. University of Groningen.
Kuiper, Jacco (2015). “Real-time automatic detection of soccer match events using Twitter”. Bachelor
  Thesis in Information Science. University of Groningen.
Kuipers, Rolf (2015). “’En we schudden weer’”. Bachelor Thesis in Information Science. University of
  Groningen.
Louwaars, Olivier (2015). “P2000 locatiedata als classifier voor tweets”. Bachelor Thesis in
  Information Science. University of Groningen.
Osborne, Miles et al. (2014). “Real-time detection, tracking, and monitoring of automatically
  discovered events in social media”. In: Proceedings of ACL 2014: System Demonstrations. Association
  for Computational Linguistics, pp. 37–42.
Pedregosa, Fabian et al. (2011). “Scikit-learn: Machine Learning in Python”. In: J. Mach. Learn. Res.
  12, pp. 2825–2830. ISSN: 1532-4435.
Pool, Chris (2015). “Detecting local events in the Twitter stream”. Bachelor Thesis in Information
  Science. University of Groningen.
Sakaki, Takeshi, Makoto Okazaki, and Yutaka Matsuo (2010). “Earthquake Shakes Twitter Users: Real-time
  Event Detection by Social Sensors”. In: Proceedings of WWW ’10. New York, NY, USA: ACM,
  pp. 851–860.
Tjong Kim Sang, Erik (2011). “Het gebruik van twitter voor taalkundig onderzoek”. In: TABU: Bulletin
  voor Taalwetenschap 39.1/2, pp. 62–72.
Van Oorschot, Guido, Marieke Van Erp, and Chris Dijkshoorn (2012). “Automatic extraction of soccer
  game events from Twitter”. In: Proceedings of Detection, Representation and Exploitation of Events
  in the Semantic Web (DeRiVE 2012). Boston, MA, USA. (CEUR Proceedings, 902).
Walther, Maximilian and Michael Kaisser (2013). “Geo-spatial event detection in the twitter stream”.
  In: Advances in Information Retrieval. Springer, pp. 356–367.

</pre>