Personal Life Event Detection from Social Media

                    Smitashree Choudhury                                            Harith Alani
                     Knowledge Media Institute                               Knowledge Media Institute
                       The Open University                                     The Open University
                         United Kingdom                                          United Kingdom
           smitashree.choudhury@open.ac.uk                                    h.alani@open.ac.uk


ABSTRACT
Creating video clips out of personal content from social media is on the rise. MuseumOfMe, Facebook Lookback, and Google Awesome are some popular examples. One core challenge to the creation of such life summaries is the identification of personal events and their time frame. Such videos can greatly benefit from automatically distinguishing between social media content that is about someone’s own wedding from that week, an old wedding, or the wedding of a friend. In this paper, we describe our approach for identifying a number of common personal life events from social media content (we use Twitter for our tests), using multiple feature-based classifiers. Results show that combining linguistic and social interaction features increases the overall classification accuracy for most of the events, while some events are relatively more difficult than others (e.g. new born, with a mean precision of .6 from all three models).

Keywords
Social Web, social media, event detection, personal life events

1.   INTRODUCTION
With the wide spread of social media sites (e.g. Twitter, Facebook, YouTube), millions of people use them on a daily basis to communicate and share information on a wide variety of events, ranging from world events (e.g. World Cup) to personal events (e.g. Wedding, Graduation). Use of these systems serves a multitude of purposes: knowledge sharing, information communication, event organisation, professional collaboration, political expression, as well as socialisation. To put this in perspective, more than 500 million tweets are generated in a day1, and millions of photos are uploaded to Facebook every day. There may be differences in the volume of content created on different platforms, depending on personal preferences and the perceived purpose of the tool; nonetheless, most popular online systems carry huge amounts of data created by individual users in the form of texts, videos, and photos. While technology for data creation and storage has significantly matured and is efficiently managed, accessing, managing and processing such data is still a challenge and can be done only by a few experts. Due to the lack of efficient data access mechanisms available to normal users, most of the historical data tends to be forgotten or remains unused.

1 https://blog.twitter.com/2013/new-tweets-per-second-record-and-how

Access to and reuse of such an information trove will provide greater insight into individual users, their preferences, and situational dynamics, and can result in many useful applications, e.g. personalised healthcare, customised training and education, social and community engagement applications, and life stories. To this end, mining and analysing such content could help identify one’s life milestones and salient events. Identifying interesting and important moments in one’s timeline on social media is valuable to services such as Facebook Lookback and Google Awesome, which generate short video clips for users to summarise and visualise their timelines.

In realisation of the importance of events on social media, Facebook2 has recently generated millions of 1 minute lookback videos of content from users’ timelines. Over 270 million videos were rendered, over 200 million users watched their look back movie in the first two days, and more than 50% shared their movie. A project like Intel’s Museum of Me3 follows a similar line, collecting data from a user’s Facebook profile to generate a short video. The purpose of our work (personal life event detection) is a sub-objective of a broader research objective in a similar direction, i.e. automatic creation of digital documentaries from social media content, including interesting and relevant life moments and events.

2 https://code.facebook.com/posts/236248456565933/looking-back-on-look-back-videos
3 http://www.intel.com/museumofme/r/index.htm

Event detection from social media content has so far been focused on detecting world events such as earthquakes [Chile, Japan], political protests, elections (US, Germany, UK) and planned public events such as entertainment award functions (Oscar, Golden Globe), academic events (conferences), and sports events (Olympics). However, detection of personal life events has been mostly overlooked, and only mildly investigated for content recommendation [cite]. The objective of this piece of work is to automatically identify interesting and important life events of individual users from their social media content, which can be part of their personal digital storybook or memory archive. In this work, we have taken Twitter as the test platform, and will extend our research to other systems such as Facebook, Instagram and Pinterest in our future work.

Detecting personal events is non-trivial and may require a combination of multiple approaches for a robust detection result. Unlike public events or events concerning celebrities and well-known personalities, personal events may not be characterised by high activity volume and additional sources of information, e.g. blogs or Wikipedia. These events are limited to the person concerned and to her immediate social network (friends and family). In addition to the above problems, microblog sites like Twitter bring their own complexities, with short, informal and noisy content. Any meaning-making task on this content has to deal with these idiosyncrasies. Next, we briefly delve into the concept of a personal event before going into the details of the experimental work.

1.1     Personal Life Events
Personal life events range from recurring events, such as birthdays and anniversaries, to very occasional and uncommon events, such as work promotions and relocation. Events can also be further categorised on an affective scale, from highly positive and pleasant events to unpleasant events, such as illnesses or accidents and deaths of loved ones. In this paper, we focus on 5 life events (4 positive and 1 negative), i.e. graduation, marriage/engagement, new job, birth of child, and surgery. Our motivation to start with these events is inspired by a study [?] which lists 6 important memorable life events: “Beginning school”, “first full time job”, “Falling in love”, “Marriage”, “Having children”, “Parent’s death”.

The main contributions of this paper are not on algorithms and their efficiency, but rather on presenting evidence that, with an effective combination of existing methods and social media data, we can analyse and detect important and critical moments of an individual’s life. Hence the contributions are:

     • a thorough study of five personal life events and their idiosyncrasies as reported in social media, especially in Twitter.
     • detection of life events using both content and interaction features.

This paper is organised as follows: In section 2 we review related work in the field of event detection in social media, and in section 3 we briefly describe how personal life events are reported on Twitter and their characterisation. Section 4 describes our approach, which includes feature selection and model construction, followed by discussion and conclusion in section 5.

2.    RELATED WORK
Event detection is not a new research subject, and has been part of studies on topic detection in news stories and other text documents [?]. Social media brought multimodal content created by both professionals and amateurs, leading to a resurgence of interest in detecting social topics and events in this new domain [?]. We have been motivated by the need to identify personal life events, which have a great personal value when aggregated over time and location. One of the prerequisites of such a system is the identification of content reporting a real event. Events can be planned events such as cultural events, tech conferences, music award functions, elections or sports events, or unplanned events, for example natural disasters and earthquakes [?]; even generic events such as breaking news are the subject of a few studies [?][?]. Existing studies cover both planned and unplanned events to varying degrees, using both machine learning and text analysis techniques. Benson et al. [?] reported detecting concert events from social media streams using a city calendar as a target list. Agarwal et al. [?] detected events such as factory fires and labor strikes from the Twitter stream using a combination of locality sensitive hashing and a location dictionary. Weng and Lee [?] proposed event detection with clustering of word bursts from tweets. The authors in [?] proposed a natural disaster alert system using Twitter users as virtual sensors. In their work, they were able to calculate the epicentre of an earthquake by analyzing the delays of the first messages reporting the shock. Social media centric event detection also covers non-textual data such as photos and videos. Chen et al. [?] discovered social events from Flickr photos by using both user tags and other metadata, including time and location (latitude and longitude). Firan et al. [?] explored tags, title and description to classify pictures into event categories. Some of the popular approaches used for event detection are spatio-temporal segmentation [?], burst analysis in word signals, clustering, as well as topic detection techniques.

To the best of our knowledge, there are no prior studies on personal life event detection from social media except the one reported in [?], where the authors tried to detect two life events, “marriage” and “employment”, which bears some similarity to our work. Our focus is on user level event detection that can be used to build individual digital storyboards from historical data.

3.   PERSONAL EVENTS ON TWITTER
We now define the concept of a personal life event in the context of the Twitter message stream and provide a definition of the problem that we address in this work.

The definition of the term “event” differs from domain to domain, ranging from philosophy to cognitive psychology to computing. Despite the lack of a uniform definition of the term, it embeds a few generic characteristics such as time, participating objects and a location. In this context, we define an event as a real world occurrence with an associated time period and one or more participating objects/agents at a certain location, which may or may not be explicitly apparent in tweet messages. According to this definition, a tweet needs to reflect a time interval when the event has occurred, involving either the user or someone connected to the user as the participating agent. Based on this abstract notion, we looked into the real data to confirm or re-arrange the definition and devise a strategy for detecting personal events.
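
As a rough illustration of the working definition of an event given in Section 3 (an associated time period, one or more participating agents, and a location that may not be explicit in the tweet), the following sketch models it as a small record. The names `PersonalEvent` and `satisfies_definition` and the example dates are hypothetical and are not taken from the original work.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PersonalEvent:
    """Hypothetical record mirroring the working definition of an event."""
    event_type: str                    # e.g. "graduation"
    start: Optional[date] = None       # start of the associated time period
    end: Optional[date] = None         # end of the associated time period
    participants: List[str] = field(default_factory=list)  # user or connected people
    location: Optional[str] = None     # often not stated in the tweet


def satisfies_definition(event: PersonalEvent) -> bool:
    """True when a valid time interval and at least one participant are known.

    Location stays optional because it may not be apparent in the message.
    """
    has_interval = (
        event.start is not None
        and event.end is not None
        and event.start <= event.end
    )
    return has_interval and len(event.participants) > 0


# Illustrative values only.
wedding = PersonalEvent("marriage", date(2014, 5, 3), date(2014, 5, 3), ["sister"])
vague = PersonalEvent("graduation", participants=["self"])
print(satisfies_definition(wedding), satisfies_definition(vague))
```

The second record fails the check because no time interval could be attached to it, which corresponds to the ambiguous time references that this study leaves out.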
3.1      Dataset
As a first step, we collected tweets using the Twitter streaming API4, which allows crawling a portion of public tweets as and when they arrive. We restricted tweets to the English language only and crawled for 3-4 hours per day for three weeks. The entire dataset contained around 4 million tweets. The ratio of event tweets to non-event tweets is expected to be extremely skewed, as the targeted events are very specific and user centric. So the next logical step is to use a filter mechanism to segregate the event related tweets from the rest for further processing. For this initial segregation, we extended the event query with synonyms and related terms and phrases (shown in Table 1). These related terms are mainly synonyms and terms commonly known and used to describe the event of interest. The use of related terms with the main event terms was intended to widen the coverage to cases where users might not be using the exact terms to describe the main events. After filtering we got 9168 tweets for the marriage event, 2570 tweets for graduation, 3192 tweets for surgery, 3661 for new job and 2954 tweets for new born. A question may arise about those tweets where the event term is absent yet the implicit semantics reflects a real event, for example “Welcome to the new member of our family”. We acknowledge such possible omissions with the present approach and intend to capture them with contextual and historical information as part of our future work. The resulting filtered datasets still contain many irrelevant tweets. For example, “family have brought a 2nd lawsuit against her, this time to try to annul her marriage” is not about a marriage event though it contains the keyword. Our task is to distinguish such tweets from genuine event tweets by means of binary classification.

4 https://dev.twitter.com/docs/api/streaming

         Table 1: Events and their related words.
     Event terms   Related Terms
     Marriage      “Wedding”, “Tied the knot”, “married”
     Graduation    “Convocation”, “commencement”
     New Job       “new position”, “first day at work”, “job offer”
     New Born      “Baby boy”, “baby girl”, “new born”
     Surgery       “Operation”

Manual inspection of these tweets revealed that event reporting tends to happen at three time spans: past, present, and future. We also noticed three categories of participating agents (self, other individuals and general public). Examples of such diversities are shown in Table 2.

In light of these findings, defining a personal event seems to be more tricky and imprecise. Two pertinent questions here are how to resolve the time reference associated with the event and how to associate the right subject (participating agent) with the event. In this study we focus only on the events where the time reference can be resolved to a specific time point within a one-month time interval by automatic means. Examples are “I graduated yesterday” and “26 days to graduation”. In both cases, the time of the event can be resolved with help from the timestamp attached to the message. However, ambiguous time references such as “graduation is so close yet so far” or “marriage in few weeks time” are ignored.

The second dimension where the event reporting differs is the participating agent or affected subject. Event tweets are either about the user who created the tweet or about someone else known to the user and, in some cases, about an undefined group of people, e.g. a group of students. Since our focus is on personal events, ideally we should target self-reported tweets and ignore the rest. But resolving an event to a participating agent needs advanced semantic role labelling, which will be the next step of this ongoing work. For this paper, we restricted our attention to generic event detection, hence included all the tweets irrespective of who the affected subject is.

Based on this generic definition, we proceed with our actual experiment task, which starts with feature extraction.

4.     FEATURE EXTRACTION
After filtering event related tweets from the non-event tweets, we extracted different types of features [?] to be used for building event classifiers. We examined several feature categories describing different aspects of tweets and users. Specifically, we considered lexical, sentiment and social interaction features.

4.1      Textual Features
Event term: The basic lexical feature of an event is the event term itself and its most closely related terms or synonyms, e.g. “#graduation, convocation” for the event graduation. The synonyms are extracted from WordNet5.

5 http://wordnet.princeton.edu/

Co-occurring textual features: These are terms that co-occur significantly with the event term; for example, “cap”, “dress”, “present”, “prom”, “party” are some of the frequently occurring terms for graduation, while “prayer” and “hospital” occur for surgery. The presence of these terms along with the main event term is expected to boost the detection process. Co-occurring terms were extracted from various tag based social media sites such as Flickr and Instagram, where terms are described with highly related terms. These features are event specific and treated as binary values, i.e. 1 for presence, otherwise 0.

Temporal terms: This feature reflects the presence of time terms in a tweet. Since the content is about an event, it is intuitive to assume that some reference to time is natural and required by definition. For this feature, we used LIWC’s time category, which includes 68 time terms.

Person reference terms: Since these events are personal life events, one or more reference terms reflecting a social relation are expected when the event is about somebody other than the poster, or a self reference if the event is about the user.

Sentiment: Personal events are expressed with rich emotions, both for pleasant and unpleasant events. Sentiments are detected by the SentiStrength [?] library, which has proved to be good for social media sentiment detection. The value of this feature ranges from -5 (negative) to +5 (positive), while -1 to +1 is considered neutral.
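
To make the textual features described so far more concrete, the sketch below turns a tweet into binary presence features plus a sentiment label. It is only an illustration under stated assumptions: the term sets are small hand-picked samples (the WordNet, tag-site and LIWC lists used in the study are not reproduced here), the function names are hypothetical, multi-word phrases such as “tied the knot” are not handled, and the sentiment score is assumed to be supplied by an external tool on the -5 to +5 scale.

```python
import re
from typing import Dict, Set, Union

# Illustrative samples only, taken from examples in the text (graduation event).
EVENT_TERMS: Set[str] = {"graduation", "convocation", "commencement"}
CO_OCCURRING_TERMS: Set[str] = {"cap", "dress", "present", "prom", "party"}
FIRST_PERSON_TERMS: Set[str] = {"i", "my", "me", "our"}
RELATION_TERMS: Set[str] = {"friend", "sister", "brother", "cousin"}


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def sentiment_label(score: int) -> str:
    """Map a score in [-5, 5] to a label; -1..+1 counts as neutral."""
    if not -5 <= score <= 5:
        raise ValueError("score must lie in [-5, 5]")
    if score > 1:
        return "positive"
    if score < -1:
        return "negative"
    return "neutral"


def textual_features(text: str, sentiment_score: int) -> Dict[str, Union[int, str]]:
    """Binary presence features (1 = present, 0 = absent) plus sentiment label."""
    tokens = tokenize(text)
    return {
        "event_term": int(bool(tokens & EVENT_TERMS)),
        "co_occurring_term": int(bool(tokens & CO_OCCURRING_TERMS)),
        "first_person_term": int(bool(tokens & FIRST_PERSON_TERMS)),
        "relation_term": int(bool(tokens & RELATION_TERMS)),
        "sentiment": sentiment_label(sentiment_score),
    }


example = textual_features("My sister's #graduation party is today!", 3)
print(example)
```

Keeping every lexical feature binary mirrors the 1-for-presence, 0-otherwise encoding described for the co-occurring terms; richer encodings such as counts would be a different design choice.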
                                   Table 2: Events and their examples from Twitter.
            Event       Examples
           Marriage     Kansas City here we come! It’s happening! My sister’s marriage this weekend!! :)
                        8 years ago this day , married to the most loving man on this earth.
                        Congratulations to my beautiful friend, @SheridanMillls, who tied the knot today!
         Graduation     Happy graduation day, bebe! Congrats cutie pie! http://t.co/YqgNgK9WMw
                        Graduation is just around the corner. Time to start planning programs and certificates.
                        Talk to our print consultants today!
                        3 sets of graduation picture next week! Hahaha. At last! :)
           New Job      First day of a new job.... Kind of dreading it. #officeassistant
                        Starting my new position today. Ayy lmao.
                        Shout out to my cuz Quincy Johnson aka Q. On his new Executive Chef position!
          New Born      My baby girl is here! Introducing: Halen born naturally May 3rd @ 4:43 pm.
                        Exactly 3 weeks till my babyshower & almost 7 weeks till my baby boy Is born ?
           Surgery      Good luck on your surgery today
                        @chloebieber ear surgery it went well
                        Everyone please continue to pray for Karlie these next 5 hours. She just went back for her
                        brain surgery. #PrayersForKarlie

Non-textual and punctuation features: Features relating to punctuation and emoticons, such as the presence of “!/?”, are expected to add to the discriminating qualities of a learning model.

4.2   Interaction and Social Feature
The unigram model is a basic model for classification, and its results show reasonable accuracy overall but poor performance for the new born event. This motivated us to further explore the feature space and extract more defining attributes of an event in terms of activity and interactions, based on the simple logic that important events are bound to generate more attention and activity within the immediate personal network of an individual. Accordingly, we computed the following Twitter specific features concerning a tweet and the user. These features can be broadly classified into two categories: 1) Activity and 2) Attention. Activity features (first four in the list below) are based on the user’s activity (tweets, re-tweets and replies), while attention features are measures of engagement between the user and his/her network (last four features in the list below).

  1. Tweets per day: Number of tweets per day a user posts.
  2. Re-tweets per day: Number of re-tweets per day a user posts.
  3. Replies per day: Number of replies given by the user to other users.
  4. Unique mentions per day: Number of unique mentions (users addressed) in a day by the user.
  5. Number of times the user is mentioned in a day.
  6. Number of times a user is replied to by other users.
  7. Number of times a tweet is re-tweeted by other users.**
  8. Number of times a tweet is marked as “favourite” by other users.**

In this work, we have used only the last two interaction features for the comparison study, while the other features are part of an extension work primarily focusing on iteration specific models in identifying life events.

5.    EXPERIMENTAL RESULT
In this section, we describe the experimental steps and present the results of the classifications. We started with the ground-truth annotation process, followed by the classification steps and their results.

5.1    Ground Truth Annotation
In the absence of any benchmark data for personal event detection, we prepared a gold standard dataset with manual annotation by 2 users with a computing background. Annotators were given 1000 tweets per event for annotation. These 1000 tweets were randomly selected from the filtered dataset. The instruction was to annotate a tweet as event positive (presence of event) if they considered that the tweet describes an event happening (present, e.g. today) or about to happen with certainty (e.g. 4 days to graduation) within a month’s time window. It is difficult to precisely define an event, as most of the tweets are not reported exactly during the event but before or after it. Since our objective is to identify the event from the user’s timeline with a definitive time stamp attached to the event, we opted for a 1 month time interval. We retained those tweets (304) as event positive tweets whenever both the annotators agreed on the label. It is important to mention that event negative tweets are simply those where annotators felt that a particular event is not occurring despite the presence of an event related keyword.

5.2    Event Detection: Unigram Model(UNI)
Our first model is the simplest bag-of-words model, where word frequencies are used as features for document classification. In our case, each tweet is considered one document. We first applied a String to word vector filter that converts the strings into numerical features. Then we trained our model with 10-fold cross validation using four different types of classifiers: Naive Bayes (NB), Multinomial Naive Bayes (MNB), Support Vector Machine (SVM) and Decision Tree
                                                                    cial relation terms( my friend, sister etc.), temporal terms
                                                                    (today, week, morning etc.), sentiment strength of a tweet.
                                                                    POS tagging was done using the Stanford tagger6 and sentiment
                                                                    was derived using the SentiStrength Java library [?].

                                                                    Recognizing Temporal Expression: Temporal features
                                                                    tend to be implicit, diverse, and informal (e.g. last week,
                                                                    hourly, around the corner). Identifying these references within
                                                                    the vicinity of an event term occurrence increases the
                                                                    likelihood of accurate detection. Moreover, we need to resolve
                                                                    the tense of the verb as well, to know whether the tweet is
                                                                    about a future or a past event. In this paper, we use the time
                                                                    terms of the LIWC dictionary, which has 68 time-related terms
      Figure 1: AUC curve for different events.                     (e.g. forever, week, until etc.). This feature is also used as
                                                                    a binary feature in the second classification model.
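
The idea of looking for a time term in the vicinity of an event term can be sketched as a simple token-window check. This is a minimal sketch under stated assumptions rather than the procedure used in the study: the time terms below are a handful of examples mentioned in the text (not the 68-term LIWC list), and the function name `time_term_near_event` and the window size of 4 tokens are hypothetical choices.

```python
import re
from typing import List, Set

# A few example time terms from the text; not the actual LIWC time category.
TIME_TERMS: Set[str] = {"today", "yesterday", "week", "morning", "until", "forever"}


def tokens(text: str) -> List[str]:
    """Lower-case the text and return its alphabetic tokens in order."""
    return re.findall(r"[a-z]+", text.lower())


def time_term_near_event(text: str, event_terms: Set[str], window: int = 4) -> int:
    """Return 1 if a time term lies within `window` tokens of an event term, else 0."""
    toks = tokens(text)
    for i, tok in enumerate(toks):
        if tok not in event_terms:
            continue
        lo = max(0, i - window)
        hi = min(len(toks), i + window + 1)
        if any(t in TIME_TERMS for t in toks[lo:hi]):
            return 1
    return 0


print(time_term_near_event("I graduated yesterday", {"graduated", "graduation"}))
print(time_term_near_event(
    "family have brought a 2nd lawsuit against her, "
    "this time to try to annul her marriage",
    {"marriage"},
))
```

A presence test of this kind says nothing about tense, so it cannot by itself tell a past event from a future one; that would need the verb tense resolution mentioned above.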
(J48) implemented in the machine learning library Weka [?]. We
evaluated our model on the test set (100 from each event), and
the performance of these classifiers is reported in terms of
Recall (the number of correct results divided by the number of
results that should have been returned), Precision (the number
of correct results divided by the number of all returned
results) and F-score (the harmonic mean of the two). Table 3
(fig. 2) shows the average precision, recall and F score for all
the events. SVM performed best in 4 out of 5 events, followed by
Naive Bayes. Graduation has the highest precision score (.8),
whereas ”New job” has the highest recall score (.95). The most
difficult event across all the classifiers is ”New born”, with
the lowest precision score (.55).

                                                                    The second model showed an average improvement of 4-5% in
                                                                    precision score over the initial model for all the events,
                                                                    showing that simple lexical features are able to capture some
                                                                    of the diversity. For brevity, we only show the results of the
                                                                    top classifier (SVM).

                                                                    Table 4: Precision, Recall and F-measure for
                                                                    (UNI+META) Model (SVM).

                                                                             Event         Precision   Recall   F-Measure
                                                                             Graduation    0.83        0.81     0.819
                                                                             Marriage      0.77        0.83     0.798
                                                                             New Job       0.818       0.93     0.865
                                                                             New Born      0.61        0.92     0.733
                                                                             Surgery       0.77        0.87     0.816

The ROC curves (figure 1) plot true positives (TP) against
false positives (FP). The area under the curve (AUC: the
probability that a classifier will rank a randomly chosen
positive instance higher than a randomly chosen negative one)
ranges from .71 to .75, indicating reasonable quality of the
learners. NB performs better than SVM, with an average of .77
against .72 across all events.

                                                                    5.4      Event Detection: Model with Interaction
                                                                             Features (UNI+META+INT)
                                                                    Inherent in social media and social networks, it is intuitive
                                                                    to hypothesise that interesting events will stimulate inter-
Table 3: Average precision, recall and f-Measure                    esting and increased interaction among the friend circle of
from all classifiers based on unigram model.                        the user in the form of replies and sharing. The third and
                                                                    the final model takes advantage of these interaction features
       Event           Precision    Recall   F-Measure
                                                                    embedded in microblogging sites through mechanisms like
       Graduation      0.80         0.80     0.73                   retweet and favourites. Each tweet is now represented with
       Marriage        0.75         0.87     0.79                   two more features besides the above lexical features for clas-
       New Job         0.78         0.95     0.80                   sification. We used only SVM as the classifier because of its
       New Born        0.55         0.92     0.68                   superior performance in previous two occasions. Results of
       Surgery         0.72         0.87     0.76                   the final model (table 5) are reported by means of precision
                                                                    score per event. A final comparison of four models (UNI,
                                                                    UNI+META, UNI+META+INT and INT) is shown in fig-
Analysis of error classification mainly showed the diversity of     ure 3. The result shows that, although the hybrid model
language constructs among the misclassified tweets. Since           performed better than the unigram-based one (UNI), the
the model is purely content based, any variation not cap-           improvement was marginal. On the other hand, the model
tured by the model are missed from the result.                      based only on interaction features (INT) performed worst,
                                                                    where accuracy dropped to 53-61%. .
5.3    Event Detection: Model with Contextual
       Lexical Patterns (UNI+META)                                  6.     CONCLUSION
                                                                    This paper describes event detection from personal timeline
Bag-of-words or unigram model is the basic approach yet
                                                                    of a user in Twitter. Existing detection tasks predominantly
proved to have reasonable accuracy though with lots of false
                                                                    focused on public events and events concerning celebrities
positives. This led us to refine the model with more lexical
                                                                    both from news articles and social media whereas personal
features and features such as sentiment. We considered fea-
                                                                    life events are mostly overlooked. We started with 5 life
tures (described in sec. 4) such as co-occurring terms (e.g.
                                                                    6
prayers, hospital for surgery), POS tagging, presence of so-            http://nlp.stanford.edu/software/tagger.shtml
events and trained 5 different binary classifiers based on
bag-of-words features, which gave 55 to 80% precision on a test
dataset with an average AUC of 77%. The learning models were
further refined with meta features such as sentiment, temporal and
social relation terms, emoticons and punctuation, which improved
the classification performance by 4-5%. However, the addition of
interaction features in the third classifier did not yield a
substantial improvement, contrary to expectation. This final result
is a strong motivation for an in-depth analysis of these features
in our future work. We also aim to adopt an unsupervised approach
to detect life events, as there may be many more unexpected events
in one's life that have a substantial influence and are eligible
for inclusion.

Table 5: Precision, Recall and F-measure for
(UNI+META+INT) Model (SVM).

       Event         Precision   Recall   F-Measure
       Graduation    0.85        0.83     0.839
       Marriage      0.79        0.83     0.809
       New Job       0.82        0.91     0.862
       New Born      0.64        0.92     0.754
       Surgery       0.78        0.87     0.822

Figure 2: A comparative performance of four different models.

7.   ACKNOWLEDGMENT
This work was supported by EPSRC project ReelLives
(EP/L004062/1).

8.   REFERENCES
 [1] P. Agarwal, R. Vaithiyanathan, S. Sharma, and
     G. Shroff. Catching the Long-Tail: Extracting Local
     News Events from Twitter. Pages 379–382, 2012.
 [2] E. Benson, A. Haghighi, and R. Barzilay. Event
     Discovery in Social Media Feeds. Volume 3,
     pages 389–398. Association for Computational
     Linguistics, 2011.
 [3] L. Chen and A. Roy. Event detection from flickr data
     through wavelet-based spatial analysis. In Proceedings
     of the 18th ACM Conference on Information and
     Knowledge Management, CIKM ’09, pages 523–532,
     New York, NY, USA, 2009. ACM.
 [4] B. D. Eugenio, N. Green, and R. Subba. Detecting
     Life Events in Feeds from Twitter. Pages 274–277.
     IEEE, 2013.
 [5] C. S. Firan, M. Georgescu, W. Nejdl, and R. Paiu.
     Bringing order to your photos: Event-driven
     classification of flickr images based on social
     knowledge. In Proceedings of the 19th ACM
     International Conference on Information and
     Knowledge Management, CIKM ’10, pages 189–198,
     New York, NY, USA, 2010. ACM.
 [6] J. Glück and S. Bluck. Looking back across the life
     span: A life story account of the reminiscence bump.
     Springer, 2007.
 [7] Q. He, K. Chang, and E.-P. Lim. Analyzing feature
     trajectories for event detection. In Proceedings of the
     30th Annual International ACM SIGIR Conference on
     Research and Development in Information Retrieval,
     SIGIR ’07, pages 207–214, New York, NY, USA, 2007.
     ACM.
 [8] A. Jackoway, H. Samet, and J. Sankaranarayanan.
     Identification of live news events using Twitter.
     Page 1, New York, NY, USA, 2011. ACM Press.
 [9] A. Java, X. Song, T. Finin, and B. Tseng. Why we
     twitter: Understanding microblogging usage and
     communities. In Proceedings of the 9th WebKDD and
     1st SNA-KDD 2007 Workshop on Web Mining and
     Social Network Analysis, WebKDD/SNA-KDD ’07,
     pages 56–65, New York, NY, USA, 2007. ACM.
[10] S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, and
     A. Vakali. Cluster-based landmark and event detection
     for tagged photo collections. Volume 18,
     pages 52–63, Los Alamitos, CA, USA, Jan. 2011.
     IEEE Computer Society Press.
[11] S. Phuvipadawat and T. Murata. Breaking news
     detection and tracking in twitter. In Proceedings of the
     2010 IEEE/WIC/ACM International Conference on
     Web Intelligence and Intelligent Agent Technology -
     Volume 03, WI-IAT ’10, pages 120–123, Washington,
     DC, USA, 2010. IEEE Computer Society.
[12] T. Sakaki. Earthquake shakes twitter users:
     Real-time event detection by social sensors. In
     Proceedings of the 19th International Conference on
     World Wide Web, 2009.
[13] M. Thelwall, K. Buckley, G. Paltoglou, and D. Cai.
     Sentiment strength detection in short informal text,
     2010.
[14] C. L. Wayne. Topic detection tracking (tdt). In
     Proceedings DARPA Broadcast News Transcription
     and Understanding Workshop, page 98, 1998.
[15] J. Weng, Y. Yao, E. Leonardi, F. Lee, and B.-s. Lee.
     Event detection in twitter. Pages 401–408.
     IEEE, 2011.