StreamGrid: Summarization of Large Scale Events using
       Topic Modelling and Temporal Analysis

Emmanouil Schinas
  1 Dept. of Electrical & Computer Engineering, Aristotle University of Thessaloniki
  2 Information Technologies Institute, Centre for Research & Technology Hellas
  manosetro@iti.gr

Symeon Papadopoulos
  Information Technologies Institute, Centre for Research & Technology Hellas
  Thessaloniki, Greece
  papadop@iti.gr

Yiannis Kompatsiaris
  Information Technologies Institute, Centre for Research & Technology Hellas
  Thessaloniki, Greece
  ikom@iti.gr

Pericles A. Mitkas
  1 Dept. of Electrical & Computer Engineering, Aristotle University of Thessaloniki
  2 Information Technologies Institute, Centre for Research & Technology Hellas
  mitkas@eng.auth.gr


                                                                     1    Introduction
                        Abstract                                     Due to their increasing popularity, micro-blogging
                                                                     platforms, and especially Twitter, have evolved into a
      Due to the increasing popularity of microblogging              powerful means for getting connected with real world
      platforms, the amount of messages related to                   events. In large scale public events, ranging from sport
      large scale public events reaches impressive                   events, such as football matches, to political events
      levels. Although such messages can be                          and festivals, the users that are somehow involved in
      quite informative regarding different aspects                  the event use social media to share their experiences
      of the main event, there is a lot of spam and re-              and express their opinions. In many cases, these mes-
      dundancy that makes it challenging to extract                  sages are quite informative and provide real-time cov-
      insights regarding the event of interest. In this              erage of the ongoing event and may be correlated with
      work we describe a summarization framework                     important variables related to the event, e.g. film rat-
      that captures the important moments of an                      ings [13]. Thus, not surprisingly, the amount of event-
      event by using a combination of topic mod-                     related messages has reached impressive levels [1].
      elling and bursty activity detection. We pro-                      However, a significant percentage of micro-blogging
      pose a data structure named StreamGrid, that                   messages can be considered as non-informative or
      maintains the information of active topics in                  spam. This fact combined with the huge number
      regular time intervals at several scales. This                 of messages, makes it very challenging for interested
      structure is used for the creation of concise                  stakeholders, such as event organizers and enthusiasts,
      summaries for any time interval. Finally, the                  to monitor the evolution of the event and understand
      evaluation on a large Twitter dataset around                   its important moments. In case of long-running events,
      the Sundance Film Festival demonstrates the                    this becomes even more difficult due to the existence of
      potential of the proposed framework.                           numerous sub-events occurring within the main event.
                                                                     Such sub-events have different durations and impact
Copyright © by the paper’s authors. Copying permitted only
for private and academic purposes.
                                                                     on the main event. In addition, a large portion of the
In: S. Papadopoulos, P. Cesar, D. A. Shamma, A. Kelliher, R.
                                                                     messages contain conversations about other entities of
Jain (eds.): Proceedings of the SoMuS ICMR 2014 Workshop,            interest associated with the event. In other words, an
Glasgow, Scotland, 01-04-2014, published at http://ceur-ws.org       event-related stream of messages is quite diverse and
noisy, with different associated topics, conversations         of information.
among users, and spam messages. Thus, there is a                   Shen et al. [16] present a participant-based ap-
profound need for event-based summarization methods            proach for event summarization. A mixture model is
that can produce concise multi-document summaries              proposed to detect sub-events at participant level, and
for any time interval of the event, covering its main          the tf · idf centroid approach is used to create a sum-
aspects.                                                       mary of each sub-event. Similarly, Chakrabarti and
   The framework we propose in this work aims to cre-          Punera [4] propose the use of a Hidden Markov Model
ate topic-based summaries of large-scale events for ar-        to obtain a time-based segmentation of the stream that
bitrary time durations by applying post-analysis on            captures the underlying sub-events. Alonso and Shiells
the stream of event related messages. First, we ap-            [2] create timelines for football games, annotated with
ply LDA topic modelling to discover the underlying             the key aspects of the event. Dork et al. [5] propose
aspects of the event. To support summarization, we             an interface for large scale events that employs several
create a 2D-array structure named StreamGrid. This             visualizations for interactive presentation of the event.
maintains the information of each topic at each time               A different problem is tackled by Wang et al. [19].
interval. To create the grid we assign messages to the         Unlike other methods, that method aims to create a
detected topics and divide topic-associated messages           storyline from a set of event-related objects. A multi-
using regular time intervals. Next, we create timelines        view graph of objects is constructed, where the two
for the set of topics and use them to detect the set           type of edges capture the contextual similarity and
of active topics at each time interval by finding the          the temporal proximity among objects. Then a time-
bursty activity periods in them. A greedy algorithm            ordered sequence of important objects is obtained via
is used to obtain a set of representative messages that        graph optimization. Lin et al. [7] extend the previous
maximize the coverage of the event by selecting the            work to generate storylines from a set of micro-blog
maximum possible number of active topics and min-              messages for arbitrary queries. To achieve this, they
imize redundancy across messages at the same time.             use query expansion techniques to retrieve the query-
Finally, to demonstrate the potential of the proposed          related messages and then apply the same method as
framework, we perform an experimental evaluation on            [19] to create the storyline.
a real-world dataset consisting of tweets around the               Another approach for summarizing evolving tweet
Sundance Film Festival 2013.                                   streams is proposed by the Sumblr framework [17].
   The paper is organized as follows. Section 2 con-           This relies on an online clustering algorithm for tweets
tains a brief survey of related methods and applica-           and on maintaining distilled statistics of the clusters at
tions. Section 3 describes in detail the proposed frame-       specific time snapshots using a structure, named Pyra-
work. Section 4 presents an experimental case study            midal Time Frame. Then, a summarization technique
on the Sundance 2013 dataset. We conclude the paper            is employed for generating summaries of arbitrary time
and describe future work in Section 5.                         durations based on the LexRank method [6].


2    Related Work                                              3     Proposed Method
                                                               An overview of the proposed method is illustrated in
A substantial body of work exists in the literature on
                                                               Figure 1. The proposed framework processes a stream
the problem of micro-blogging summarization. A no-
                                                               of online messages around an event and extracts infor-
table method for multi-document summarization re-
                                                               mative summaries for any requested time duration. In
lies on the computation of centroids based on content.
                                                               other words, the proposed framework identifies a set
Namely, the summary of a set of documents, repre-
                                                               of topics and then selects related messages based on
sented as tf · idf vectors, consists of those documents
                                                               their importance.
that are closest to the centroid of the set [12]. Sharifi et
al. [15] propose a method for the generation of a single
                                                               3.1   Topic Modelling
sentence from a set of tweets, by using a graph-based
technique. Nichols et al. [11] describe an algorithm           Topic modelling is based on the assumption that each
that generates a summary of sports events. They use            document can be described as a random mixture of
a peak detection algorithm to detect important mo-             topics and each topic as a multinomial distribution
ments and then apply the method of [15] to extract             over terms. In our approach we employ topic mod-
summary sentences from the tweets around these mo-             elling by using the well known Latent Dirichlet Allo-
ments. The work of [8] uses linear-programming opti-           cation model [3] across the whole stream of messages.
mization to select summary sentences from tweets re-           This process is applied after the end of the event, when
lated to trending topics. Notably, they also make use          all the messages are available. However, topic mod-
of linked Web content to extend the original sources           elling in micro-blog messages is problematic due to the
                                                                perplexity(D_{test}) = \exp\left( \frac{\sum_{d \in D_{test}} -\log P(d \mid \theta, \phi, G)}{\sum_{d \in D_{test}} L_d} \right)   (2)


                                                                For the similarity between two topics, we calculate
                                                             the Jaccard coefficient on the sets of top N terms of
                                                             each topic.
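
The two model-selection metrics described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the function names are hypothetical, each topic is assumed to be given as a list of terms ranked by probability, and the per-document log-likelihoods log P(d|θ, φ, G) are assumed to have been computed beforehand by whichever LDA implementation is in use.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_topic_similarity(topics, top_n=10):
    """Mean Jaccard coefficient over all topic pairs, using the top_n terms of each topic."""
    tops = [topic[:top_n] for topic in topics]
    pairs = list(combinations(tops, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def corpus_perplexity(log_likelihoods, lengths):
    """Equation 2: exp of the total negative log-likelihood divided by the total number of terms.

    log_likelihoods[k] is log P(d_k | model) and lengths[k] is L_d for the same document.
    """
    return math.exp(-sum(log_likelihoods) / sum(lengths))
```

Under these assumptions, one would evaluate both functions for every candidate number of topics and pick a value for which perplexity and average similarity are both low.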

        Figure 1: The StreamGrid framework                   3.2   StreamGrid Creation

short length of their text. To overcome this, a lot of       After the detection of topics we have to associate mes-
approaches have been proposed. To avoid changes on           sages with topics. We use the LDA model, estimated
standard LDA, a relatively simple solution is message        from the merged documents, to infer the probabilities
pooling, in which messages are pooled together to form       of each message over the set of topics. We assign
larger documents. We experimented with four meth-            each message to the topic with the highest probabil-
ods of message pooling in a similar way as [10]. First,      ity under the condition that this probability exceeds
we tried to merge messages using constant length time        a predefined threshold. Although thresholding in this
bins. Then, we merged messages of the same author to         step leaves some messages unassigned, this is a de-
form a single document. As a third option, we pooled         sirable feature of the procedure as most of the unas-
messages together based on their hashtags. Messages          signed messages are of low quality. In other words
with multiple hashtags were assigned to multiple documents   these messages can be considered as spam messages
and messages without any hashtag were assigned to            that cannot contribute any valuable information in the
the document with the highest textual similarity. As         summary. Next, assignments are used for the creation
a fourth option, we used a 1NN clustering algorithm to       of a data structure named StreamGrid. The first di-
cluster messages with high textual similarity. Each of       mension of this grid comprises the detected topics and
those clusters formed a single document for the LDA          the second corresponds to time, divided into regular
method. In addition, for all of the pooling methods          time intervals. Each cell c(i, j) of StreamGrid con-
we filtered out messages having only one term and re-        tains the set of messages Mij associated with topici ,
moved standard stopwords to discard the non infor-           at time interval j. Each message m is represented as a
mative terms.                                                tf · idf vector. The idf components are pre-computed
                                                             over the whole set of messages. The tf part is the
   Another drawback of LDA is that the number of             frequency of a term in the message normalized by the
topics must be defined; obviously, the number of topics      maximum frequency. Due to the short length of the
is not known a priori in the context of large events.        documents in micro-blogging platforms, this component
To determine the optimal number of topics for a given        often equals one. Using the set of associated
set of documents D we calculate two metrics, perplex-        messages in each cell, we calculate a merged tf · idf
ity and average similarity across topics for different       vector vij . In addition, we calculate a weight for each
number of topics and choose a value that minimizes           message and rank them according to it. The weight of
both metrics. For the calculation of perplexity we split     a message m, associated with topici , in a specific time
D into training and test documents, we estimate LDA          window j is defined as the sum of the weights of the
over a range of possible numbers of topics using Dtrain      terms contained in m. To calculate the weight of each
and calculate the total perplexity of the documents in       term t, we use the following tf · idf scheme:
the test dataset Dtest [18]. The perplexity of a docu-
ment d given a trained model is defined as follows:
         perplexity(d) = \exp\left( \frac{-\log P(d \mid \theta, \phi, G)}{L_d} \right)   (1)

                                                                            W(t, i, j) = tf_{ij}(t) \cdot idf(t)          (3)

                                                                            W(m, j) = \sum_{t \in m} W(t, i, j)          (4)


where Ld is the number of terms in document d, θ is          where tfij (t) is the frequency of term t ∈ vij in the
the document-specific topic distribution, φ is the word      cell c(i, j) of StreamGrid, and idf (t) is the inverse doc-
distribution for topics, and G is the set of topics in the   ument frequency over the whole corpus, W (t, i, j) is
trained model. The total perplexity over dataset Dtest       the weight of term t in c(i, j), and W (m, j) the weight
is defined as                                                of message m in time interval j.
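
The construction of StreamGrid cells and the weighting scheme of Equations 3 and 4 can be sketched as follows. This is an illustrative sketch under several assumptions that the text does not fix: the message fields (`terms`, `time`, `topic_probs`), the function names and the default threshold are hypothetical, idf is taken as log(N / df), tf_ij(t) is the raw count of t inside the cell, and each distinct term of a message is counted once in Equation 4.

```python
import math
from collections import Counter, defaultdict


def build_stream_grid(messages, interval, threshold=0.5):
    """Group messages into cells keyed by (topic index, time-interval index).

    Each message is a dict with hypothetical keys 'terms' (list of str),
    'time' (seconds) and 'topic_probs' (one probability per topic).
    A message whose best topic probability does not exceed the threshold
    stays unassigned.
    """
    grid = defaultdict(list)
    for msg in messages:
        probs = msg["topic_probs"]
        topic = max(range(len(probs)), key=probs.__getitem__)
        if probs[topic] > threshold:
            grid[(topic, int(msg["time"] // interval))].append(msg)
    return grid


def inverse_document_frequency(messages):
    """idf(t) = log(N / df(t)) over the whole set of messages (assumed variant)."""
    df = Counter(t for msg in messages for t in set(msg["terms"]))
    n = len(messages)
    return {t: math.log(n / count) for t, count in df.items()}


def cell_term_weights(cell_messages, idf):
    """Equation 3: W(t, i, j) = tf_ij(t) * idf(t), with tf counted inside the cell."""
    tf = Counter(t for msg in cell_messages for t in msg["terms"])
    return {t: count * idf.get(t, 0.0) for t, count in tf.items()}


def message_weight(msg, term_weights):
    """Equation 4: sum of the cell weights of the distinct terms of the message."""
    return sum(term_weights.get(t, 0.0) for t in set(msg["terms"]))
```

Keeping the grid as a dictionary keyed by (topic, interval) mirrors the sparsity of the structure: cells without assigned messages are simply absent.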
    To detect the time intervals that a specific topici of    use an adapted version of the greedy algorithm used in
StreamGrid is active, we create a topic timeline by us-       [17]. The algorithm selects messages that are associ-
ing time intervals as bins, and counting the associated       ated with different topics and that simultaneously have
messages of topici in bin j. Then, we apply the peak          low degree of textual similarity between each other.
detection algorithm used in [9] to detect time frames in      The selection process is detailed by Algorithm 1. For
the timeline that exhibit bursty behaviour. The algo-         an arbitrary time frame F = [a, b], we first find the
rithm identifies windows with high activity by finding        sequence of time intervals in StreamGrid that covers
significant increases in the timeline, compared to the        F. Then we get the set of active topics. A topici is
historical mean value of activity. The time windows           active in F if any cell c(i, j) contained in F is active.
reported by the algorithm are used to set the active          Also, the significance score of an active topic in F is
topics of each time interval. For example, if for a spe-      defined as the maximum significance score across all
cific topic i, the algorithm identifies a time window         time intervals in F. The weight W (t, i, F) of a term
[a, b] with high activity, then we define all the time in-    t for topici in F is defined as the sum of the weights
tervals a ≤ j ≤ b as active moments of topici . After         in each cell c(i, j) ∈ F. In a similar way, we define
this step, the cells of StreamGrid, have a flag that in-      the weight W (m, F) of message m over F. Note that
dicates whether a specific cell is active or not. We use      although a message belongs to a specific time interval,
this flag to select a summary subset of messages, as          we use the term weights across the whole time frame
described in the next paragraph. Also for each active         to calculate the weight of m.
topici in a specific time interval j, we calculate a score
that captures its significance over the rest of the active
topics A in the same time interval.

       Significance(topic_i, j) = \frac{|M_{ij}|}{\sum_{topic_k \in A} |M_{kj}|}          (5)

   In addition, to have an overall estimation of the
importance of each topic throughout the event, we
calculate two measures for each topic using an approach
similar to [14]. More specifically, we define the
peakiness of a topic as:

       peakiness(topic_i) = \frac{\max |M_{ij}|}{\sum_{\forall j} |M_{ij}|}          (6)

and its persistence as

       persistence(topic_i) = \frac{\operatorname{avg}_{t_{peak} < j} \frac{|M_{ij}|}{\sum |M_{ij}|}}{\operatorname{avg}_{j < t_{peak}} \frac{|M_{ij}|}{\sum |M_{ij}|}}          (7)

where tpeak is the time that the maximum peak of the
timeline occurs.

Algorithm 1 Topic-Time summarization
Input: StreamGrid, a time frame F, length of summary L
Output: a summary set S
 1: S = ∅
 2: A = {set of active topics in F}
 3: Mc = { m | argmax_m W(m, i, F), ∀i ∈ A }
 4: while |S| < L and Mc ≠ ∅ do
 5:     for each message m in Mc do
 6:         calculate score(m) according to Equation 8
 7:     end for
 8:     Select m_max = argmax_{m_i} [score(m)]
 9:     S = S ∪ {m_max}
10:     Mc = Mc − {m_max}
11: end while
12: if |S| < L then
13:     M = ∪ M_ij, ∀i ∈ A, j ∈ F
14:     M′ = M − S
15:     while |S| < L do
16:         for each message m in M′ do
17:             calculate score(m) according to Equation 8
18:         end for
19:         Select m_max = argmax_{m_i} [score(m)]
20:         S = S ∪ {m_max}
21:         M′ = M′ − {m_max}
22:     end while
23: end if

3.3   Topic-Time Summarization

Our goal is to use the StreamGrid to summarize the
event for an arbitrary time frame. As summary we
denote a set of representative messages that mention
the key aspects of the selected time period. Assuming
that topics can capture these aspects, we use the ac-             To produce a summary S of length L, the algorithm
tive topics for that period to create a summary that          first gets the set of active topics as described above.
meets the following criteria: a) as many aspects as           Then, it collects the messages Mc with the highest
possible are covered and b) redundancy due to near            weight W (m, F) in each active topic (line 3). Through
duplicate messages is minimized. To achieve this, we          the lines 4-11, the algorithm, following a greedy ap-
proach, selects the messages that maximize the score        number of topics creates topics with very few associ-
of Equation 8. This consists of two parts weighted          ated messages. We found that for K > 200 there is
by a parameter a. The first part, measures the impor-       a substantial proportion of topics that have no asso-
tance of the message, while the second the redundancy       ciated message. Taking into account these facts, we
compared to the set of already selected messages. The       set K = 200 for the rest of the evaluation. Regarding
importance of a message m ∈ topici is a combination         the pooling scheme, merging tweets having the same
of two factors: a) the significance of the topic it be-     hashtags into single documents gave us the best per-
longs to, at this time frame, and b) the contribution       formance with respect to perplexity and average topic
of its textual content. To measure the redundancy of        similarity.
a message, we compute its average cosine similarity to
the already selected messages. If the summary length
is not reached, we perform the same selection process
on the set of tweets that belong to the active topics
(Lines 12-23).


    score(m) = a \cdot Importance(m) - (1 - a) \cdot Redundancy(m)          (8)

    Importance(m) = Significance(i, F) \cdot W(m, F)          (9)

    Redundancy(m, S) = \operatorname{avg}_{m' \in S} Similarity(m, m')          (10)
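
The greedy selection driven by Equations 8 to 10 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the message fields (`weight` for W(m, F), `significance` for the significance of the message's topic in F, `vector` for its sparse tf · idf vector) and all function names are hypothetical, and the redundancy of a message against an empty summary is assumed to be zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score(candidate, selected, a):
    """Equation 8, with importance from Equation 9 and redundancy from Equation 10."""
    importance = candidate["significance"] * candidate["weight"]
    if selected:
        sims = [cosine(candidate["vector"], s["vector"]) for s in selected]
        redundancy = sum(sims) / len(sims)
    else:
        redundancy = 0.0  # assumption: nothing selected yet, so nothing to duplicate
    return a * importance - (1 - a) * redundancy


def greedy_select(candidates, selected, length, a):
    """Repeatedly move the best-scoring candidate into `selected`.

    Stops when the summary length is reached or the candidates run out.
    """
    pool = list(candidates)
    while len(selected) < length and pool:
        best = max(pool, key=lambda m: score(m, selected, a))
        selected.append(best)
        pool.remove(best)
    return selected


def summarize(messages_by_topic, length, a=0.5):
    """Two phases: the top message of each active topic first, then all remaining messages."""
    top = [max(msgs, key=lambda m: m["weight"])
           for msgs in messages_by_topic.values() if msgs]
    summary = greedy_select(top, [], length, a)
    rest = [m for msgs in messages_by_topic.values() for m in msgs
            if all(m is not s for s in summary)]
    return greedy_select(rest, summary, length, a)
```

Here `messages_by_topic` maps each active topic of the time frame to its messages. Scores are recomputed in every iteration because the redundancy term depends on the messages selected so far.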


4     Experiments
4.1    Dataset and event description
We conducted an evaluation of the proposed method
on a dataset around the Sundance 2013 Film Festi-
val that took place between January 15th and 30th,
2013. We used the Streaming API of Twitter to ac-
quire tweets containing terms related to Sundance and
                                                            Figure 2: Perplexity and Average Similarity between
posted during the event. More precisely, we collected
                                                            topics for different number of topics K
all tweets containing the hashtags, #sundance, #sun-
dance2013 and #sundancefest, and all the tweets that
mentioned the official account of Sundance Film Fes-        4.3   StreamGrid Construction
tival (@sundancefest). This resulted in a dataset of        The first part of Table 2 contains the top five topics
201,752 tweets. Among them, 100,046 were original           with respect to the peakiness and the second one the
tweets, while the rest of them were retweets. Although      topics with the highest persistence ratio. Examining
using three hashtags and one mentioned account cov-         the set of persistent topics we conclude that they can
ers only a subset of all possible tweets about the event,   be divided into two main categories: The first com-
we consider this subset sufficiently representative as      prises the truly persistent topics that are regularly
the vast majority of Twitter’s users tend to adopt the      discussed during the event, while the second category
official hashtags provided by organizers during events.     is made up of multiplexed topics that LDA failed to
                                                            split further. This is due to the fact that some top-
4.2    Topic detection
                                                            ics are conceptually different but share a similar set of
Figure 2 shows the perplexity and average similarity        related terms. This obviously affects summarization
for different numbers of topics K. Although there is        performance, as for each topic we select only the top
significant variance for the different values of K, the     weighted message. Thus, if the topic contains more
main trend for perplexity is to decrease as K increases.    than one concept, then the summarization algorithm
As we can see from Figure 2, the average similarity be-     selects only one concept and ignores the rest.
tween all pairs of topics appears to stabilize for values      Figure 3 depicts the timelines of the same two sets
of K larger than 100 topics. However, having a large        of topics respectively. It becomes obvious that peaky
                                                            Figure 4: StreamGrid: Each cell of StreamGrid corre-
                                                            sponds to a specific time interval and topic
                                                                potentially interesting events that gathered less
                                                                attention tend to be missed.
                                                              • tf · idf Summarizer: We use the tf · idf weighting
Figure 3: Timelines of the top five peaky and persis-           scheme described in the previous section to get
tent topics                                                     the L highest weighted tweets.

topics are highly localized, while persistent topics sus-     • Cluster-based Summarizer: Instead of active top-
tain for the whole duration of the event. To provide a          ics, we divide the tweets of the time interval into L
visual representation of the StreamGrid structure over          clusters using k-means clustering. For each cluster
the whole duration of the event, we represent it as a           produced this way, we pick the highest weighted
heat map (Figure 4). The coloured cells in the grid             tweet using the tf · idf scheme.
represent the time intervals, in which the correspond-
                                                              • LexRank Summarizer: We create a graph where
ing topics are active, and the color of the cell gives
                                                                nodes represent tweets and the weights of edges
the significance of each active topic at this point. As
                                                                between nodes represent their pairwise cosine sim-
shown in Figure 4, StreamGrid appears to be sparse,
                                                                ilarity. The total weight of a tweet is the sum of
as only a few cells in it contain active topics. How-
                                                                the weights of the adjacent edges. The summary
ever, one can also observe several topics (rows) that
                                                                consists of the L highest weighted tweets in the
exhibit consistent activity over the whole duration of
                                                                graph.
the event.
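
The graph-based baseline listed above (tweets as nodes, cosine similarity as edge weight, tweet weight as the sum of its adjacent edge weights) can be sketched as below. This is only an illustration of that description: the function names are hypothetical, and raw term counts from whitespace tokenization are assumed in place of whatever vector representation the evaluation actually used.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def degree_centrality_summary(tweets, length):
    """Return the `length` tweets with the largest total similarity to all other tweets."""
    vectors = [Counter(t.lower().split()) for t in tweets]
    weights = [
        sum(cosine(u, v) for j, v in enumerate(vectors) if j != i)
        for i, u in enumerate(vectors)
    ]
    ranked = sorted(range(len(tweets)), key=lambda i: weights[i], reverse=True)
    return [tweets[i] for i in ranked[:length]]
```

Because every pairwise similarity is computed, the cost grows quadratically with the number of tweets in the interval, which is acceptable for intervals of a few thousand tweets such as those in Table 1.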

4.4   Summarization                                         Table 1: Details of five time intervals with the highest
Baselines: To evaluate the summaries produced with          activity during Sundance Film Festival 2013
StreamGrid, we used five baseline methods. Given an                 Start                 End            #Tweets
arbitrary time interval, we first get the set of messages    Thu Jan 17 23:00       Fri Jan 18 00:00        1545
posted during this interval and then we apply the fol-        Sat Jan 19 19:00      Sat Jan 19 20:00        1477
lowing baselines to produce a summary of constant            Mon Jan 21 19:00 Mon Jan 21 20:00              1247
length L.                                                     Sun Jan 27 03:00      Sun Jan 27 08:00        3735
                                                             Wed Jan 23 18:00 Wed Jan 23 21:00              1910
  • Random Summarizer: For the set of tweets we
    choose randomly a subset of L tweets.                      Finally, we compare the results of the StreamGrid
                                                            Summarizer to the ones of the baseline methods for five
  • Popularity Summarizer: We select the L most             time intervals that are connected with high activity
    retweeted messages to form a summary. This              during the main event. We detect these intervals by
    favours the tweets that have attracted the atten-       applying the peak detection algorithm of the previous
    tion of the audience. However, niche topics and         section to the timeline of the whole dataset. We rank
                                                     the detected bursts according to the rate of tweets and
                                                     use the top five of them. The details of these intervals
                                                     are provided in Table 1.
Figure 5: StreamGrid-based Multimedia Summary during awards ceremony (4th row in Table 1)

   Table 3 contains summaries consisting of five tweets, produced by StreamGrid and three of
the baselines, for the time period around the Awards Ceremony of the Sundance Film Festival.
Unsurprisingly, this is the time period with the highest peak during the event. During this
period, what may reasonably be considered important pertains to the films that won awards.
Such messages are usually posted by authoritative users and become highly retweeted. For
this reason, summaries based on the number of retweets cover the winning films quite
effectively. However, in other cases choosing very popular tweets does not lead to
informative summaries. For example, in the third time interval, the popularity-based summary
consists of tweets like “So freaking cool. #sundance http://t.co/C7a8rSaw” and “#Sundance
day 4- leavin for Vegas now. Bye for now http://t.co/C2aRZnEC”. These tweets were retweeted
a lot, but may be considered non-informative for the event. On the other hand,
StreamGrid-based summaries for the Awards Ceremony contain tweets about winning films, even
though these messages are not very popular. This indicates that StreamGrid may detect an
important topic even when it does not attract attention from many users. Regarding the
Cluster-based Summarization, an interesting feature is that avoidance of redundancy is
inherent in the method, as similar messages are clustered together and only the
highest-weighted of them are selected for the summary. However, the weakness of the method
is that not all clusters represent important aspects of the event.
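For reference, the Random and Popularity baselines used in this comparison can be sketched in a few lines. This is a minimal illustration rather than the evaluated implementation: the tweet representation (a dict with hypothetical `text` and `retweets` fields) and the function names are assumptions.

```python
import random


def random_summary(tweets, length, seed=None):
    """Random baseline: pick `length` tweets uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(tweets, min(length, len(tweets)))


def popularity_summary(tweets, length):
    """Popularity baseline: keep the `length` most retweeted tweets."""
    return sorted(tweets, key=lambda t: t["retweets"], reverse=True)[:length]


if __name__ == "__main__":
    # Toy data; the retweet counts are invented for illustration only.
    tweets = [
        {"text": "So freaking cool. #sundance", "retweets": 120},
        {"text": "Best Cinematography goes to Dirty Wars", "retweets": 4},
        {"text": "Fruitvale wins audience award", "retweets": 35},
    ]
    for tweet in popularity_summary(tweets, 2):
        print(tweet["retweets"], tweet["text"])
```

The toy data mirrors the weakness discussed above: a heavily retweeted but uninformative message outranks an informative one with few retweets.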
Figure 6: Multimedia Summary using most retweeted images during awards ceremony (4th row in Table 1)

   Another indication of how topic modelling can improve summarization is that StreamGrid,
compared to the baselines, tends to include tweets that mention films. This happens because
most of the topics detected by LDA are about films, so when the proposed summarization
algorithm selects a set of tweets from the pool of active moments, it tends to select
film-associated tweets. We expect that, for other types of events, the approach will
naturally generalize to other pertinent entities of interest that occur frequently enough to
lead to the creation of topics. A noticeable disadvantage of baselines such as tf · idf and
LexRank is their considerable redundancy. For example, in the case of LexRank, four out of
five tweets are related to the film ’Fruitvale’. This indicates that redundancy minimization
is a necessary component of any summarization approach.
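One common way to add redundancy minimization to a ranking-based summarizer is a greedy filter that skips any candidate too similar to a tweet already selected. The sketch below illustrates that idea with cosine similarity over bag-of-words vectors; it is not the procedure used by StreamGrid or by the baselines, and the tokenizer, the threshold `max_sim` and the function names are assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased term counts; a deliberately simple tokenizer."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_non_redundant(ranked_tweets, length, max_sim=0.5):
    """Walk the ranked list and keep a tweet only if it is not too
    similar to any tweet already in the summary."""
    selected, vectors = [], []
    for text in ranked_tweets:
        vec = bag_of_words(text)
        if all(cosine(vec, other) <= max_sim for other in vectors):
            selected.append(text)
            vectors.append(vec)
        if len(selected) == length:
            break
    return selected
```

Applied to a ranked list in which a tweet and its retweet both appear, the filter keeps the first and moves on to the next distinct message. The threshold trades coverage against redundancy and would need tuning on real data.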
Figure 7: Multimedia Summary using LexRank during awards ceremony (4th row in Table 1)

                     Table 2: Examples of peaky and persistent topics
                                    Peaky Topics
   Topic    Representative Terms                               Peakiness      #tweets
    135     paris, hilton, Blackfish, cnn, films                 0.358          695
    133     death, drink, countryman, sundance, charlie          0.247          588
     11     lovelace, amanda, seyfried, portraits, premiere      0.161         1293
     50     defeat, inevitable, pete, mister, film               0.143          267
     29     butch, dynamite, android, worth, apps                0.123          323
                                  Persistent Topics
   Topic    Representative Terms                               Persistence    #tweets
     63     hemingway, running, follow, crazy, marshall          3.963         2494
     75     jehane, square, girlrising, premiere, screening      2.650          500
    108     vhs, sequel, horror, review, time                    2.318          469
     45     afar, week, enjoy, ways, kicked                      1.612          127
     17     lindsay, lohan, canyons, blame, snubbed              1.557          343

   Finally, to evaluate how well the proposed method can create visual summaries, we apply
it to the subset of tweets with embedded pictures. These tweets, which comprise about 10% of
the dataset, create a considerably sparser StreamGrid, as the bursty periods in this subset
are much fewer. An example of a multimedia summary using StreamGrid for the Awards Ceremony
is shown in Figure 5. Comparing the StreamGrid-based multimedia summaries with the ones
produced by the popular images (Figure 6), we observe that StreamGrid does not perform
noticeably better in this task. This can be explained by the fact that tweets with embedded
media have very short and uninformative text, which leads LDA to inferior performance with
respect to the creation of representative topics and the assignment of messages to them.
Regarding the redundancy in multimedia summaries, we found that cosine similarity on the
text accompanying the images is not an appropriate similarity metric for minimizing
redundancy. This can be seen in the LexRank-based summary in Figure 7. To this end, a
combination of visual and textual features is foreseen as a more suitable means for
discarding similar images.

5   Conclusion and future work
In this work, we proposed a framework for the summarization of micro-blogging messages
during large scale events. The framework makes use of topic modelling to detect the
underlying aspects of an event in the set of related messages. Then, for each topic, it
derives its temporal representation by associating messages with the discovered topics.
Subsequently, a burst detection algorithm is used to find the important intervals for each
topic. Finally, a greedy summarization algorithm generates summaries for arbitrary time
intervals using the set of topics that are active during those intervals. The results of
experiments on a Twitter dataset around the Sundance Film Festival appear promising,
demonstrating the potential of topic modelling for the multi-document summarization problem.
   For future work, we first plan to compare our approach with competing summarization
algorithms in a more systematic way, over more events and with the help of independent
evaluators, with the goal of better capturing the subjective quality aspects of
summarization. Taking into account the large number of topic modelling techniques that have
appeared in the literature in recent years, we plan to investigate how the underlying model
affects the summarization process. Furthermore, we intend to create a real-time version of
StreamGrid, which could be used to get summaries of evolving and continuous streams of
messages. To this end, we plan to employ more advanced topic modelling methods that can
detect topic drift and unseen topics in new incoming messages. Finally, we will investigate
methods to integrate popularity and user authority into the summarization process.
   Acknowledgements: This work is supported by the SocialSensor FP7 project, partially
funded by the EC under contract number 287975.

References
 [1] Celebrating #SB48 on Twitter. https://blog.twitter.com/2014/celebrating-sb48-on-twitter,
     2014. [Online; accessed 27-Feb-2014].

 [2] O. Alonso and K. Shiells. Timelines as summaries of popular scheduled events. In
     Proceedings of the 22nd international conference on World Wide Web companion, pages
     1037–1044. International World Wide Web Conferences Steering Committee, 2013.

 [3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn.
     Res., 3:993–1022, Mar. 2003.
                   Table 3: Summaries during awards ceremony (4th row in Table 1)
Method       Examples
tf · idf     • Profound comment from @JoKiefer : Looper storyline echoes war on terror. Kill the
               terrorist before he becomes one? #Sundance13 #dirtywars
             • #Sundance Institute Mahindra Global Filmmaking Award winnners include UK
               co-prodcution: Eva Weber: Let the Northern Lights Erase your Name
             • #Sundance Institute Mahindra Global Filmmaking Award winnners include UK
               co-prodcution: Eva Weber: Let the Northern Lights Erase your Name
             • #PussyRiot - A Punk Prayer takes home a World Cinema Doc Special Jury Award,
               directors Mike Lerner & Maxim Pozdorovkin: http://t.co/VmOP3tmg
             • Gideon’s Army was really eye-opening. I had no idea how many brave men and women
               have died trying to put Bibles in hotel rooms. #Sundance

LexRank      • Yes! Audience Award U.S. Dramatic to ’Fruitvale’ starring Wallace, aka Michael B.
               Jordan. Predict Grand Jury Prize, too #sundance
             • Fruitvale wins audience award for U.S. dramatic; looks like it cld well be a sweep,
               w Grand Jury prize too. #Sundance
             • @vulture: Yes! Audience Award U.S. Dramatic to ’Fruitvale’ starring Wallace, aka
               Michael B. Jordan. Predict Grand Jury Prize, too #sundance
             • @vulture: Start the Oscar watch now. ’Blood Brother’ wins both Grand Jury and
               Audience Award for U.S. Documentary #sundance it’s coming
             • FRUITVALE wins the #Sundance Grand Jury Prize AND the Audience Award. Could not be
               happier. Congrats @fruitvalemovie and Ryan Coogler!

Popularity   • #PussyRiot - A Punk Prayer takes home a World Cinema Doc Special Jury Award,
               directors Mike Lerner Maxim Pozdorovkin: http://t.co/VmOP3tmg
             • Sebastian Silva’s ”Crystal Fairy” Wins #Sundance World Cinema Dramatic Directing
               Award: http://t.co/sKNn1Dqf
             • Ryan Coogler’s ”Fruitvale” Wins #Sundance U.S. Dramatic Audience Award, presented
               by @AcuraInsider: http://t.co/Lknocyos
             • JUST IN: Pinoy film ”Metro Manila” wins the World Cinema Dramatic Audience award
               at the Sundance Film Festival — via @goldenglobes
             • ”The Spectacular Now” Wins #Sundance U. S. Dramatic Special Jury Award for actors
               Miles Teller Shailene Woodley: http://t.co/1Ouz2B7a

StreamGrid   • Sebastian Silva’s recorded speech: singing Hava Nagila while warping his face in
               Photo Booth. Word. #Sundance
             • My pics of ”The Spectacular Now” #Sundance Q&A , winner Special Jury award for
               acting to Miles Teller & Shailene Woodley http://t.co/G7r5QK1p
             • #sundance Awards ... US Documentary ... Best Cinematography goes to Dirty Wars
             • You know it would be hilarious if AUSTENLAND won the big #Sundance prize. Admit it,
               you’d soak that up.
             • YES! ”@indiewire: Ryan Coogler wins the audience award for FRUITVALE. #sundance
               http://t.co/NiVIzcTU”
             • Grand Jury Prizes #Sundance: ”Fruitvale” (dramatic) & ”Blood Brother” (doc)
               FilmLinc list of winners: http://t.co/iyoeHuGz
 [4] D. Chakrabarti and K. Punera. Event summarization using tweets. In ICWSM, 2011.

 [5] M. Dork, D. Gruen, C. Williamson, and S. Carpendale. A visual backchannel for
     large-scale events. Visualization and Computer Graphics, IEEE Transactions on,
     16(6):1129–1138, 2010.

 [6] G. Erkan and D. R. Radev. Lexrank: Graph-based lexical centrality as salience in text
     summarization. J. Artif. Int. Res., 22(1):457–479, Dec. 2004.

 [7] C. Lin, C. Lin, J. Li, D. Wang, Y. Chen, and T. Li. Generating event storylines from
     microblogs. In Proceedings of the 21st ACM International Conference on Information and
     Knowledge Management, CIKM ’12, pages 175–184, New York, NY, USA, 2012. ACM.

 [8] F. Liu, Y. Liu, and F. Weng. Why is ”sxsw” trending? exploring multiple text sources
     for twitter topic summarization. In Proceedings of the ACL Workshop on Language in
     Social Media (LSM), pages 66–75, 2011.

 [9] A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller.
     Twitinfo: Aggregating and visualizing microblogs for event exploration. In Proceedings
     of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, pages 227–236,
     New York, NY, USA, 2011. ACM.

[10] R. Mehrotra, S. Sanner, W. Buntine, and L. Xie. Improving lda topic models for
     microblogs via tweet pooling and automatic labeling. In Proceedings of the 36th
     International ACM SIGIR Conference on Research and Development in Information
     Retrieval, SIGIR ’13, pages 889–892, New York, NY, USA, 2013. ACM.

[11] J. Nichols, J. Mahmud, and C. Drews. Summarizing sporting events using twitter. In
     Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces,
     IUI ’12, pages 189–198, New York, NY, USA, 2012. ACM.

[12] D. R. Radev, H. Jing, M. Styś, and D. Tam. Centroid-based summarization of multiple
     documents. Inf. Process. Manage., 40(6):919–938, Nov. 2004.

[13] E. Schinas, S. Papadopoulos, S. Diplaris, Y. Kompatsiaris, Y. Mass, J. Herzig, and
     L. Boudakidis. Eventsense: Capturing the pulse of large-scale events by mining social
     media streams. In Proceedings of the 17th Panhellenic Conference on Informatics,
     PCI ’13, pages 17–24, New York, NY, USA, 2013. ACM.

[14] D. A. Shamma, L. Kennedy, and E. F. Churchill. Peaks and persistence: Modeling the
     shape of microblog conversations. In Proceedings of the ACM 2011 Conference on Computer
     Supported Cooperative Work, CSCW ’11, pages 355–358, New York, NY, USA, 2011. ACM.

[15] B. Sharifi, M.-A. Hutton, and J. Kalita. Summarizing microblogs automatically. In Human
     Language Technologies: The 2010 Annual Conference of the North American Chapter of the
     Association for Computational Linguistics, HLT ’10, pages 685–688, Stroudsburg, PA,
     USA, 2010. Association for Computational Linguistics.

[16] C. Shen, F. Liu, F. Weng, and T. Li. A participant-based approach for event
     summarization using twitter streams. In Proceedings of NAACL-HLT, pages 1152–1162,
     2013.

[17] L. Shou, Z. Wang, K. Chen, and G. Chen. Sumblr: Continuous summarization of evolving
     tweet streams. In Proceedings of the 36th International ACM SIGIR Conference on
     Research and Development in Information Retrieval, SIGIR ’13, pages 533–542, New York,
     NY, USA, 2013. ACM.

[18] H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno. Evaluation methods for topic
     models. In L. Bottou and M. Littman, editors, Proceedings of the 26th International
     Conference on Machine Learning (ICML), pages 1105–1112, Montreal, June 2009. Omnipress.

[19] D. Wang, T. Li, and M. Ogihara. Generating pictorial storylines via minimum-weight
     connected dominating set approximation in multi-view graphs. In AAAI’12, pages –1–1,
     2012.