StreamGrid: Summarization of Large Scale Events using Topic Modelling and Temporal Analysis

Emmanouil Schinas (1, 2), manosetro@iti.gr
Symeon Papadopoulos (2), papadop@iti.gr
Yiannis Kompatsiaris (2), ikom@iti.gr
Pericles A. Mitkas (1, 2), mitkas@eng.auth.gr

1 Dept. of Electrical & Computer Engineering, Aristotle University of Thessaloniki
2 Information Technologies Institute, Centre for Research & Technology Hellas, Thessaloniki, Greece

Copyright © by the paper’s authors. Copying permitted only for private and academic purposes.
In: S. Papadopoulos, P. Cesar, D. A. Shamma, A. Kelliher, R. Jain (eds.): Proceedings of the SoMuS ICMR 2014 Workshop, Glasgow, Scotland, 01-04-2014, published at http://ceur-ws.org

Abstract

Due to the increasing popularity of microblogging platforms, the number of messages related to large scale public events has reached impressive levels. Although such messages can be quite informative regarding different aspects of the main event, they contain a lot of spam and redundancy, which makes it challenging to extract insights regarding the event of interest. In this work we describe a summarization framework that captures the important moments of an event by using a combination of topic modelling and bursty activity detection. We propose a data structure named StreamGrid that maintains the information of active topics in regular time intervals at several scales. This structure is used for the creation of concise summaries for any time interval. Finally, an evaluation on a large Twitter dataset around the Sundance Film Festival demonstrates the potential of the proposed framework.

1 Introduction

Due to their increasing popularity, micro-blogging platforms, and especially Twitter, have evolved into a powerful means of connecting with real-world events. In large scale public events, ranging from sports events such as football matches to political events and festivals, the users who are involved in the event use social media to share their experiences and express their opinions. In many cases, these messages are quite informative, provide real-time coverage of the ongoing event, and may be correlated with important variables related to the event, e.g. film ratings [13]. Thus, not surprisingly, the number of event-related messages has reached impressive levels [1].

However, a significant percentage of micro-blogging messages can be considered non-informative or spam. This fact, combined with the huge number of messages, makes it very challenging for interested stakeholders, such as event organizers and enthusiasts, to monitor the evolution of the event and understand its important moments. In the case of long-running events, this becomes even more difficult due to the numerous sub-events occurring within the main event. Such sub-events have different durations and impact on the main event. In addition, a large portion of the messages contain conversations about other entities of interest associated with the event. In other words, an event-related stream of messages is quite diverse and noisy, with different associated topics, conversations among users, and spam messages. Thus, there is a profound need for event-based summarization methods that can produce concise multi-document summaries for any time interval of the event, covering its main aspects.

The framework we propose in this work aims to create topic-based summaries of large-scale events for arbitrary time durations by applying post-analysis on the stream of event-related messages. First, we apply LDA topic modelling to discover the underlying aspects of the event. To support summarization, we create a 2D-array structure named StreamGrid, which maintains the information of each topic at each time interval. To create the grid we assign messages to the detected topics and divide topic-associated messages using regular time intervals. Next, we create timelines for the set of topics and use them to detect the set of active topics at each time interval by finding the bursty activity periods in them. A greedy algorithm is then used to obtain a set of representative messages that maximizes the coverage of the event, by covering the maximum possible number of active topics, while minimizing redundancy across messages. Finally, to demonstrate the potential of the proposed framework, we perform an experimental evaluation on a real-world dataset consisting of tweets around the Sundance Film Festival 2013.

The paper is organized as follows. Section 2 contains a brief survey of related methods and applications. Section 3 describes in detail the proposed framework. Section 4 presents an experimental case study on the Sundance 2013 dataset. We conclude the paper and describe future work in Section 5.

2 Related Work

A substantial body of work exists in the literature on the problem of micro-blogging summarization. A notable method for multi-document summarization relies on the computation of centroids based on content. Namely, the summary of a set of documents, represented as tf · idf vectors, consists of those documents that are closest to the centroid of the set [12]. Sharifi et al. [15] propose a method for the generation of a single sentence from a set of tweets by using a graph-based technique. Nichols et al. [11] describe an algorithm that generates a summary of sports events. They use a peak detection algorithm to detect important moments and then apply the method of [15] to extract summary sentences from the tweets around these moments. The work of [8] uses linear-programming optimization to select summary sentences from tweets related to trending topics. Notably, they also make use of linked Web content to extend the original sources of information.

Shen et al. [16] present a participant-based approach for event summarization. A mixture model is proposed to detect sub-events at participant level, and the tf · idf centroid approach is used to create a summary of each sub-event. Similarly, Chakrabarti and Punera [4] propose the use of a Hidden Markov Model to obtain a time-based segmentation of the stream that captures the underlying sub-events. Alonso and Shiells [2] create timelines for football games, annotated with the key aspects of the event. Dörk et al. [5] propose an interface for large scale events that employs several visualizations for interactive presentation of the event.

A different problem is tackled by Wang et al. [19]. Unlike the other methods, their method aims to create a storyline from a set of event-related objects. A multi-view graph of objects is constructed, where the two types of edges capture the contextual similarity and the temporal proximity among objects. Then a time-ordered sequence of important objects is obtained via graph optimization. Lin et al. [7] extend the previous work to generate storylines from a set of micro-blog messages for arbitrary queries. To achieve this, they use query expansion techniques to retrieve the query-related messages and then apply the same method as [19] to create the storyline.

Another approach for summarizing evolving tweet streams is proposed by the Sumblr framework [17]. This relies on an online clustering algorithm for tweets and on maintaining distilled statistics of the clusters at specific time snapshots using a structure named Pyramidal Time Frame. Then, a summarization technique based on the LexRank method [6] is employed for generating summaries of arbitrary time durations.

3 Proposed Method

An overview of the proposed method is illustrated in Figure 1. The proposed framework processes a stream of online messages around an event and extracts informative summaries for any requested time duration. In other words, the proposed framework identifies a set of topics and then selects related messages based on their importance.

Figure 1: The StreamGrid framework

3.1 Topic Modelling

Topic modelling is based on the assumption that each document can be described as a random mixture of topics and each topic as a multinomial distribution over terms. In our approach we employ topic modelling by using the well known Latent Dirichlet Allocation (LDA) model [3] across the whole stream of messages. This process is applied after the end of the event, when all the messages are available. However, topic modelling in micro-blog messages is problematic due to the short length of their text. To overcome this, many approaches have been proposed. To avoid changes to standard LDA, a relatively simple solution is message pooling, in which messages are pooled together to form larger documents. We experimented with four methods of message pooling, in a similar way to [10]. First, we merged messages using constant-length time bins. Then, we merged messages of the same author to form a single document. As a third option, we pooled messages together based on their hashtags. Messages with multiple hashtags were assigned to multiple documents, and messages without any hashtag were assigned to the document with the highest textual similarity. As a fourth option, we used a 1NN clustering algorithm to cluster messages with high textual similarity. Each of those clusters formed a single document for the LDA method. In addition, for all of the pooling methods we filtered out messages having only one term and removed standard stopwords to discard the non-informative terms.

Another drawback of LDA is that the number of topics must be defined in advance, whereas the number of topics is not known a priori in the context of large events. To determine the optimal number of topics for a given set of documents $D$, we calculate two metrics, perplexity and average similarity across topics, for different numbers of topics and choose a value that minimizes both metrics. For the calculation of perplexity we split $D$ into training and test documents, estimate LDA over a range of possible numbers of topics using $D_{train}$, and calculate the total perplexity of the documents in the test dataset $D_{test}$ [18]. The perplexity of a document $d$ given a trained model is defined as follows:

$$\mathrm{perplexity}(d) = \exp\left(\frac{-\log P(d \mid \theta, \phi, G)}{L_d}\right) \qquad (1)$$

where $L_d$ is the number of terms in document $d$, $\theta$ is the document-specific topic distribution, $\phi$ is the word distribution for topics, and $G$ is the set of topics in the trained model. The total perplexity over dataset $D_{test}$ is defined as

$$\mathrm{perplexity}(D_{test}) = \exp\left(\frac{\sum_{d \in D_{test}} -\log P(d \mid \theta, \phi, G)}{\sum_{d \in D_{test}} L_d}\right) \qquad (2)$$

For the similarity between two topics, we calculate the Jaccard coefficient on the sets of top $N$ terms of each topic.

3.2 StreamGrid Creation

After the detection of topics we have to associate messages with topics. We use the LDA model, estimated from the merged documents, to infer the probabilities of each message over the set of topics. We assign each message to the topic with the highest probability, under the condition that this probability exceeds a predefined threshold. Although thresholding in this step leaves some messages unassigned, this is a desirable feature of the procedure, as most of the unassigned messages are of low quality. In other words, these messages can be considered spam messages that cannot contribute any valuable information to the summary.

Next, the assignments are used for the creation of a data structure named StreamGrid. The first dimension of this grid comprises the detected topics and the second corresponds to time, divided into regular time intervals. Each cell $c(i, j)$ of StreamGrid contains the set of messages $M_{ij}$ associated with $topic_i$ at time interval $j$. Each message $m$ is represented as a tf · idf vector. The idf components are pre-computed over the whole set of messages. The tf part is the frequency of a term in the message normalized by the maximum frequency. Due to the short length of the documents in micro-blogging platforms, this component often equals one. Using the set of associated messages in each cell, we calculate a merged tf · idf vector $v_{ij}$. In addition, we calculate a weight for each message and rank the messages according to it. The weight of a message $m$, associated with $topic_i$, in a specific time window $j$ is defined as the sum of the weights of the terms contained in $m$. To calculate the weight of each term $t$, we use the following tf · idf scheme:

$$W(t, i, j) = tf_{ij}(t) \cdot idf(t) \qquad (3)$$

$$W(m, j) = \sum_{t \in m} W(t, i, j) \qquad (4)$$

where $tf_{ij}(t)$ is the frequency of term $t \in v_{ij}$ in the cell $c(i, j)$ of StreamGrid, $idf(t)$ is the inverse document frequency over the whole corpus, $W(t, i, j)$ is the weight of term $t$ in $c(i, j)$, and $W(m, j)$ is the weight of message $m$ in time interval $j$.

To detect the time intervals in which a specific $topic_i$ of StreamGrid is active, we create a topic timeline by using time intervals as bins and counting the associated messages of $topic_i$ in bin $j$. Then, we apply the peak detection algorithm used in [9] to detect time frames in the timeline that exhibit bursty behaviour. The algorithm identifies windows with high activity by finding significant increases in the timeline compared to the historical mean value of activity. The time windows reported by the algorithm are used to set the active topics of each time interval. For example, if for a specific topic $i$ the algorithm identifies a time window $[a, b]$ with high activity, then we define all the time intervals $a \le j \le b$ as active moments of $topic_i$. After this step, each cell of StreamGrid has a flag that indicates whether it is active or not. We use this flag to select a summary subset of messages, as described later in this section. Also, for each active $topic_i$ in a specific time interval $j$, we calculate a score that captures its significance over the rest of the active topics $A$ in the same time interval:

$$\mathrm{Significance}(topic_i, j) = \frac{|M_{ij}|}{\sum_{topic_k \in A} |M_{kj}|} \qquad (5)$$

In addition, to have an overall estimation of the importance of each topic throughout the event, we calculate two measures for each topic using a similar approach to [14]. More specifically, we define the peakiness of a topic as:

$$\mathrm{peakiness}(topic_i) = \frac{\max_j |M_{ij}|}{\sum_{\forall j} |M_{ij}|} \qquad (6)$$

and its persistence as [the equation is truncated in the source; the surviving fragments are an average of $|M_{ij}|$ and a condition on the peak time $t_{peak}$].

We use an adapted version of the greedy algorithm used in [17]. The algorithm selects messages that are associated with different topics and that simultaneously have a low degree of textual similarity to each other. The selection process is detailed in Algorithm 1. For an arbitrary time frame $F = [a, b]$, we first find the sequence of time intervals in StreamGrid that covers $F$. Then we get the set of active topics. A $topic_i$ is active in $F$ if any cell $c(i, j)$ contained in $F$ is active. Also, the significance score of an active topic in $F$ is defined as the maximum significance score across all time intervals in $F$. The weight $W(t, i, F)$ of a term $t$ for $topic_i$ in $F$ is defined as the sum of the weights in each cell $c(i, j) \in F$. In a similar way, we define the weight $W(m, F)$ of message $m$ over $F$. Note that although a message belongs to a specific time interval, we use the term weights across the whole time frame to calculate the weight of $m$.

Algorithm 1 Topic-Time summarization
Input: StreamGrid, a time frame F, length of summary L
Output: a summary set S
 1: S = ∅
 2: A = {set of active topics in F}
 3: Mc = {m | m = argmax_m W(m, i, F), ∀i ∈ A}
 4: while |S| < L and Mc ≠ ∅ do
 5:     for each message m in Mc do
 6:         calculate score(m) according to Equation 8
 7:     end for
 8:     Select m_max = argmax_m [score(m)]
 9:     S = S ∪ {m_max}
10:     Mc = Mc − {m_max}
11: end while
12: if |S| < L then
13:     M = ∪ M_ij, ∀i ∈ A, j ∈ F
14:     M′ = M − S
15-23: [truncated in the source; per the text below, the same selection process is applied to the messages of the active topics]

The score of a candidate message (Equation 8) consists of two parts, balanced by a parameter $a$. The first part measures the importance of the message, while the second measures its redundancy compared to the set of already selected messages. The importance of a message $m \in topic_i$ is a combination of two factors: a) the significance of the topic it belongs to in this time frame, and b) the contribution of its textual content. To measure the redundancy of a message, we compute its average cosine similarity to the already selected messages. If the summary length is not reached, we perform the same selection process on the set of tweets that belong to the active topics (Lines 12-23).

$$\mathrm{score}(m) = a \cdot \mathrm{Importance}(m) - (1 - a) \cdot \mathrm{Redundancy}(m, S) \qquad (8)$$

$$\mathrm{Importance}(m) = \mathrm{Significance}(i, F) \cdot W(m, F) \qquad (9)$$

$$\mathrm{Redundancy}(m, S) = \operatorname{avg}_{m' \in S} \mathrm{Similarity}(m, m') \qquad (10)$$

4 Experiments

4.1 Dataset and event description

We conducted an evaluation of the proposed method on a dataset around the Sundance 2013 Film Festival, which took place between January 15th and 30th, 2013. We used the Streaming API of Twitter to acquire tweets containing terms related to Sundance and posted during the event. More precisely, we collected all tweets containing the hashtags #sundance, #sundance2013 and #sundancefest, and all the tweets that mentioned the official account of Sundance Film Festival (@sundancefest). This resulted in a dataset of 201,752 tweets. Among them, 100,046 were original tweets, while the rest were retweets. Although using three hashtags and one mentioned account covers only a subset of all possible tweets about the event, we consider this subset sufficiently representative, as the vast majority of Twitter users tend to adopt the official hashtags provided by organizers during events.

4.2 Topic detection

Figure 2 shows the perplexity and average similarity for different numbers of topics K. Although there is significant variance for the different values of K, the main trend for perplexity is to decrease as K increases. As we can see from Figure 2, the average similarity between all pairs of topics appears to stabilize for values of K larger than 100 topics. However, for a large number of topics (K > 200) there is a substantial proportion of topics that have no associated message. Taking these facts into account, we set K = 200 for the rest of the evaluation. Regarding the pooling scheme, merging tweets having the same hashtags into single documents gave us the best performance with respect to perplexity and average topic similarity.

Figure 2: Perplexity and Average Similarity between topics for different number of topics K

4.3 StreamGrid Construction

The first part of Table 2 contains the top five topics with respect to peakiness and the second part the topics with the highest persistence ratio. Examining the set of persistent topics, we conclude that they can be divided into two main categories. The first comprises the truly persistent topics that are regularly discussed during the event, while the second is made up of multiplexed topics that LDA failed to split further. This is due to the fact that some topics are conceptually different but share a similar set of related terms. This affects summarization performance, as for each topic we select only the top weighted message. Thus, if a topic contains more than one concept, the summarization algorithm selects only one concept and ignores the rest.

Figure 3 depicts the timelines of the same two sets of topics respectively.
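The per-cell statistics behind the topic rankings can be illustrated with a minimal Python sketch of the significance score (Equation 5) and peakiness (Equation 6). It assumes that messages have already been assigned to (topic, interval) cells; the function names `build_counts`, `significance` and `peakiness`, and the toy data, are hypothetical and not part of the original framework. Persistence is left out because its definition is incomplete in the text.

```python
from collections import Counter, defaultdict


def build_counts(assignments):
    """Count messages per cell: counts[topic][interval] plays the role of |M_ij|."""
    counts = defaultdict(Counter)
    for topic, interval in assignments:
        counts[topic][interval] += 1
    return counts


def significance(counts, active_topics, topic, interval):
    """Equation 5: share of the topic's messages among all active topics in the interval."""
    total = sum(counts[k][interval] for k in active_topics)
    return counts[topic][interval] / total if total else 0.0


def peakiness(counts, topic):
    """Equation 6: largest cell count of the topic divided by its total count."""
    timeline = counts[topic]
    total = sum(timeline.values())
    return max(timeline.values()) / total if total else 0.0


# Toy assignments (illustrative only): one bursty topic and one steady topic.
assignments = [("peaky", 0), ("peaky", 0), ("peaky", 0), ("peaky", 1),
               ("steady", 0), ("steady", 1), ("steady", 2), ("steady", 3)]
counts = build_counts(assignments)
print(peakiness(counts, "peaky"))    # 0.75
print(peakiness(counts, "steady"))   # 0.25
print(significance(counts, {"peaky", "steady"}, "peaky", 0))  # 0.75
```

In this toy example the bursty topic concentrates three of its four messages in a single interval, so its peakiness is high, whereas the steady topic spreads its messages evenly and scores low.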
The timelines show that peaky topics are highly localized, while persistent topics are sustained for the whole duration of the event. To provide a visual representation of the StreamGrid structure over the whole duration of the event, we represent it as a heat map (Figure 4). The coloured cells in the grid represent the time intervals in which the corresponding topics are active, and the colour of the cell gives the significance of each active topic at this point. As shown in Figure 4, StreamGrid appears to be sparse, as only a few cells in it contain active topics. However, one can also observe several topics (rows) that exhibit consistent activity over the whole duration of the event.

Figure 3: Timelines of the top five peaky and persistent topics

Figure 4: StreamGrid: Each cell of StreamGrid corresponds to a specific time interval and topic

4.4 Summarization

Baselines: To evaluate the summaries produced with StreamGrid, we used five baseline methods. Given an arbitrary time interval, we first get the set of messages posted during this interval and then apply the following baselines to produce a summary of constant length L.

• Random Summarizer: We randomly choose a subset of L tweets from the set of tweets.

• Popularity Summarizer: We select the L most retweeted messages to form a summary. This favours the tweets that have attracted the attention of the audience. However, niche topics and potentially interesting events that gathered less attention tend to be missed.

• tf · idf Summarizer: We use the tf · idf weighting scheme described in the previous section to get the L highest weighted tweets.

• Cluster-based Summarizer: Instead of active topics, we divide the tweets of the time interval into L clusters using k-means clustering. For each cluster produced this way, we pick the highest weighted tweet using the tf · idf scheme.

• LexRank Summarizer: We create a graph where nodes represent tweets and the weights of edges between nodes represent their pairwise cosine similarity. The total weight of a tweet is the sum of the weights of the adjacent edges. The summary consists of the L highest weighted tweets in the graph.

Finally, we compare the results of the StreamGrid Summarizer to those of the baseline methods for five time intervals that are connected with high activity during the main event. We detect these intervals by applying the peak detection algorithm of the previous section to the timeline of the whole dataset. We rank the detected bursts according to the rate of tweets and use the top five of them. The details of these intervals are provided in Table 1.

Table 1: Details of five time intervals with the highest activity during Sundance Film Festival 2013

Start              End                #Tweets
Thu Jan 17 23:00   Fri Jan 18 00:00   1545
Sat Jan 19 19:00   Sat Jan 19 20:00   1477
Mon Jan 21 19:00   Mon Jan 21 20:00   1247
Sun Jan 27 03:00   Sun Jan 27 08:00   3735
Wed Jan 23 18:00   Wed Jan 23 21:00   1910

Table 3 contains summaries consisting of five tweets using StreamGrid and three of the baselines for the time period around the Awards Ceremony of Sundance Film Festival. Unsurprisingly, this is the time period with the highest peak during the event. During this period, what may reasonably be considered important pertains to the films that won awards. Such messages are usually posted by authoritative users and become highly retweeted. For this reason, summaries based on the number of retweets cover the winning films quite effectively. However, in other cases choosing very popular tweets does not lead to informative summaries. For example, in the third time interval the summary consists of tweets like “So freaking cool. #sundance http://t.co/C7a8rSaw” and “#Sundance day 4- leavin for Vegas now. Bye for now http://t.co/C2aRZnEC”. These tweets were retweeted a lot, but may be considered non-informative for the event. On the other hand, StreamGrid-based summaries for the Awards Ceremony contain tweets about winning films, even though these messages are not very popular. This is an indication that StreamGrid may detect an important topic even in cases where it does not attract attention from many users. Regarding the Cluster-based Summarizer, an interesting feature is that avoidance of redundancy is inherent in the method, as similar messages are clustered together and only the highest weighted of them is selected for the summary. However, the weakness of the method is that not all clusters represent important aspects of the event.

Another indication of how topic modelling can improve summarization is the fact that StreamGrid, compared to the baselines, tends to include tweets that mention films. This happens because most of the topics detected by LDA are about films, so when the proposed summarization algorithm selects a set of tweets from the pool of active moments, this leads to the selection of film-associated tweets. We expect that, for other types of events, this will naturally generalize to other pertinent entities of interest that occur frequently and thus lead to the creation of topics. A noticeable disadvantage of baselines such as tf · idf and LexRank is their considerable redundancy. For example, in the case of LexRank, four out of five tweets are related to the ’Fruitvale’ film. This indicates that redundancy minimization is a necessary component of any summarization approach.

Figure 5: StreamGrid-based Multimedia Summary during awards ceremony (4th row in Table 1)

Figure 6: Multimedia Summary using most retweeted images during awards ceremony (4th row in Table 1)
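The trade-off between importance and redundancy in Equations 8-10 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each candidate message already carries a sparse tf · idf vector and a precomputed importance value (Equation 9) scaled to be comparable with cosine similarity, and the names `cosine` and `greedy_summary` as well as the toy candidates are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_summary(candidates, length, a=0.5):
    """Select up to `length` messages, each time maximizing Equation 8.

    Each candidate is a dict with keys 'text', 'vector' and 'importance'
    (importance is assumed to be precomputed as in Equation 9).
    """
    pool = list(candidates)
    summary = []

    def score(m):
        # Equation 10: average similarity to the messages selected so far.
        if summary:
            redundancy = sum(cosine(m["vector"], s["vector"]) for s in summary) / len(summary)
        else:
            redundancy = 0.0
        # Equation 8: balance importance against redundancy with parameter a.
        return a * m["importance"] - (1 - a) * redundancy

    while pool and len(summary) < length:
        best = max(pool, key=score)
        summary.append(best)
        pool.remove(best)
    return summary


# Toy candidates (illustrative only): A and B are near-duplicates, C is distinct.
candidates = [
    {"text": "A", "vector": {"fruitvale": 1.0, "award": 1.0}, "importance": 1.0},
    {"text": "B", "vector": {"fruitvale": 1.0, "award": 1.0}, "importance": 0.9},
    {"text": "C", "vector": {"blood": 1.0, "brother": 1.0}, "importance": 0.6},
]
print([m["text"] for m in greedy_summary(candidates, 2, a=0.5)])  # ['A', 'C']
print([m["text"] for m in greedy_summary(candidates, 2, a=1.0)])  # ['A', 'B']
```

With a = 1.0 the redundancy term vanishes and the selection degenerates into a pure ranking by importance, which reproduces the kind of repetition observed for the tf · idf and LexRank baselines. With a = 0.5 the near-duplicate B is skipped in favour of the less important but novel C.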
Finally, to evaluate how well the proposed method can create visual summaries, we apply it on the subset of tweets with embedded pictures. These tweets, which comprise about 10% of the dataset, create a considerably sparser StreamGrid, as the bursty periods in this subset are much fewer. An example of a multimedia summary using StreamGrid for the Awards Ceremony is shown in Figure 5. Comparing the StreamGrid-based multimedia summaries with the ones produced by the popular images (Figure 6), we observe that StreamGrid does not perform noticeably better in this task. This can be explained by the fact that tweets with embedded media have text of very low length and informativeness, which leads LDA to inferior performance with respect to the creation of representative topics and the assignment of messages to them. Regarding the redundancy in multimedia summaries, we found that using cosine similarity on the text of images as a metric of similarity between them is not appropriate for minimizing redundancy. This can be seen in the LexRank-based summary in Figure 7. A combination of visual and textual features is foreseen as a more suitable means for discarding similar images.

Figure 7: Multimedia Summary using LexRank during awards ceremony (4th row in Table 1)

Table 2: Examples of peaky and persistent topics

Peaky Topics
Topic   Representative Terms                              Peakiness     #tweets
135     paris, hilton, Blackfish, cnn, films              0.358         695
133     death, drink, countryman, sundance, charlie       0.247         588
11      lovelace, amanda, seyfried, portraits, premiere   0.161         1293
50      defeat, inevitable, pete, mister, film            0.143         267
29      butch, dynamite, android, worth, apps             0.123         323

Persistent Topics
Topic   Representative Terms                              Persistence   #tweets
63      hemingway, running, follow, crazy, marshall       3.963         2494
75      jehane, square, girlrising, premiere, screening   2.650         500
108     vhs, sequel, horror, review, time                 2.318         469
45      afar, week, enjoy, ways, kicked                   1.612         127
17      lindsay, lohan, canyons, blame, snubbed           1.557         343

Table 3: Summaries during awards ceremony (4th row in Table 1)

Method: tf · idf
• Profound comment from @JoKiefer : Looper storyline echoes war on terror. Kill the terrorist before he becomes one? #Sundance13 #dirtywars
• #Sundance Institute Mahindra Global Filmmaking Award winnners include UK co-prodcution: Eva Weber: Let the Northern Lights Erase your Name
• #Sundance Institute Mahindra Global Filmmaking Award winnners include UK co-prodcution: Eva Weber: Let the Northern Lights Erase your Name
• #PussyRiot - A Punk Prayer takes home a World Cinema Doc Special Jury Award, directors Mike Lerner & Maxim Pozdorovkin: http://t.co/VmOP3tmg
• Gideon’s Army was really eye-opening. I had no idea how many brave men and women have died trying to put Bibles in hotel rooms. #Sundance

Method: LexRank
• Yes! Audience Award U.S. Dramatic to ’Fruitvale’ starring Wallace, aka Michael B. Jordan. Predict Grand Jury Prize, too #sundance
• Fruitvale wins audience award for U.S. dramatic; looks like it cld well be a sweep, w Grand Jury prize too. #Sundance
• @vulture: Yes! Audience Award U.S. Dramatic to ’Fruitvale’ starring Wallace, aka Michael B. Jordan. Predict Grand Jury Prize, too #sundance
• @vulture: Start the Oscar watch now. ’Blood Brother’ wins both Grand Jury and Audience Award for U.S. Documentary #sundance it’s coming
• FRUITVALE wins the #Sundance Grand Jury Prize AND the Audience Award. Could not be happier. Congrats @fruitvalemovie and Ryan Coogler!

Method: Popularity
• #PussyRiot - A Punk Prayer takes home a World Cinema Doc Special Jury Award, directors Mike Lerner & Maxim Pozdorovkin: http://t.co/VmOP3tmg
• Sebastian Silva’s "Crystal Fairy" Wins #Sundance World Cinema Dramatic Directing Award: http://t.co/sKNn1Dqf
• Ryan Coogler’s "Fruitvale" Wins #Sundance U.S. Dramatic Audience Award, presented by @AcuraInsider: http://t.co/Lknocyos
• JUST IN: Pinoy film "Metro Manila" wins the World Cinema Dramatic Audience award at the Sundance Film Festival — via @goldenglobes
• "The Spectacular Now" Wins #Sundance U. S. Dramatic Special Jury Award for actors Miles Teller & Shailene Woodley: http://t.co/1Ouz2B7a

Method: StreamGrid
• Sebastian Silva’s recorded speech: singing Hava Nagila while warping his face in Photo Booth. Word. #Sundance
• My pics of "The Spectacular Now" #Sundance Q&A , winner Special Jury award for acting to Miles Teller & Shailene Woodley http://t.co/G7r5QK1p
• #sundance Awards ... US Documentary ... Best Cinematography goes to Dirty Wars
• You know it would be hilarious if AUSTENLAND won the big #Sundance prize. Admit it, you’d soak that up.
• YES! "@indiewire: Ryan Coogler wins the audience award for FRUITVALE. #sundance http://t.co/NiVIzcTU"
• Grand Jury Prizes #Sundance: "Fruitvale" (dramatic) & "Blood Brother" (doc) FilmLinc list of winners: http://t.co/iyoeHuGz

5 Conclusion and future work

In this work, we proposed a framework for the summarization of micro-blogging messages during large scale events. The framework makes use of topic modelling to detect the underlying aspects of an event in the set of related messages. Then, for each topic it derives its temporal representation by associating messages to the discovered topics. Subsequently, a burst detection algorithm is used to find the important intervals for each topic. Finally, a greedy summarization algorithm generates summaries for arbitrary time intervals using the set of active topics for the same time duration. The results of experiments on a Twitter dataset around the Sundance Film Festival appear promising, demonstrating the potential of topic modelling for the multi-document summarization problem.

For future work, we first plan to compare our approach with competing summarization algorithms in a more systematic way, over more events and with the help of independent evaluators, with the goal of better capturing the subjective quality aspects of summarization. Taking into account the large number of topic modelling techniques that have appeared in the literature over the last years, we plan to investigate how the underlying model affects the summarization process. Furthermore, we intend to create a real-time version of StreamGrid, which could be used to get summaries of evolving and continuous streams of messages. To this end, we plan to employ more advanced topic modelling methods that can detect topic drift and unseen topics in new incoming messages. Finally, we will investigate methods to integrate popularity and user authority into the summarization process.

Acknowledgements: This work is supported by the SocialSensor FP7 project, partially funded by the EC under contract number 287975.

References

[1] Celebrating #SB48 on Twitter. https://blog.twitter.com/2014/celebrating-sb48-on-twitter, 2014. [Online; accessed 27-Feb-2014].

[2] O. Alonso and K. Shiells. Timelines as summaries of popular scheduled events. In Proceedings of the 22nd international conference on World Wide Web companion, pages 1037–1044. International World Wide Web Conferences Steering Committee, 2013.

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, Mar. 2003.

[4] D. Chakrabarti and K. Punera. Event summarization using tweets. In ICWSM, 2011.

[5] M. Dörk, D. Gruen, C. Williamson, and S. Carpendale. A visual backchannel for large-scale events. Visualization and Computer Graphics, IEEE Transactions on, 16(6):1129–1138, 2010.

[6] G. Erkan and D. R. Radev. Lexrank: Graph-based lexical centrality as salience in text summarization. J. Artif. Int. Res., 22(1):457–479, Dec. 2004.

[7] C. Lin, C. Lin, J. Li, D. Wang, Y. Chen, and T. Li. Generating event storylines from microblogs. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, pages 175–184, New York, NY, USA, 2012. ACM.

[8] F. Liu, Y. Liu, and F. Weng. Why is "sxsw" trending? exploring multiple text sources for twitter topic summarization. In Proceedings of the ACL Workshop on Language in Social Media (LSM), pages 66–75, 2011.

[9] A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller. Twitinfo: Aggregating and visualizing microblogs for event exploration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, pages 227–236, New York, NY, USA, 2011. ACM.

[10] R. Mehrotra, S. Sanner, W. Buntine, and L. Xie. Improving lda topic models for microblogs via tweet pooling and automatic labeling. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’13, pages 889–892, New York, NY, USA, 2013. ACM.

[11] J. Nichols, J. Mahmud, and C. Drews. Summarizing sporting events using twitter. In Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces, IUI ’12, pages 189–198, New York, NY, USA, 2012. ACM.

[12] D. R. Radev, H. Jing, M. Styś, and D. Tam. Centroid-based summarization of multiple documents. Inf. Process. Manage., 40(6):919–938, Nov. 2004.

[13] E. Schinas, S. Papadopoulos, S. Diplaris, Y. Kompatsiaris, Y. Mass, J. Herzig, and L. Boudakidis. Eventsense: Capturing the pulse of large-scale events by mining social media streams. In Proceedings of the 17th Panhellenic Conference on Informatics, PCI ’13, pages 17–24, New York, NY, USA, 2013. ACM.

[14] D. A. Shamma, L. Kennedy, and E. F. Churchill. Peaks and persistence: Modeling the shape of microblog conversations. In Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, CSCW ’11, pages 355–358, New York, NY, USA, 2011. ACM.

[15] B. Sharifi, M.-A. Hutton, and J. Kalita. Summarizing microblogs automatically. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pages 685–688, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.

[16] C. Shen, F. Liu, F. Weng, and T. Li. A participant-based approach for event summarization using twitter streams. In Proceedings of NAACL-HLT, pages 1152–1162, 2013.

[17] L. Shou, Z. Wang, K. Chen, and G. Chen. Sumblr: Continuous summarization of evolving tweet streams. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’13, pages 533–542, New York, NY, USA, 2013. ACM.

[18] H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno. Evaluation methods for topic models. In L. Bottou and M. Littman, editors, Proceedings of the 26th International Conference on Machine Learning (ICML), pages 1105–1112, Montreal, June 2009. Omnipress.

[19] D. Wang, T. Li, and M. Ogihara. Generating pictorial storylines via minimum-weight connected dominating set approximation in multi-view graphs. In AAAI’12, 2012.