<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">StreamGrid: Summarization of Large Scale Events using Topic Modelling and Temporal Analysis</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Emmanouil</forename><surname>Schinas</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Symeon</forename><surname>Papadopoulos</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Yiannis</forename><surname>Kompatsiaris</surname></persName>
						</author>
						<author role="corresp">
							<persName><forename type="first">Pericles</forename><forename type="middle">A</forename><surname>Mitkas</surname></persName>
							<email>mitkas@eng.auth.gr</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Electrical &amp; Computer Engineering</orgName>
								<orgName type="institution" key="instit1">Aristotle</orgName>
								<orgName type="institution" key="instit2">University of Thessaloniki</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Information Technologies Institute Centre for Research &amp; Technology Hellas</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Information Technologies Institute Centre for Research</orgName>
								<orgName type="institution">Technology Hellas</orgName>
								<address>
									<settlement>Thessaloniki</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="department">Information Technologies Institute Centre for Research</orgName>
								<orgName type="institution">Technology Hellas</orgName>
								<address>
									<settlement>Thessaloniki</settlement>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="department">Dept. of Electrical &amp; Computer Engineering</orgName>
								<orgName type="institution" key="instit1">Aristotle</orgName>
								<orgName type="institution" key="instit2">University of Thessaloniki</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff5">
								<orgName type="department">Information Technologies Institute Centre for Research &amp; Technology Hellas</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff6">
								<address>
									<postCode>01-04-2014</postCode>
									<settlement>Glasgow</settlement>
									<region>Scotland</region>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">StreamGrid: Summarization of Large Scale Events using Topic Modelling and Temporal Analysis</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F0218062487A40EEE0D684891A855B51</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:21+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Due to the increasing popularity of microblogging platforms, the number of messages related to large-scale public events reaches impressive levels. Although such messages can be quite informative regarding different aspects of the main event, a lot of spam and redundancy makes it challenging to extract insights about the event of interest. In this work we describe a summarization framework that captures the important moments of an event by combining topic modelling and bursty activity detection. We propose a data structure named StreamGrid that maintains the information of active topics in regular time intervals at several scales. This structure is used to create concise summaries for any time interval. Finally, an evaluation on a large Twitter dataset around the Sundance Film Festival demonstrates the potential of the proposed framework.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Due to their increasing popularity, micro-blogging platforms, and especially Twitter, have evolved into a powerful means for getting connected with real-world events. In large-scale public events, ranging from sporting events, such as football matches, to political events and festivals, users who are somehow involved in the event use social media to share their experiences and express their opinions. In many cases, these messages are quite informative, provide real-time coverage of the ongoing event, and may be correlated with important variables related to the event, e.g. film ratings <ref type="bibr" target="#b16">[13]</ref>. Thus, not surprisingly, the number of event-related messages has reached impressive levels <ref type="bibr" target="#b0">[1]</ref>.</p><p>However, a significant percentage of micro-blogging messages can be considered non-informative or spam. This fact, combined with the huge number of messages, makes it very challenging for interested stakeholders, such as event organizers and enthusiasts, to monitor the evolution of the event and understand its important moments. In the case of long-running events, this becomes even more difficult due to the existence of numerous sub-events occurring within the main event. Such sub-events have different durations and impact on the main event. In addition, a large portion of the messages contain conversations about other entities of interest associated with the event. In other words, an event-related stream of messages is quite diverse and noisy, with different associated topics, conversations among users, and spam messages. 
Thus, there is a profound need for event-based summarization methods that can produce concise multi-document summaries for any time interval of the event, covering its main aspects.</p><p>The framework we propose in this work aims to create topic-based summaries of large-scale events for arbitrary time durations by applying post-analysis on the stream of event-related messages. First, we apply LDA topic modelling to discover the underlying aspects of the event. To support summarization, we create a 2D-array structure named StreamGrid, which maintains the information of each topic at each time interval. To create the grid we assign messages to the detected topics and divide the topic-associated messages using regular time intervals. Next, we create timelines for the set of topics and use them to detect the set of active topics at each time interval by finding the bursty activity periods in them. A greedy algorithm is then used to obtain a set of representative messages that maximizes the coverage of the event by selecting the maximum possible number of active topics while minimizing redundancy across messages.</p><p>Finally, to demonstrate the potential of the proposed framework, we perform an experimental evaluation on a real-world dataset consisting of tweets around the Sundance Film Festival 2013.</p><p>The paper is organized as follows. Section 2 contains a brief survey of related methods and applications. Section 3 describes in detail the proposed framework. Section 4 presents an experimental case study on the Sundance 2013 dataset. We conclude the paper and describe future work in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>A substantial body of work exists in the literature on the problem of micro-blogging summarization. A notable method for multi-document summarization relies on the computation of centroids based on content. Namely, the summary of a set of documents, represented as tf • idf vectors, consists of those documents that are closest to the centroid of the set <ref type="bibr" target="#b15">[12]</ref>. Sharifi et al. <ref type="bibr" target="#b18">[15]</ref> propose a method for the generation of a single sentence from a set of tweets, by using a graph-based technique. Nichols et al. <ref type="bibr" target="#b14">[11]</ref> describe an algorithm that generates a summary of sports events. They use a peak detection algorithm to detect important moments and then apply the method of <ref type="bibr" target="#b18">[15]</ref> to extract summary sentences from the tweets around these moments. The work of <ref type="bibr" target="#b11">[8]</ref> uses linear-programming optimization to select summary sentences from tweets related to trending topics. Notably, they also make use of linked Web content to extend the original sources of information.</p><p>Shen et al. <ref type="bibr" target="#b19">[16]</ref> present a participant-based approach for event summarization. A mixture model is proposed to detect sub-events at participant level, and the tf • idf centroid approach is used to create a summary of each sub-event. Similarly, Chakrabarti and Punera <ref type="bibr" target="#b7">[4]</ref> propose the use of a Hidden Markov Model to obtain a time-based segmentation of the stream that captures the underlying sub-events. Alonso and Shiells <ref type="bibr" target="#b1">[2]</ref> create timelines for football games, annotated with the key aspects of the event. Dörk et al. 
<ref type="bibr" target="#b8">[5]</ref> propose an interface for large-scale events that employs several visualizations for interactive presentation of the event.</p><p>A different problem is tackled by Wang et al. <ref type="bibr" target="#b22">[19]</ref>. Unlike the methods above, their method aims to create a storyline from a set of event-related objects. A multiview graph of objects is constructed, where the two types of edges capture the contextual similarity and the temporal proximity among objects. Then a time-ordered sequence of important objects is obtained via graph optimization. Lin et al. <ref type="bibr" target="#b10">[7]</ref> extend this work to generate storylines from a set of micro-blog messages for arbitrary queries. To achieve this, they use query expansion techniques to retrieve the query-related messages and then apply the same method as <ref type="bibr" target="#b22">[19]</ref> to create the storyline.</p><p>Another approach for summarizing evolving tweet streams is proposed by the Sumblr framework <ref type="bibr" target="#b20">[17]</ref>. This relies on an online clustering algorithm for tweets and on maintaining distilled statistics of the clusters at specific time snapshots using a structure named Pyramidal Time Frame. Then, a summarization technique based on the LexRank method <ref type="bibr" target="#b9">[6]</ref> is employed for generating summaries of arbitrary time durations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Proposed Method</head><p>An overview of the proposed method is illustrated in Figure <ref type="figure">1</ref>. The proposed framework processes a stream of online messages around an event and extracts informative summaries for any requested time duration. In other words, the proposed framework identifies a set of topics and then selects related messages based on their importance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Topic Modelling</head><p>Figure <ref type="figure">1</ref>: The StreamGrid framework</p><p>Topic modelling is based on the assumption that each document can be described as a random mixture of topics and each topic as a multinomial distribution over terms. In our approach we employ topic modelling by using the well-known Latent Dirichlet Allocation model <ref type="bibr" target="#b2">[3]</ref> across the whole stream of messages. This process is applied after the end of the event, when all the messages are available. However, topic modelling in micro-blog messages is problematic due to the short length of their text. To overcome this, many approaches have been proposed. To avoid changes to standard LDA, a relatively simple solution is message pooling, in which messages are pooled together to form larger documents. We experimented with four methods of message pooling in a similar way to <ref type="bibr" target="#b13">[10]</ref>. First, we tried to merge messages using constant-length time bins. Then, we merged messages of the same author to form a single document. As a third option, we pooled messages together based on their hashtags. Messages with multiple hashtags were assigned to multiple documents, and messages without any hashtag were assigned to the document with the highest textual similarity. As a fourth option, we used a 1NN clustering algorithm to cluster messages with high textual similarity. Each of those clusters formed a single document for the LDA method. In addition, for all of the pooling methods we filtered out messages having only one term and removed standard stopwords to discard the non-informative terms.</p><p>Another drawback of LDA is that the number of topics must be defined; obviously, the number of topics is not known a priori in the context of large events. 
To determine the optimal number of topics for a given set of documents D, we calculate two metrics, perplexity and average similarity across topics, for different numbers of topics and choose a value that minimizes both metrics. For the calculation of perplexity we split D into training and test documents, estimate LDA over a range of possible numbers of topics using D_train, and calculate the total perplexity of the documents in the test dataset D_test <ref type="bibr" target="#b21">[18]</ref>. The perplexity of a document d given a trained model is defined as follows:</p><formula xml:id="formula_0">\mathrm{perplexity}(d) = \exp\left( -\frac{\log P(d \mid \theta, \varphi, G)}{L_d} \right)<label>(1)</label></formula><p>where L_d is the number of terms in document d, θ is the document-specific topic distribution, φ is the word distribution for topics, and G is the set of topics in the trained model. The total perplexity over dataset D_test is defined as</p><formula xml:id="formula_1">\mathrm{perplexity}(D_{test}) = \exp\left( -\frac{\sum_{d \in D_{test}} \log P(d \mid \theta, \varphi, G)}{\sum_{d \in D_{test}} L_d} \right)<label>(2)</label></formula><p>For the similarity between two topics, we calculate the Jaccard coefficient on the sets of top N terms of each topic.</p></div>
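<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two model-selection metrics above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the per-document log-likelihoods are assumed to come from whatever LDA toolkit is used, and the numbers in the example are invented toy values.</p><code lang="python">
```python
import math
from itertools import combinations


def total_perplexity(log_likelihoods, doc_lengths):
    """Total perplexity of a test set, following Equation (2).

    log_likelihoods[k] is log P(d_k | model) and doc_lengths[k] is L_d for
    the same document; how these are obtained depends on the LDA toolkit.
    """
    total_terms = sum(doc_lengths)
    if total_terms == 0:
        raise ValueError("the test set contains no terms")
    return math.exp(-sum(log_likelihoods) / total_terms)


def jaccard(terms_a, terms_b):
    """Jaccard coefficient between two collections of top-N terms."""
    a, b = set(terms_a), set(terms_b)
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def average_topic_similarity(top_terms_per_topic):
    """Mean Jaccard coefficient over all unordered pairs of topics."""
    pairs = list(combinations(top_terms_per_topic, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy values, invented for illustration only.
toy_log_likelihoods = [-12.0, -20.0]
toy_lengths = [4, 6]
toy_topics = [
    ["film", "award", "jury"],
    ["film", "premiere", "cast"],
    ["music", "party", "night"],
]
print(total_perplexity(toy_log_likelihoods, toy_lengths))
print(average_topic_similarity(toy_topics))
```
</code><p>In practice both values would be computed for each candidate number of topics, and the candidate for which both are low would be retained.</p></div>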
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">StreamGrid Creation</head><p>After the detection of topics we have to associate messages with topics. We use the LDA model, estimated from the merged documents, to infer the probabilities of each message over the set of topics. We assign each message to the topic with the highest probability, under the condition that this probability exceeds a predefined threshold. Although thresholding in this step leaves some messages unassigned, this is a desirable feature of the procedure, as most of the unassigned messages are of low quality. In other words, these messages can be considered spam that cannot contribute any valuable information to the summary. Next, the assignments are used for the creation of a data structure named StreamGrid. The first dimension of this grid comprises the detected topics and the second corresponds to time, divided into regular time intervals. Each cell c(i, j) of StreamGrid contains the set of messages M_ij associated with topic i at time interval j. Each message m is represented as a tf • idf vector. The idf components are pre-computed over the whole set of messages. The tf part is the frequency of a term in the message normalized by the maximum frequency. Due to the short length of the documents in micro-blogging platforms, this component often equals one. Using the set of associated messages in each cell, we calculate a merged tf • idf vector v_ij. In addition, we calculate a weight for each message and rank the messages according to it. The weight of a message m, associated with topic i, in a specific time window j is defined as the sum of the weights of the terms contained in m. 
To calculate the weight of each term t, we use the following tf • idf scheme:</p><formula xml:id="formula_2">W(t, i, j) = tf_{ij}(t) \cdot idf(t)<label>(3)</label></formula><formula xml:id="formula_3">W(m, j) = \sum_{t \in m} W(t, i, j)<label>(4)</label></formula><p>where tf_ij(t) is the frequency of term t ∈ v_ij in the cell c(i, j) of StreamGrid, idf(t) is the inverse document frequency over the whole corpus, W(t, i, j) is the weight of term t in c(i, j), and W(m, j) is the weight of message m in time interval j.</p><p>To detect the time intervals in which a specific topic i of StreamGrid is active, we create a topic timeline by using time intervals as bins and counting the associated messages of topic i in bin j. Then, we apply the peak detection algorithm used in <ref type="bibr" target="#b12">[9]</ref> to detect time frames in the timeline that exhibit bursty behaviour. The algorithm identifies windows with high activity by finding significant increases in the timeline, compared to the historical mean value of activity. The time windows reported by the algorithm are used to set the active topics of each time interval. For example, if for a specific topic i the algorithm identifies a time window [a, b] with high activity, then we define all the time intervals a ≤ j ≤ b as active moments of topic i. After this step, the cells of StreamGrid have a flag that indicates whether a specific cell is active or not. We use this flag to select a summary subset of messages, as described in the next section. Also, for each active topic i in a specific time interval j, we calculate a score that captures its significance over the rest of the active topics A in the same time interval.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><formula xml:id="formula_4">\mathrm{Significance}(topic_i, j) = \frac{|M_{ij}|}{\sum_{topic_k \in A} |M_{kj}|}<label>(5)</label></formula><p>In addition, to have an overall estimation of the importance of each topic throughout the event, we calculate two measures for each topic using a similar approach as <ref type="bibr" target="#b17">[14]</ref>. More specifically we define the peakiness of a topic as:</p><formula xml:id="formula_5">\mathrm{peakiness}(topic_i) = \frac{\max_j |M_{ij}|}{\sum_{\forall j} |M_{ij}|}<label>(6)</label></formula><p>and its persistence as</p><formula xml:id="formula_6">\mathrm{persistence}(topic_i) = \frac{\operatorname{avg}_{t_{peak} &lt; j} \left( |M_{ij}| / \sum_{\forall j} |M_{ij}| \right)}{\operatorname{avg}_{j &lt; t_{peak}} \left( |M_{ij}| / \sum_{\forall j} |M_{ij}| \right)}<label>(7)</label></formula><p>where t_peak is the time that the maximum peak of the timeline occurs.</p></div>
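<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the grid construction and of the scores defined in Equations (3) to (7) is given below. It is an illustration under stated assumptions rather than the authors' code: all function names are hypothetical, idf is taken as log(N / df), which the text does not specify, message weights sum over the distinct terms of a message, persistence follows our reading of Equation (7) as the ratio of mean activity after the peak to mean activity before it, and the messages, interval length and threshold are invented toy values.</p><code lang="python">
```python
import math
from collections import Counter, defaultdict


def build_grid(messages, interval_seconds, threshold):
    """Group messages into cells keyed by (topic, interval).

    messages: list of (timestamp, tokens, topic_probs) tuples. A message whose
    best topic probability does not exceed the threshold stays unassigned.
    """
    grid = defaultdict(list)
    for timestamp, tokens, topic_probs in messages:
        topic = max(range(len(topic_probs)), key=topic_probs.__getitem__)
        if topic_probs[topic] > threshold:
            grid[(topic, int(timestamp // interval_seconds))].append(tokens)
    return grid


def idf_table(messages):
    """idf over the whole message set; log(N / df) is an assumption."""
    doc_freq = Counter()
    for _, tokens, _ in messages:
        doc_freq.update(set(tokens))
    n = len(messages)
    return {term: math.log(n / count) for term, count in doc_freq.items()}


def message_weights(cell_messages, idf):
    """Equations (3)-(4): W(t,i,j) = tf_ij(t) * idf(t), summed over t in m."""
    tf = Counter(t for tokens in cell_messages for t in tokens)
    return [sum(tf[t] * idf.get(t, 0.0) for t in set(tokens))
            for tokens in cell_messages]


def significance(grid, topic, interval, active_topics):
    """Equation (5): share of the interval's active-topic messages."""
    total = sum(len(grid.get((k, interval), [])) for k in active_topics)
    return len(grid.get((topic, interval), [])) / total if total else 0.0


def peakiness(counts):
    """Equation (6): largest bin divided by the total of the timeline."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


def persistence(counts):
    """Our reading of Equation (7); None when either side is empty or zero."""
    peak = counts.index(max(counts))
    before, after = counts[:peak], counts[peak + 1:]
    if not before or not after or sum(before) == 0:
        return None
    return (sum(after) / len(after)) / (sum(before) / len(before))


# Toy stream: (seconds since start, tokens, topic probabilities).
toy_messages = [
    (0, ["film", "award"], [0.9, 0.1]),
    (10, ["film", "jury"], [0.8, 0.2]),
    (70, ["party", "night"], [0.2, 0.8]),
    (75, ["hello"], [0.5, 0.5]),  # below the threshold, stays unassigned
]
toy_grid = build_grid(toy_messages, interval_seconds=60, threshold=0.6)
toy_idf = idf_table(toy_messages)
print(message_weights(toy_grid[(0, 0)], toy_idf))
print(significance(toy_grid, 0, 0, active_topics=[0, 1]))
print(peakiness([1, 5, 2, 2]), persistence([1, 5, 2, 2]))
```
</code><p>The active flag of each cell would be set separately by the burst detection step; it is passed in here as the list of active topics.</p></div>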
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Topic-Time Summarization</head><p>Our goal is to use the StreamGrid to summarize the event for an arbitrary time frame. By summary we denote a set of representative messages that mention the key aspects of the selected time period. Assuming that topics can capture these aspects, we use the active topics for that period to create a summary that meets the following criteria: a) as many aspects as possible are covered and b) redundancy due to near-duplicate messages is minimized. To achieve this, we use an adapted version of the greedy algorithm used in <ref type="bibr" target="#b20">[17]</ref>. The algorithm selects messages that are associated with different topics and that simultaneously have a low degree of textual similarity to each other. The selection process is detailed in Algorithm 1. For an arbitrary time frame F = [a, b], we first find the sequence of time intervals in StreamGrid that covers F. Then we get the set of active topics. A topic i is active in F if any cell c(i, j) contained in F is active. Also, the significance score of an active topic in F is defined as the maximum significance score across all time intervals in F. The weight W(t, i, F) of a term t for topic i in F is defined as the sum of the weights in each cell c(i, j) ∈ F. In a similar way, we define the weight W(m, F) of message m over F. 
Note that although a message belongs to a specific time interval, we use the term weights across the whole time frame to calculate the weight of m.</p><p>Algorithm 1 Topic-Time summarization</p><p>Input: StreamGrid, a time frame F, length of summary L</p><p>Output: a summary set S</p><formula xml:id="formula_7">1: S = ∅
2: A = {set of active topics in F}
3: M_c = {m | m = argmax_m W(m, i, F), ∀i ∈ A}
4: while |S| &lt; L and M_c ≠ ∅ do</formula><p>5: for each message m in M_c do</p><p>6: calculate score(m) according to Equation <ref type="formula">8</ref></p><p>7: end for</p><p>To produce a summary S of length L, the algorithm first gets the set of active topics as described above. Then, it collects into M_c the message with the highest weight W(m, F) in each active topic (line 3). In lines 4-11, the algorithm follows a greedy approach and selects the messages that maximize the score of Equation <ref type="formula">8</ref>. This score consists of two parts weighted by a parameter a. The first part measures the importance of the message, while the second measures its redundancy with respect to the set of already selected messages. The importance of a message m ∈ topic i is a combination of two factors: a) the significance of the topic it belongs to in this time frame, and b) the contribution of its textual content. To measure the redundancy of a message, we compute its average cosine similarity to the already selected messages. If the summary length is not reached, we perform the same selection process on the full set of tweets that belong to the active topics (lines 12-23).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Dataset and event description</head><p>We conducted an evaluation of the proposed method on a dataset around the Sundance 2013 Film Festival, which took place between January 15th and 30th, 2013. We used the Streaming API of Twitter to acquire tweets containing terms related to Sundance and posted during the event. More precisely, we collected all tweets containing the hashtags #sundance, #sundance2013 and #sundancefest, and all the tweets that mentioned the official account of the Sundance Film Festival (@sundancefest). This resulted in a dataset of 201,752 tweets. Among them, 100,046 were original tweets, while the rest were retweets. Although using three hashtags and one mentioned account covers only a subset of all possible tweets about the event, we consider this subset sufficiently representative, as the vast majority of Twitter users tend to adopt the official hashtags provided by organizers during events.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Topic detection</head><p>Figure <ref type="figure">2</ref> shows the perplexity and average similarity for different numbers of topics K. Although there is significant variance across the different values of K, the main trend is for perplexity to decrease as K increases.</p><p>As we can see from Figure <ref type="figure">2</ref>, the average similarity between all pairs of topics appears to stabilize for values of K larger than 100 topics. However, a large number of topics produces topics with very few associated messages. We found that for K &gt; 200 a substantial proportion of topics have no associated messages. Taking these facts into account, we set K = 200 for the rest of the evaluation. Regarding the pooling scheme, merging tweets having the same hashtags into single documents gave us the best performance with respect to perplexity and average topic similarity.</p><p>Figure <ref type="figure">2</ref>: Perplexity and Average Similarity between topics for different numbers of topics K</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">StreamGrid Construction</head><p>The first part of Table <ref type="table" target="#tab_1">2</ref> contains the top five topics with respect to peakiness and the second part the topics with the highest persistence ratio. Examining the set of persistent topics, we conclude that they can be divided into two main categories: the first comprises the truly persistent topics that are regularly discussed during the event, while the second is made up of multiplexed topics that LDA failed to split further. This is due to the fact that some topics are conceptually different but share a similar set of related terms. This obviously affects summarization performance, as for each topic we select only the top weighted message. Thus, if a topic contains more than one concept, the summarization algorithm selects only one concept and ignores the rest. Figure <ref type="figure">3</ref> depicts the timelines of the same two sets of topics respectively. It becomes obvious that peaky topics are highly localized, while persistent topics are sustained for the whole duration of the event. To provide a visual representation of the StreamGrid structure over the whole duration of the event, we represent it as a heat map (Figure <ref type="figure" target="#fig_2">4</ref>). The coloured cells in the grid represent the time intervals in which the corresponding topics are active, and the colour of a cell gives the significance of the active topic at this point. As shown in Figure <ref type="figure" target="#fig_2">4</ref>, StreamGrid appears to be sparse, as only a few cells in it contain active topics. However, one can also observe several topics (rows) that exhibit consistent activity over the whole duration of the event.</p><p>Figure <ref type="figure">3</ref>: Timelines of the top five peaky and persistent topics</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Summarization</head><p>Baselines: To evaluate the summaries produced with StreamGrid, we used five baseline methods. Given an arbitrary time interval, we first get the set of messages posted during this interval and then we apply the following baselines to produce a summary of constant length L.</p><p>• Random Summarizer: We randomly choose a subset of L tweets from the set.</p><p>• Popularity Summarizer: We select the L most retweeted messages to form a summary. This favours the tweets that have attracted the attention of the audience. However, niche topics and potentially interesting events that gathered less attention tend to be missed.</p><p>• tf • idf Summarizer: We use the tf • idf weighting scheme described in the previous section to get the L highest weighted tweets.</p><p>• Cluster-based Summarizer: Instead of active topics, we divide the tweets of the time interval into L clusters using k-means clustering. For each cluster produced this way, we pick the highest weighted tweet using the tf • idf scheme.</p><p>• LexRank Summarizer: We create a graph where nodes represent tweets and the weights of edges between nodes represent their pairwise cosine similarity. The total weight of a tweet is the sum of the weights of its adjacent edges. The summary consists of the L highest weighted tweets in the graph.</p><p>Finally, we compare the results of the StreamGrid Summarizer to those of the baseline methods for five time intervals that are connected with high activity during the main event. We detect these intervals by applying the peak detection algorithm of the previous section to the timeline of the whole dataset. 
We rank the detected bursts according to the rate of tweets and use the top five of them. The details of these intervals are provided in Table <ref type="table" target="#tab_0">1</ref>.</p><p>Figure <ref type="figure">5</ref>: StreamGrid-based Multimedia Summary during awards ceremony (4th row in Table <ref type="table" target="#tab_0">1</ref>)</p><p>Figure <ref type="figure">6</ref>: Multimedia Summary using most retweeted images during awards ceremony (4th row in Table <ref type="table" target="#tab_0">1</ref>)</p><p>Figure <ref type="figure">7</ref>: Multimedia Summary using LexRank during awards ceremony (4th row in Table <ref type="table" target="#tab_0">1</ref>)</p><p>Table <ref type="table">3</ref> contains summaries consisting of five tweets using StreamGrid and three of the baselines for the time period around the Awards Ceremony of the Sundance Film Festival. Unsurprisingly, this is the time period with the highest peak during the event. During this period, what may reasonably be considered important pertains to the films that won awards. Such messages are usually posted by authoritative users and become highly retweeted. For this reason, summaries based on the number of retweets cover the winning films quite effectively. However, in other cases choosing very popular tweets does not lead to informative summaries. For example, in the third time interval, the summary consists of tweets like "So freaking cool. #sundance http://t.co/C7a8rSaw" and "#Sundance day 4-leavin for Vegas now. Bye for now http://t.co/C2aRZnEC". These tweets were retweeted a lot, but may be considered non-informative for the event. On the other hand, StreamGrid-based summaries for the Awards Ceremony contain tweets about winning films, even though these messages are not very popular. That is an indication that StreamGrid may detect an important topic even in cases where it does not attract attention from many users. 
Regarding the Cluster-based Summarization, an interesting feature is that avoidance of redundancy is inherent in the method, as similar messages are clustered together and only the highest weighted message of each cluster is selected for the summary. However, the weakness of the method is that not all clusters represent important aspects of the event.</p><p>Another indication of how topic modelling can improve summarization is the fact that StreamGrid, compared to the baselines, tends to include tweets that mention films. This happens because most of the topics detected by LDA are about films, so when the proposed summarization algorithm selects a set of tweets from the pool of active moments, this leads to the selection of film-associated tweets. We expect that, for other types of events, it will naturally generalize to other pertinent entities of interest that occur frequently, thus leading to the creation of topics. A noticeable disadvantage of baselines such as tf • idf and LexRank is their substantial redundancy. For example, in the case of LexRank, four out of five tweets are related to the 'Fruitvale' film. This indicates that redundancy minimization is a necessary component of any summarization approach.</p><p>Finally, to evaluate how well the proposed method can create visual summaries, we apply it on the subset of tweets with embedded pictures. These tweets that comprise about 10% of the dataset create a consider- This can be explained by the fact that tweets with embedded media have text of very low length and informativeness, which leads LDA to inferior performance with respect to the creation of representative topics and the assignment of messages to them. Regarding the redundancy in multimedia summaries, we found that using cosine similarity on the text of images as a metric of similarity between them is not appropriate for minimizing redundancy. This can be seen in the LexRank-based summary in Figure <ref type="figure">7</ref>. 
To this end, a combination of visual and textual features is foreseen as a more suitable means for discarding similar images.</p></div>
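<div xmlns="http://www.tei-c.org/ns/1.0"><p>The graph baseline, as it is described in the list of baselines above, ranks tweets by the sum of their cosine similarities to all other tweets. The sketch below illustrates that description only; it is a weighted-degree ranking rather than the full eigenvector-based LexRank, the function names are hypothetical, and the term vectors are invented toy data. It also shows why such a ranking tends to return near-duplicates, which is the redundancy problem noted above.</p><code lang="python">
```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def degree_summary(vectors, length):
    """Indices of the `length` tweets with the highest summed similarity."""
    scores = [
        sum(cosine(u, v) for j, v in enumerate(vectors) if j != i)
        for i, u in enumerate(vectors)
    ]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return ranked[:length]


# Toy vectors: three near-duplicate tweets about one film and one unrelated tweet.
toy_vectors = [
    {"fruitvale": 1.0, "wins": 1.0},
    {"fruitvale": 1.0, "award": 1.0},
    {"fruitvale": 1.0, "wins": 1.0, "award": 1.0},
    {"party": 1.0},
]
print(degree_summary(toy_vectors, 3))
```
</code><p>On this toy input the three mutually similar tweets occupy the top three positions and the unrelated tweet is ranked last, since nothing in the ranking penalizes similarity to tweets already chosen.</p></div>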
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and future work</head><p>In this work, we proposed a framework for the summarization of micro-blogging messages during large scale events. The framework makes use of topic modelling to detect the underlying aspects of an event to the set of related messages. Then, for each topic it derives its temporal representation by associating messages to the discovered topics. Subsequently, a burst detection algorithm is used to find the important intervals for each topic. Finally, a greedy summarization algorithm generates summaries for arbitrary time intervals using the set of active topics for the same time duration. The results of experiments in a Twitter dataset around the Sundance Film Festival appear promising, demonstrating the potential of topic modelling on the multi-document summarization problem.</p><p>For future work, we first plan to compare our ap-proach with competing summarization algorithms in a more systematic way, over more events and with the help of independent evaluators, with the goal of better capturing the subjective quality aspects of summarization. Taking into account the large number of topic modelling techniques that appeared in literature over the last years, we plan to investigate how the underlying model affects the summarization process. Furthermore, we intend to create a real-time version of StreamGrid, which could be used to get summaries of evolving and continuous streams of messages. To this end, we plan to employ more advanced topic modelling methods that can detect topic drift and unseen topics on new incoming messages. 
Finally, we will investigate methods to integrate popularity and user authority into the summarization process.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>S = S ∪ {m max } 10 :</head><label>10</label><figDesc>M c = M c − {m max } 11: end while 12: if |S| &lt; L then 13: M = ∪M ij , ∀i ∈ A, j ∈ F 14: M = M − S 15: while |S| &lt; L do 16: for each message m in M do 17:calculate score(m) according to Equa-</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>8 )</head><label>8</label><figDesc>score(m) = a * Importance(m)−(1−a) * Redundancy(m) (Importance(m) = Signif icance(i, F) * W (m, F) (9) Redundance(m, S) = avg m ∈SSimilarity(m, m )<ref type="bibr" target="#b13">(10)</ref> </figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: StreamGrid: Each cell of StreamGrid corresponds to a specific time interval and topic</figDesc><graphic coords="6,330.49,56.14,208.03,177.55" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Details of five time intervals with the highest activity during Sundance Film Festival 2013</figDesc><table><row><cell>Start</cell><cell>End</cell><cell>#Tweets</cell></row><row><cell>Thu Jan 17 23:00</cell><cell>Fri Jan 18 00:00</cell><cell>1545</cell></row><row><cell>Sat Jan 19 19:00</cell><cell>Sat Jan 19 20:00</cell><cell>1477</cell></row><row><cell cols="2">Mon Jan 21 19:00 Mon Jan 21 20:00</cell><cell>1247</cell></row><row><cell>Sun Jan 27 03:00</cell><cell>Sun Jan 27 08:00</cell><cell>3735</cell></row><row><cell cols="2">Wed Jan 23 18:00 Wed Jan 23 21:00</cell><cell>1910</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Examples of peaky and persistent topics Peaky Topics</figDesc><table><row><cell cols="2">Topic Representative Terms</cell><cell>Peakiness</cell><cell>#tweets</cell></row><row><cell>135</cell><cell>paris, hilton, Blackfish, cnn, films</cell><cell>0.358</cell><cell>695</cell></row><row><cell>133</cell><cell>death, drink, countryman, sundance, charlie</cell><cell>0.247</cell><cell>588</cell></row><row><cell>11</cell><cell>lovelace, amanda, seyfried, portraits, premiere</cell><cell>0.161</cell><cell>1293</cell></row><row><cell>50</cell><cell>defeat, inevitable, pete, mister, film</cell><cell>0.143</cell><cell>267</cell></row><row><cell>29</cell><cell>butch, dynamite, android, worth, apps</cell><cell>0.123</cell><cell>323</cell></row><row><cell></cell><cell>Persistent Topics</cell><cell></cell><cell></cell></row><row><cell cols="2">Topic Representative Terms</cell><cell cols="2">Persistence #tweets</cell></row><row><cell>63</cell><cell>hemingway, running, follow, crazy , marshall</cell><cell>3.963</cell><cell>2494</cell></row><row><cell>75</cell><cell>jehane, square, girlrising, premiere, screening</cell><cell>2.650</cell><cell>500</cell></row><row><cell>108</cell><cell>vhs, sequel , horror, review , time</cell><cell>2.318</cell><cell>469</cell></row><row><cell>45</cell><cell>afar, week, enjoy, ways, kicked</cell><cell>1.612</cell><cell>127</cell></row><row><cell>17</cell><cell>lindsay, lohan, canyons, blame, snubbed</cell><cell>1.557</cell><cell>343</cell></row><row><cell cols="2">ably sparser StreamGrid as the bursty periods in this</cell><cell></cell><cell></cell></row><row><cell cols="2">subset are much fewer. An example of a multimedia</cell><cell></cell><cell></cell></row><row><cell cols="2">summary using StreamGrid for the Awards Ceremony</cell><cell></cell><cell></cell></row><row><cell cols="2">is shown in Figure 5. 
Comparing the StreamGrid-</cell><cell></cell><cell></cell></row><row><cell cols="2">based multimedia summaries with the ones produced</cell><cell></cell><cell></cell></row><row><cell cols="2">by the popular images (6), we observe that Stream-</cell><cell></cell><cell></cell></row><row><cell cols="2">Grid does not perform noticeably better in this task.</cell><cell></cell><cell></cell></row></table></figure>
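<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of Equations 8 to 10 and of the greedy fill loop in the algorithm listing, the following is a minimal Python sketch, not the implementation used in the experiments. It assumes that Importance(m) (Equation 9) is precomputed and supplied per message, that Similarity is cosine similarity over whitespace-tokenized bag-of-words vectors, and that each round adds the highest-scoring remaining message (the listing is truncated before the selection step). The names cosine_similarity, redundancy, score and greedy_fill are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts as bag-of-words vectors (assumed metric)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def redundancy(message, summary):
    """Equation 10: average similarity of a message to those already selected."""
    if not summary:
        return 0.0
    return sum(cosine_similarity(message, s) for s in summary) / len(summary)


def score(message, importance, summary, a=0.5):
    """Equation 8: weigh importance against redundancy with parameter a."""
    return a * importance - (1 - a) * redundancy(message, summary)


def greedy_fill(candidates, summary, limit, a=0.5):
    """Add messages to the summary until it holds `limit` of them.

    `candidates` maps message text to a precomputed Importance value
    (Equation 9); computing that value is outside this sketch.
    Choosing the maximum-score message per round is an assumption.
    """
    summary = list(summary)
    pool = {m: imp for m, imp in candidates.items() if m not in summary}
    while len(summary) < limit and pool:
        best = max(pool, key=lambda m: score(m, pool[m], summary, a))
        summary.append(best)
        del pool[best]
    return summary
```
]]></code></p>
<p>In this sketch, with a = 0.5, a near-duplicate of an already selected message can receive a redundancy penalty that outweighs its importance, so a less important but novel message is chosen instead.</p></div>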
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgements: This work is supported by the SocialSensor FP7 project, partially funded by the EC under contract number 287975.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://blog.twitter.com/2014/celebrating-sb48-on-twitter" />
		<title level="m">Celebrating #SB48 on Twitter</title>
				<imprint>
			<date type="published" when="2014-02-27">2014. 27-Feb-2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Timelines as summaries of popular scheduled events</title>
		<author>
			<persName><forename type="first">O</forename><surname>Alonso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Shiells</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd international conference on World Wide Web companion</title>
				<meeting>the 22nd international conference on World Wide Web companion</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1037" to="1044" />
		</imprint>
	</monogr>
	<note>International World Wide Web Conferences Steering Committee</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Latent Dirichlet allocation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<date type="published" when="2003-03">Mar. 2003</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="993" to="1022" />
		</imprint>
	</monogr>
	<note>Table 3: Summaries during awards ceremony (4th line in Table 1). Columns: Method, Examples. Methods compared: tf·idf, LexRank, Popularity, StreamGrid</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Event summarization using tweets</title>
		<author>
			<persName><forename type="first">D</forename><surname>Chakrabarti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Punera</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICWSM</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A visual backchannel for large-scale events</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dork</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gruen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Williamson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Carpendale</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Visualization and Computer Graphics</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="1129" to="1138" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">LexRank: Graph-based lexical centrality as salience in text summarization</title>
		<author>
			<persName><forename type="first">G</forename><surname>Erkan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Radev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Artif. Int. Res</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="457" to="479" />
			<date type="published" when="2004-12">Dec. 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Generating event storylines from microblogs</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM &apos;12</title>
				<meeting>the 21st ACM International Conference on Information and Knowledge Management, CIKM &apos;12<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="175" to="184" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Why is &quot;sxsw&quot; trending? exploring multiple text sources for twitter topic summarization</title>
		<author>
			<persName><forename type="first">F</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Weng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL Workshop on Language in Social Media (LSM)</title>
				<meeting>the ACL Workshop on Language in Social Media (LSM)</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="66" to="75" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Twitinfo: Aggregating and visualizing microblogs for event exploration</title>
		<author>
			<persName><forename type="first">A</forename><surname>Marcus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Bernstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Badar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Karger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Madden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Miller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI &apos;11</title>
				<meeting>the SIGCHI Conference on Human Factors in Computing Systems, CHI &apos;11<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="227" to="236" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Improving LDA topic models for microblogs via tweet pooling and automatic labeling</title>
		<author>
			<persName><forename type="first">R</forename><surname>Mehrotra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sanner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Buntine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Xie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;13</title>
				<meeting>the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;13<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="889" to="892" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Summarizing sporting events using twitter</title>
		<author>
			<persName><forename type="first">J</forename><surname>Nichols</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mahmud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Drews</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces, IUI &apos;12</title>
				<meeting>the 2012 ACM International Conference on Intelligent User Interfaces, IUI &apos;12<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="189" to="198" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Centroid-based summarization of multiple documents</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Radev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Styś</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Inf. Process. Manage</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="919" to="938" />
			<date type="published" when="2004-11">Nov. 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Eventsense: Capturing the pulse of large-scale events by mining social media streams</title>
		<author>
			<persName><forename type="first">E</forename><surname>Schinas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Papadopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Diplaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kompatsiaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mass</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Herzig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Boudakidis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th Panhellenic Conference on Informatics, PCI &apos;13</title>
				<meeting>the 17th Panhellenic Conference on Informatics, PCI &apos;13<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="17" to="24" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Peaks and persistence: Modeling the shape of microblog conversations</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Shamma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kennedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">F</forename><surname>Churchill</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, CSCW &apos;11</title>
				<meeting>the ACM 2011 Conference on Computer Supported Cooperative Work, CSCW &apos;11<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="355" to="358" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Summarizing microblogs automatically</title>
		<author>
			<persName><forename type="first">B</forename><surname>Sharifi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Hutton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kalita</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT &apos;10</title>
				<meeting><address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="685" to="688" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">A participant-based approach for event summarization using twitter streams</title>
		<author>
			<persName><forename type="first">C</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Weng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NAACL-HLT</title>
				<meeting>NAACL-HLT</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1152" to="1162" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Sumblr: Continuous summarization of evolving tweet streams</title>
		<author>
			<persName><forename type="first">L</forename><surname>Shou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;13</title>
				<meeting>the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;13<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="533" to="542" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Evaluation methods for topic models</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Wallach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Murray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mimno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th International Conference on Machine Learning (ICML)</title>
				<editor>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Littman</surname></persName>
		</editor>
		<meeting>the 26th International Conference on Machine Learning (ICML)</meeting>
		<imprint>
			<publisher>Omnipress</publisher>
			<pubPlace>Montreal</pubPlace>
			<date type="published" when="2009-06">June 2009</date>
			<biblScope unit="page" from="1105" to="1112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Generating pictorial storylines via minimum-weight connected dominating set approximation in multiview graphs</title>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ogihara</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI&apos;12</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="1" to="1" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
