<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">TwiBiNG: A Bipartite News Generator Using Twitter</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Yashvardhan</forename><surname>Sharma</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science Birla Institute of Technology &amp; Science Pilani</orgName>
								<address>
									<postCode>333 031</postCode>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Divyansh</forename><surname>Bhatia</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science Birla Institute of Technology &amp; Science Pilani</orgName>
								<address>
									<postCode>333 031</postCode>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vivek</forename><forename type="middle">Kishore</forename><surname>Choudhary</surname></persName>
							<affiliation key="aff2">
								<orgName type="department">Department of Computer Science Birla Institute of Technology &amp; Science Pilani</orgName>
								<address>
									<postCode>333 031</postCode>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">TwiBiNG: A Bipartite News Generator Using Twitter</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">BD2ABAA467F8AEBCE5BA1EF8D8A50B00</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T09:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Online journalism is widely seen as the future of journalism. News professionals are vying to capture newsworthy stories that emerge from the crowd. Live social media, especially Twitter, generates enormous volumes of data every minute, and it is difficult to pick out the credible and relevant tweets that may form quality news. The problem is intensified by the informal language that Twitter permits. Headlines generated by solving this problem may still not be relevant and may face questions of authenticity. Given a set of keywords and a time period, the problem becomes manageable and can be solved efficiently. We propose a bipartite algorithm that clusters authentic tweets based on key phrases and ranks the clusters based on trends in each timeslot. Finally, we present an approach to select those topics which have sufficient content to form a story.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Copyright © by the paper's authors. Copying permitted only for private and academic purposes. In: S. Papadopoulos, D. Corney, L. Aiello (eds.): Proceedings of the SNOW 2014 Data Challenge, Seoul, Korea, 08-04-2014, published at http://ceur-ws.org</p><p>Journalism is the art of disseminating information and providing analysis of news to the general public. With the advent of Web 2.0, much of journalism has moved online, giving rise to the term "Online Journalism". Web users readily share nearly every activity in their lives, owing to the free nature of the medium, and this has made news professionals hungry for content. Twitter generates an amount of information that can outrun the storage space of many servers in a few months. A user-centered tool that can process this information in real time has become a pressing need for professional journalists.</p><p>From the Arab Spring to the Oscars 2014 selfie, tweets have changed the way the world shares information. Scholars today can predict election results better than ever before <ref type="bibr" target="#b0">[Ocon10]</ref>. The "#" hashtag feature in Twitter has made event stories easier to capture <ref type="bibr" target="#b1">[Zan11]</ref>. As a result, social network mining, originally concerned with clustering and classification of online worlds, is now being applied to understanding the evolution of real-world events <ref type="bibr" target="#b2">[Dom05]</ref>. A further sign of this shift is that newspapers and magazines have started publishing content on social media sites like Twitter and Facebook. To summarize, news no longer breaks, it tweets (Solis) <ref type="bibr" target="#b3">[Sol10]</ref>.</p><p>The goal of this paper is to demonstrate the use of Twitter to monitor headlines online and generate news stories. 
We propose a standalone system, TwiBiNG, that extracts tweets related to user-defined keywords and proposes ranked news summaries based on the trend and relevance of the tweets they contain. The key novelty behind TwiBiNG is the generation of bipartite clusters of tweet intentions and the use of the Longest Common Subsequence (LCS) algorithm, along with a few details about the tweet's creator, to separate relevant tweets from irrelevant ones. This approach not only produces better clusters but also generates stories that are authentic, contain less spam and, more importantly, are distinct from each other. Because the approach is based on the intention of tweets, it is also language independent. Readers should note that by intention we refer to the general subject of a tweet, not the intention of the user posting it. The selected datasets were developed from tweets collected between Tue 25 Feb, 18:00 GMT and Wed 26 Feb, 18:00 GMT based on the keywords "Syria", "Ukraine", "Terror" and "Bitcoin". We collected 1,041,062 unique tweets from 556,295 users, which included 648,651 retweets and 135,141 replies. The crawl also included messages sent from or to a set of around 5000 journalists/commentators.</p><p>In short, our contributions can be summarized as:</p><list><item>We incorporated retweets in BNgram clustering <ref type="bibr" target="#b4">[Aie13]</ref> and hence improved upon the trend ranking of keywords.</item><item>We clustered our tweets based on a bipartite graph, thereby grouping tweets with similar intention together.</item><item>We reduced the effect of informal text in Twitter by using an LCS-based similarity score while dealing with keywords.</item><item>We presented news headlines by ranking clustered tweets based on relevance to the clustered keyword set, and used a part-of-speech tagger to make them readable.</item></list><p>The remainder of the paper is organized as follows. In Section 2 we take a look at existing algorithms and approaches. Section 3 details the proposed methodologies and approaches. 
Section 4 provides a discussion of results. Section 5 concludes the work by laying a foundation for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>The work of generating headlines using social media can be seen as a combination of two branches: 1) information retrieval and text mining, and 2) natural language processing. Scholars have worked extensively on Twitter data in both fields. Here we present an overview of existing approaches in each:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Text Mining on Twitter Content</head><p>Twitter has its own language conventions: (@) is used to mention a user, (#) is used to identify events and "RT" is used to mark a retweet. Bifet and Frank <ref type="bibr" target="#b5">[Bif10]</ref> use these features for opinion mining. Zhao et al. <ref type="bibr" target="#b6">[Zha11]</ref> develop a Twitter-LDA model through content analysis. The restricted length (140 characters) and informal text pose problems for many text mining researchers (Hong and Davison <ref type="bibr" target="#b7">[Hon10]</ref>). Bollen et al. <ref type="bibr" target="#b8">[Bol11]</ref> used terms expressing positive and negative behavior for sentiment analysis on Twitter. Text clustering is another area in which scholars have worked on content analysis.</p><p>Goyal and Mehala <ref type="bibr" target="#b9">[Goy13]</ref> presented an approach to find conceptually related queries by clustering on bipartite and tripartite graphs. We propose a similar approach for Twitter content analysis using a bipartite graph. Aiello et al. <ref type="bibr" target="#b4">[Aie13]</ref> propose trend-based tweet clustering approaches. We present a modified BNgram clustering approach, motivated by the original approach of <ref type="bibr" target="#b4">[Aie13]</ref>. Phuvipadawat and Murata <ref type="bibr" target="#b10">[Phu10]</ref> present a breaking news prediction algorithm that clusters tweets based on first story detection after segmenting different stories. TwitterStand <ref type="bibr" target="#b12">[San09]</ref> develops a "leader-follower" text clustering algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Natural Language Processing</head><p>Headline generation has been an active area of research among NLP researchers. Most scholars work by selecting a proper set of keywords and finding a way to combine them into a grammatically coherent and meaningful sentence. Banko et al. <ref type="bibr" target="#b13">[Ban00]</ref> present a statistical approach to term selection and term ordering that demonstrates the power of non-extractive summarization, whereas Jin and Hauptman <ref type="bibr" target="#b14">[Jin01]</ref> present an approach for extractive summarization along with a Bayesian approach. They also discuss various issues in keyword selection for headline generation. We use part-of-speech tagging along with identification of the most relevant tweet to generate a meaningful, readable headline.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Methodology</head><p>We divide our process into four phases: 1) data preparation, 2) data clustering, 3) cluster ranking, and 4) tweet ranking and headline generation. We now describe our TwiBiNG system phase by phase:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Data Preparation</head><p>Once the data set for a given timeslot has been prepared by extracting tweets related to a given set of seeds and keywords, we tag entities in the tweets using Stanford's Part-of-Speech Tagger and extract nouns, hashtags and users. We ignore other parts of speech, thereby concentrating on the subject rather than the predicate. This is because, within a given timeslot, the predicate is unlikely to change rapidly for the same subject, while the reverse may not be true. These tagged words are referred to as key phrases (KP) from now on. We now decide on trending keywords.</p><p>We rank keywords using a modified df-idf_t <ref type="bibr" target="#b4">[Aie13]</ref> score that incorporates retweets:</p><formula xml:id="formula_0">R(k_i) = \frac{R_i - R_{i-1}}{\max(R_i, R_{i-1})}, \qquad \mathrm{Score}(k_i) = t_i \cdot \log\left(1 + \frac{R(k_i) + 1}{t_{i-1} + 1}\right)</formula><p>Here R_i represents the number of retweets for keyword k in timeslot i and t_i represents the number of tweets for keyword k in timeslot i. Since a keyword may be related to an unbounded number of tweets and retweets in a timeslot, deciding on a threshold is difficult. Therefore, we normalize the score for each keyword using min-max normalization. Let &lt;K&gt; be the set of keywords in slot i; then the normalized score is given by:</p><formula xml:id="formula_1">\mathrm{NormalizedScore}(NK_i) = \frac{\mathrm{Score}(k_i) - \min(\mathrm{Score}(\langle K \rangle))}{\max(\mathrm{Score}(\langle K \rangle)) - \min(\mathrm{Score}(\langle K \rangle))}</formula><p>The threshold for these normalized scores was set to 0.0075 through experiments. We select the keywords above this threshold and store them in a set (S_i). We observed that, at this threshold, each timeslot yields around 800-875 trending keywords. Once this set was ready we assigned tweets to each keyword, i.e. we reversed the bipartite graph of Figure <ref type="figure">1</ref>. We now filter the tweets based on user details, specifically the number of followers and the status count. 
This step is necessary in order to increase authenticity and to reduce the number of tweets containing spam. Since clustering is based on tweet intention, skipping this step may hamper clustering performance, and the generated stories may not be considered quality news. Based on our experiments, which follow Hutto et al. <ref type="bibr" target="#b19">[Hut13]</ref>, users with a follower count &gt; 600 and a tweet count &gt; 6000 may be considered authentic, and considering tweets by these users alone significantly improves system performance. Since we are building a user-centered news generator, we want tweets related to the keywords defined by the user in order to improve relevancy. For this purpose we scan all keywords in (S_i) and compute their similarity with the user-defined keywords (U_i).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><formula xml:id="formula_2">\mathrm{LCS}(S_i, U_i) = \mathrm{LongestCommonSubsequence}(S_i, U_i)</formula><p>If any LCS(S_i, U_i) contains U_i, then we include all the tweets related to S_i in the set &lt;TU_i&gt;, which contains the ids of tweets related to the user-centered keywords. We scan the database for the timeslot again and remove those tweets which are not contained in &lt;TU_i&gt; (user-centric tweets). At the end of this stage we are left with a set of tweets and related keywords that can be considered authentic for a news story.</p></div>
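<div xmlns="http://www.tei-c.org/ns/1.0"><p>The data-preparation steps of Section 3.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the trend score is read as t_i * log(1 + (R(k_i) + 1) / (t_(i-1) + 1)), the phrase "LCS contains U_i" is read as "the whole user keyword survives as a subsequence of the trending keyword", matching is case-insensitive, and all function names and the input layout are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
import math


def retweet_rate(r_now, r_prev):
    """R(k_i): relative change in retweets between consecutive timeslots."""
    denom = max(r_now, r_prev)
    return 0.0 if denom == 0 else (r_now - r_prev) / denom


def trend_score(t_now, t_prev, r_now, r_prev):
    """Score(k_i) under the assumed reading of the formula in the text."""
    return t_now * math.log(1 + (retweet_rate(r_now, r_prev) + 1) / (t_prev + 1))


def min_max_normalize(scores):
    """Rescale keyword scores to [0, 1]; a constant score set maps to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (0.0 if span == 0 else (s - lo) / span) for k, s in scores.items()}


def trending_keywords(counts, threshold=0.0075):
    """counts maps keyword -> (t_now, t_prev, r_now, r_prev)."""
    scores = {k: trend_score(*c) for k, c in counts.items()}
    return {k for k, s in min_max_normalize(scores).items() if s > threshold}


def is_authentic(followers, statuses):
    """User filter with the thresholds quoted in the text."""
    return followers > 600 and statuses > 6000


def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def matches_user_keyword(trending, user_keyword):
    """True when the user keyword is a subsequence of the trending keyword."""
    t, u = trending.lower(), user_keyword.lower()
    return len(u) > 0 and lcs_length(t, u) == len(u)
```
]]></code></p><p>With this reading, an informal spelling such as "#Ukraiiine" still matches the user keyword "ukraine", which is the effect the LCS-based similarity is meant to achieve.</p></div>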
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Intention based Tweet Clustering</head><p>We follow the approach of <ref type="bibr" target="#b9">[Goy13]</ref> and apply bipartite clustering to tweets. The basic aim here is to capture the real intention of the tweets in each cluster. Algorithm 1 presents an incremental bipartite algorithm to cluster tweets and keywords. Once we have a set of clusters we know the intention of the tweets. As can be seen, the threshold is kept &gt; 0.5, which signifies that merged keywords should have an intention similarity of more than 50%. Readers requiring more specific tweets to be clustered together may increase the similarity threshold, but this comes at the cost of duplicate tweets being merged together. As can be observed in Algorithm 1, since the clustering is on the basis of Intersection(T_i, T_j), there will be duplicate tweets in a cluster, and a news story containing many duplicate tweets would be considered of poor quality. Removing duplicate content therefore becomes a prime task.</p><p>In Algorithm 2 we present an algorithm to remove duplicate tweets from a cluster. Its final step computes the filtered tweet set and writes it back to the cluster:</p><formula xml:id="formula_3">\langle FTS_i \rangle = \langle CTS_i \rangle - \langle D_i \rangle; \qquad \langle CS_i, \langle CTS_i \rangle \rangle = \langle CS_i, \langle FTS_i \rangle \rangle</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The motivation behind the threshold of 0.65 in Algorithm 2 can be found in O'Connor <ref type="bibr" target="#b20">[Oco10]</ref>. We end this phase with clusters of keywords and their relevant sets of tweets. So now we know the intention of our keywords and are ready to rank them.</p></div>
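<div xmlns="http://www.tei-c.org/ns/1.0"><p>Algorithm 1 and Algorithm 2 can be sketched in Python as below. This is an illustrative sketch under assumptions rather than the authors' code: keyword similarity is taken to be the Jaccard ratio of the tweet-id sets (Intersection / Union), merging is repeated until a full pass makes no merge, duplicates are detected with a character-level LCS divided by the shorter tweet length, and the first tweet of each near-duplicate group is the one kept. The function names and data layout are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
def jaccard(a, b):
    """Intersection over union of two sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_keywords(keyword_to_tweets, threshold=0.5):
    """Merge keyword clusters whose tweet sets overlap by more than threshold."""
    clusters = [({kw}, set(tw)) for kw, tw in keyword_to_tweets.items()]
    changed = True
    while changed:                      # stop once a full pass merges nothing
        changed = False
        i = 0
        while i < len(clusters):
            j = i + 1
            while j < len(clusters):
                if jaccard(clusters[i][1], clusters[j][1]) > threshold:
                    kws, tws = clusters.pop(j)
                    clusters[i] = (clusters[i][0] | kws, clusters[i][1] | tws)
                    changed = True
                else:
                    j += 1
            i += 1
    return clusters


def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def remove_duplicates(tweets, threshold=0.65):
    """Drop tweets whose LCS similarity to an earlier kept tweet exceeds threshold."""
    kept = []
    for text in tweets:
        is_dup = any(
            lcs_length(text, k) / min(len(text), len(k)) > threshold
            for k in kept
            if text and k               # skip empty strings to avoid dividing by 0
        )
        if not is_dup:
            kept.append(text)
    return kept
```
]]></code></p><p>Raising the clustering threshold above 0.5 makes merged keywords share more of their tweets, which is the trade-off discussed in Section 3.2.</p></div>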
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Cluster Ranking</head><p>Up to this phase we have obtained the required set of clusters. We now need to rank them. Although different authors <ref type="bibr" target="#b15">[Yaj12]</ref> [Hav03] <ref type="bibr" target="#b17">[Shu11]</ref> have proposed efficient topic ranking methods, they share the feature that relevance to the considered keywords is treated as an important issue. We make use of this fact and of the normalized trend score to generate a ranking score for clusters. Since we are aiming for a user-centric tool, our clusters should be as relevant as possible to the user's intention. Also, since we have to generate headlines, trend needs special attention. Keeping these two facts in mind, we present our cluster ranking methodology. Using &lt;U_i&gt; we collected tweets for relevant keywords in Section 3.1 as the set &lt;TU_i&gt;. We calculate the relevancy of cluster CS_i having tweets &lt;FS_i&gt; as:</p><formula xml:id="formula_4">RCS_i = \mathrm{Relevancy}(CS_i) = \max\left(\frac{\mathrm{Intersection}(U_i, FS_i)}{\mathrm{Union}(U_i, FS_i)}\right)</formula><p>This relevancy score indicates how closely the cluster relates to the user's intention.</p><formula xml:id="formula_5">TCS_i = \mathrm{Trend}(CS_i) = e^{-\max(\mathrm{NormalizedScore\ of\ } CS_i)}</formula><p>This factor indicates how much a cluster is trending. The idea of taking Max(Normalized Score of CS_i) is motivated by the BNgram clustering approach used in <ref type="bibr" target="#b4">[Aie13]</ref>. Readers can think of TCS_i as a boost factor for relevance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><formula>\mathrm{ClusterScore}(CScr_i) = RCS_i \cdot TCS_i</formula><p>We now rank the clusters based on (CScr_i). At the end of this phase we have ranked our clusters and, to avoid confusion, we refer to them from now on as &lt;CS_ir, &lt;FTS_ir&gt;&gt;.</p></div>
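<div xmlns="http://www.tei-c.org/ns/1.0"><p>The cluster score can be sketched as a product of the two factors defined in Section 3.3. The sketch below is illustrative only and rests on assumptions: relevancy is computed as the Intersection / Union ratio of two sets supplied by the caller (which sets are compared, and what the outer Max ranges over, is left open here), and the trend factor is taken literally as e raised to the negative of the largest normalized keyword score in the cluster. All names are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
import math


def relevancy(user_set, cluster_set):
    """Intersection / Union ratio between the user-side and cluster-side sets."""
    union = user_set | cluster_set
    return len(user_set & cluster_set) / len(union) if union else 0.0


def trend_factor(normalized_scores):
    """exp(-max normalized keyword score), following the printed formula."""
    return math.exp(-max(normalized_scores))


def cluster_score(user_set, cluster_set, normalized_scores):
    """ClusterScore = relevancy * trend factor."""
    return relevancy(user_set, cluster_set) * trend_factor(normalized_scores)


def rank_clusters(clusters, user_set):
    """clusters: list of (name, item_set, normalized_scores); best score first."""
    return sorted(
        clusters,
        key=lambda c: cluster_score(user_set, c[1], c[2]),
        reverse=True,
    )
```
]]></code></p><p>A cluster that shares nothing with the user-side set receives a score of zero regardless of its trend factor, so it always ranks below any overlapping cluster.</p></div>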
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Tweet Ranking in Clusters</head><p>Once the clusters are ranked, we need to rank the tweets they contain in order to present them in the most relevant order. Before introducing the ranking calculation we need to introduce the expanded keyword set, which is a prerequisite for headline formation. This step is necessary because some clusters may contain only a small number of keywords and need additional information to generate a story. We represent the expanded cluster set as &lt;ECS_i&gt;. Let the set &lt;K_t&gt; represent the set of keywords for tweet T_i. Then the relevance score for T_i is calculated as</p><formula xml:id="formula_6">\mathrm{Score}(T_i) = \frac{\mathrm{Intersection}(\langle K_t \rangle, \langle ECS_i \rangle)}{\mathrm{Union}(\langle K_t \rangle, \langle ECS_i \rangle)}</formula><p>We now rank our tweets based on Score(T_i). At the end of this phase, we filter out tweets which have Score(T_i) &lt; 0.3. The threshold of 0.3 is based on the results of our experiments, as described in Table <ref type="table">2</ref>.</p><p>Increasing the threshold provides better quality stories but reduces the number of stories at a high rate.</p><p>Hence, readers requiring more focused stories may increase the threshold.</p></div>
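<div xmlns="http://www.tei-c.org/ns/1.0"><p>The tweet ranking step of Section 3.4 can be illustrated with a short Python sketch. It assumes that keyword sets are plain Python sets of strings and that a tweet whose score equals the threshold is kept; the function names and the example data in the note below are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
def tweet_score(tweet_keywords, expanded_cluster_keywords):
    """Intersection / Union ratio of a tweet's keywords and the expanded set."""
    union = tweet_keywords | expanded_cluster_keywords
    if not union:
        return 0.0
    return len(tweet_keywords & expanded_cluster_keywords) / len(union)


def rank_tweets(tweets, expanded_cluster_keywords, threshold=0.3):
    """tweets maps tweet id -> keyword set; returns (id, score), best first."""
    scored = [
        (tid, tweet_score(kws, expanded_cluster_keywords))
        for tid, kws in tweets.items()
    ]
    kept = [(tid, s) for tid, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```
]]></code></p><p>For an expanded set of four keywords, a tweet sharing three of them and adding none of its own scores 0.75, while a tweet sharing one and adding three unrelated keywords scores 1/7 and is filtered out at the 0.3 threshold.</p></div>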
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Cluster Selection and Headline Generation</head><p>In this phase we provide an approach to decide which clusters can form news. Since not all clusters form a story, we must decide judiciously which clusters to use. Through experiments, we observed that the following heuristic may be used to select quality clusters: H3.5.1: Clusters tend to form quality stories when they contain at least four keywords and one hashtag keyword, and are related to at least three tweets. Further, the number of non-hashtag keywords should be greater than the number of hashtag keywords. The rationale behind this heuristic is as follows. Clusters with an excessive number of hashtags as keywords are usually related to tweets with almost identical content. Having a hashtag allows users to easily identify events, and having more than three distinct tweets allows us to form a sequence of events. Since we are required to identify a fixed number of topics, we follow H3.5.1 and scan the clusters in &lt;CS_ir&gt; until the specified number of clusters for each timeslot is reached. Hence, we follow a dynamic approach that is independent of cluster count.</p><p>For headline generation we order the keywords according to the top-ranked tweet in the cluster and use a POS tagger to connect the keywords. We believe that better approaches to forming headlines exist, but because we were dealing with informal language we needed to rely on tweet intent to form them. Readers may improve upon this aspect by considering the statistical techniques mentioned in Section 2.2.</p></div>
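<div xmlns="http://www.tei-c.org/ns/1.0"><p>Heuristic H3.5.1 can be expressed as a simple predicate followed by a scan over the ranked clusters. The sketch below is one possible reading, given as an assumption: "at least four keywords, one hashtag keyword" is taken to mean at least four keywords in total of which at least one starts with "#", and "at least three tweets" is applied as stated in the heuristic. The function names and the (keywords, tweet ids) pair layout are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
def is_story_cluster(keywords, tweet_ids):
    """Apply H3.5.1 to one cluster (assumed reading, see the lead-in)."""
    hashtags = [k for k in keywords if k.startswith("#")]
    plain = [k for k in keywords if not k.startswith("#")]
    return (
        len(keywords) >= 4
        and len(hashtags) >= 1
        and len(tweet_ids) >= 3
        and len(plain) > len(hashtags)
    )


def select_clusters(ranked_clusters, wanted):
    """Walk clusters in rank order and keep the first `wanted` that pass H3.5.1."""
    selected = []
    for keywords, tweet_ids in ranked_clusters:
        if len(selected) >= wanted:
            break
        if is_story_cluster(keywords, tweet_ids):
            selected.append((keywords, tweet_ids))
    return selected
```
]]></code></p><p>Because the scan stops as soon as enough clusters have passed, the selection does not depend on how many clusters the earlier phases produced.</p></div>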
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results and discussion</head><p>Table <ref type="table" target="#tab_1">1</ref> presents the human evaluation of the results, as carried out by the authors. The official evaluation results of our method in the Data Challenge are included in snow2014dc <ref type="bibr" target="#b18">[Pap14]</ref>. The language content shows that our topics were roughly evenly distributed between English and non-English tweets. This is probably due to the selection of keywords related to Syria and Ukraine, which allowed foreign phrases to enter the dataset. News headline readability, being a highly subjective attribute, needs to be evaluated manually. A news headline is considered readable if the majority of the users accessing the system can comprehend it without the use of other resources. It can be observed that 81.60% of our topics were labeled readable by language experts. The images related to the extracted tweets were found to symbolize the news story with 97.67% accuracy.</p><p>Table <ref type="table">2</ref> shows the number of topical clusters for increasing values of the Score(T_i) threshold. As can be observed, the number of clusters decreases at a high rate as the threshold value increases. This led us to select 0.3 as our base threshold. 
</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Algorithm 1: Bipartite Clustering of Tweets using Keywords. Algorithm 2: To remove Duplicate Tweets from Cluster.</figDesc><table>
<row><cell>Algorithm 1: Bipartite Clustering of Tweets using Keywords</cell></row>
<row><cell>Data: I&lt;S_i, &lt;TS_i&gt;&gt;: S_i and TS_i denote a set of keywords and related tweets</cell></row>
<row><cell>Result: O&lt;CS_i, &lt;CTS_i&gt;&gt;: clustered set of tweets</cell></row>
<row><cell>Let S represent the set of unique keywords</cell></row>
<row><cell>while clusters exist with similarity &gt; threshold do</cell></row>
<row><cell>    flag = 0;</cell></row>
<row><cell>    while s_i in S do</cell></row>
<row><cell>        j = i + 1;</cell></row>
<row><cell>        while s_j in S do</cell></row>
<row><cell>            Sim(s_i, s_j) = Intersection(Ts_i, Ts_j) / Union(Ts_i, Ts_j);</cell></row>
<row><cell>            if Sim(s_i, s_j) &gt; 0.5 then</cell></row>
<row><cell>                I&lt;s_i, &lt;Ts_i&gt;&gt; = I&lt;s_i = s_j, &lt;Union(Ts_i, Ts_j)&gt;&gt;;</cell></row>
<row><cell>                Remove s_j from I;</cell></row>
<row><cell>                flag = 1;</cell></row>
<row><cell>            end</cell></row>
<row><cell>        end</cell></row>
<row><cell>    end</cell></row>
<row><cell>    if flag = 0 then</cell></row>
<row><cell>        break;</cell></row>
<row><cell>    end</cell></row>
<row><cell>end</cell></row>
<row><cell>Algorithm 2: To remove Duplicate Tweets from Cluster</cell></row>
<row><cell>Data: &lt;CS_i, &lt;CTS_i&gt;&gt;: set of tweets in a cluster of keywords CS_i</cell></row>
<row><cell>Result: &lt;CS_i, &lt;FTS_i&gt;&gt;: final set of tweets and clusters</cell></row>
<row><cell>while cs_i in CS_i do</cell></row>
<row><cell>    while t_i in CTS_i do</cell></row>
<row><cell>        j = i + 1;</cell></row>
<row><cell>        if &lt;D_i&gt;.contains(t_j) = false then</cell></row>
<row><cell>            while t_j in CTS_j do</cell></row>
<row><cell>                sim(t_i, t_j) = LCS(t_i, t_j) / Min(t_i.length, t_j.length);</cell></row>
<row><cell>                if sim(t_i, t_j) &gt; 0.65 then</cell></row>
<row><cell>                    &lt;D_i&gt;.add(t_j);</cell></row>
<row><cell>                end</cell></row>
<row><cell>            end</cell></row>
<row><cell>        end</cell></row>
<row><cell>    end</cell></row>
<row><cell>    &lt;FTS_i&gt; = &lt;CTS_i&gt; - &lt;D_i&gt;;</cell></row>
<row><cell>    &lt;CS_i, &lt;CTS_i&gt;&gt; = &lt;CS_i, &lt;FTS_i&gt;&gt;;</cell></row>
<row><cell>end</cell></row>
</table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1:</head><label>1</label><figDesc>Human Evaluation of topics</figDesc><table><row><cell>Language</cell><cell>English</cell><cell>256</cell><cell>Non-English</cell><cell>282</cell></row><row><cell>News Headline Readability</cell><cell>Good</cell><cell>439</cell><cell>Bad</cell><cell>99</cell></row><row><cell>Topics with images</cell><cell>Related</cell><cell>84</cell><cell>Unrelated</cell><cell>2</cell></row></table><table><head>Table 2: Number of clusters v/s Score(Ti) Threshold</head><row><cell>Threshold</cell><cell>0.25</cell><cell>0.30</cell><cell>0.35</cell><cell>0.40</cell></row><row><cell>No. of Clusters</cell><cell>754</cell><cell>538</cell><cell>467</cell><cell>261</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Sample topics along with headline, timestamp, related tweets and set of keywords.</figDesc><table /></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Table <ref type="table" target="#tab_2">3</ref> presents sample topics along with headline, timestamp, related tweets and set of keywords. Readers should note that not all the tweets in a story are covered; only the most relevant are shown for clarity. These results show an improved performance over previously existing systems. A limitation of this system is that it does not include the user's community, which might have allowed us to form tripartite clusters and thereby improve clustering quality at a low cost. The use of better-known string matching algorithms may improve cluster quality. Our use of a bipartite clustering algorithm may allow future researchers to explore this field further.</p></div>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Acknowledgement</head><p>The authors owe a debt of gratitude to Dr. P. Goyal and Dr. N. Mehala for their constructive criticism and innovative ideas, which formed the foundation of this study. We would like to extend special thanks to the Birla Institute of Technology and Science for providing resources without which this work would never have been completed. We would also like to thank the SNOW'14 organizers for giving us a chance to work on the social sensor project and for their prompt follow-up in cases of difficulty.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">From tweets to polls: Linking text sentiment to public opinion time series</title>
		<author>
			<persName><forename type="first">B</forename><surname>O'Connor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Balasubramanyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Routledge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ICWSM</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="122" to="129" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Recommending#-tags in Twitter</title>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Gassler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Specht</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Semantic Adaptive Social Web (SASWeb 2011)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Workshop on Semantic Adaptive Social Web (SASWeb 2011)</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="volume">730</biblScope>
			<biblScope unit="page" from="67" to="78" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Mining social networks for viral marketing</title>
		<author>
			<persName><forename type="first">P</forename><surname>Domingos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Intelligent Systems</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="80" to="82" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">The information divide between traditional and new media</title>
		<author>
			<persName><forename type="first">B</forename><surname>Solis</surname></persName>
		</author>
		<ptr target="http://www.briansolis.com/2010/02/the-information-divide-the-socialization-of-news-and-dissemination/" />
		<imprint>
			<date type="published" when="2010-03-16">2010. March 16, 2014</date>
		</imprint>
	</monogr>
	<note>Internet Draft</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Sensing trending topics in Twitter</title>
		<author>
			<persName><forename type="first">L</forename><surname>Aiello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Petkos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Corney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Papadopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Skraba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Goker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kompatsiaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jaimes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Multimedia</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="1268" to="1282" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Sentiment knowledge discovery in Twitter streaming data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bifet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frank</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Discovery Science</title>
				<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1" to="15" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Comparing Twitter and traditional media using topic models</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">X</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">P</forename><surname>Lim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval</title>
				<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="338" to="349" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Empirical study of topic modeling in Twitter</title>
		<author>
			<persName><forename type="first">L</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">D</forename><surname>Davison</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Social Media Analytics</title>
				<meeting>the First Workshop on Social Media Analytics</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="80" to="88" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bollen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pepe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICWSM</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A robust approach for finding conceptually related queries using feature selection and tripartite graph structure</title>
		<author>
			<persName><forename type="first">P</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mehala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bansal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Information Science</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="575" to="592" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Breaking news detection and tracking in Twitter</title>
		<author>
			<persName><forename type="first">S</forename><surname>Phuvipadawat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Murata</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Web Intelligence and Intelligent Agent Technology (WI-IAT), IEEE/WIC/ACM International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m">IEEE/WIC/ACM International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="120" to="123" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">TwitterStand: news in tweets</title>
		<author>
			<persName><forename type="first">J</forename><surname>Sankaranarayanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Samet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">E</forename><surname>Teitler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Lieberman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sperling</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems</title>
				<meeting>the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="42" to="51" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Headline generation based on statistical translation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Banko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">O</forename><surname>Mittal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Witbrock</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 38th Annual Meeting on Association for Computational Linguistics</title>
				<meeting>the 38th Annual Meeting on Association for Computational Linguistics</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="318" to="325" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Title Generation Using a Training Corpus</title>
		<author>
			<persName><forename type="first">R</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Hauptmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Linguistics and Intelligent Text Processing</title>
				<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="208" to="215" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Twitter topic summarization by ranking tweets using social influence and content quality</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">Y</forename><surname>Shum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th International Conference on Computational Linguistics</title>
				<meeting>the 24th International Conference on Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="763" to="780" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>Haveliwala</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="784" to="796" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">An efficient algorithm for topic ranking and modeling topic evolution</title>
		<author>
			<persName><forename type="first">K</forename><surname>Shubhankar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Pudi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Database and Expert Systems Applications</title>
				<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="320" to="330" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">SNOW 2014 Data Challenge: Assessing the Performance of News Topic Detection Methods in Social Media</title>
		<author>
			<persName><forename type="first">S</forename><surname>Papadopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Corney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Aiello</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the SNOW 2014 Data Challenge</title>
				<meeting>the SNOW 2014 Data Challenge</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">A longitudinal study of follow predictors on twitter</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Hutto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gilbert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</title>
				<meeting>the SIGCHI Conference on Human Factors in Computing Systems</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="821" to="830" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">TweetMotif: Exploratory Search and Topic Summarization for Twitter</title>
		<author>
			<persName><forename type="first">B</forename><surname>O'Connor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krieger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ahn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th Int&apos;l AAAI Conference on Weblogs and Social Media</title>
				<meeting>the 4th Int&apos;l AAAI Conference on Weblogs and Social Media</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
