<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TwiBiNG: A Bipartite News Generator Using Twitter</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yashvardhan Sharma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Divyansh Bhatia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Birla Institute of Technology &amp; Science</institution>
          ,
          <addr-line>Pilani 333 031</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <abstract>
        <p>Online journalism is being seen as the future of journalism. News professionals are vying to capture newsworthy stories that emerge from the crowd. Live social media, especially Twitter, generates enormous volumes of data every minute, which makes it difficult to select the credible and relevant tweets that may form quality news. The problem intensifies because Twitter permits free-form, informal language. Even when this problem is solved, the generated headlines may still not be relevant and may face questions of authenticity. Given a set of keywords and a time period, the problem becomes manageable and can be solved efficiently. We propose a bipartite algorithm that clusters authentic tweets based on key phrases and ranks the clusters based on trends in each timeslot. Finally, we present an approach to select those topics which have sufficient content to form a story.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <label>1</label>
      <title>Introduction</title>
      <p>Journalism is the art of disseminating information and providing analysis of news to the general public. With the advent of Web 2.0, much of journalism has moved online, giving rise to the term "Online Journalism". Because users of the web readily share every activity in their lives, owing to the open nature of the web, news professionals have become hungry for content. Twitter generates an amount of information that can outrun the storage space of many servers in a few months. Developing a user-centered tool that can process this information in real time has become the need of the day for professional journalists.</p>
      <p>From the Arab Spring to the Oscars 2014 selfie, tweets have changed the way the world shares information. Scholars today can predict election results better than ever before [Ocon10]. The hashtag (#) feature in Twitter has made event stories easier to capture [Zan11]. As a result, social network mining, originally focused on clustering and classification of online worlds, is now being leveraged to understand the evolution of real-world events [Dom05]. Adding another feather to its cap, newspapers and magazines have started publishing content on social media sites like Twitter and Facebook. To summarize, news no longer breaks, it tweets (Solis) [Sol10].</p>
      <p>The goal of this paper is to demonstrate the use of Twitter to monitor headlines online and generate news stories. We propose a standalone system, TwiBiNG, that extracts tweets related to user-defined keywords and proposes ranked news summaries based on the trend and relevance of the tweets they contain. The key novelty behind TwiBiNG is the generation of bipartite clusters of tweet intentions and the use of the longest common subsequence (LCS) algorithm, along with a few details about the tweet creator, to separate relevant tweets from irrelevant ones. This approach not only produces better clusters but also generates stories that are authentic, contain less spam and, more importantly, are distinct from each other. Also, since we base our approach on the intention of tweets, it is language independent. Readers should note that by intention we refer to the general subject of a tweet, not the intention of the user posting it. The selected datasets were developed from tweets collected between Tue 25 Feb, 18:00 GMT and Wed 26 Feb, 18:00 GMT based on the keywords "Syria", "Ukraine", "Terror" and "Bitcoin". We collected 1,041,062 unique tweets from 556,295 users, which included 648,651 retweets and 135,141 replies. The crawl also included messages sent from or to a set of around 5000 journalists/commentators.</p>
      <p>In short, our contributions can be summarized as follows:</p>
      <list list-type="bullet">
        <list-item><p>We incorporated retweets into BNgram clustering [Aie13] and thereby improved the trend ranking of keywords.</p></list-item>
        <list-item><p>We clustered tweets using a bipartite graph, thereby grouping tweets with similar intention together.</p></list-item>
        <list-item><p>We reduced the effect of informal text in Twitter by using an LCS-based similarity score when dealing with keywords.</p></list-item>
        <list-item><p>We presented news headlines by ranking clustered tweets based on their relevance to the clustered keyword set, and used a part-of-speech tagger to make them readable.</p></list-item>
      </list>
      <p>The remainder of the paper is organized as follows. Section 2 reviews existing algorithms and approaches. Section 3 details the proposed methodology. Section 4 discusses the results. Section 5 concludes the work by laying a foundation for future work.</p>
    </sec>
    <sec id="sec-2">
      <label>2</label>
      <title>Related Work</title>
      <p>The work of generating headlines using social media can be seen as a combination of two branches: 1) Information Retrieval and Text Mining, and 2) Natural Language Processing. Scholars have worked extensively on Twitter data in both fields. Here we present an overview of existing approaches in each:</p>
      <sec id="sec-2-1">
        <label>2.1</label>
        <title>Text Mining on Twitter Content</title>
        <p>Twitter has its own language conventions: (@) is used to mention a user, (#) is used to identify events and "RT" is used to represent a retweet. Bifet and Frank [Bif10] use these features for opinion mining. Zhao et al. [Zha11] develop a Twitter-LDA model through content analysis. The restricted length (140 characters) and informal text are some of the issues that pose problems to many text mining researchers (Hong and Davison [Hon10]). Bollen et al. [Bol11] used terms expressing positive and negative behavior for sentiment analysis on Twitter.</p>
        <p>Text clustering is another area where scholars have worked on content analysis. Goyal et al. [Goy13] presented an approach to find conceptually related queries by clustering on bipartite and tripartite graphs. We propose a similar approach for Twitter content analysis using a bipartite graph. Aiello et al. [Aie13] propose trend-based tweet clustering approaches. We present an approach that uses a modified BNgram clustering approach, motivated by the original approach of [Aie13]. Phuvipadawat and Murata [Phu10] present a breaking news prediction algorithm that clusters tweets based on first story detection after segmenting different stories. TwitterStand [San09] develops a "leader-follower" text clustering algorithm.</p>
      </sec>
      <sec id="sec-2-2">
        <label>2.2</label>
        <title>Natural Language Processing</title>
        <p>Headline generation has been an active area of research among NLP researchers. Most scholars work by selecting a proper set of keywords and finding a way to combine them into a grammatically coherent and meaningful sentence. Banko et al. [Ban00] present a statistical approach to term selection and term ordering that demonstrates the power of non-extractive summarization, whereas Jin and Hauptmann [Jin01] present an approach for extractive summarization along with a Bayesian approach. They also discuss various issues in keyword selection for headline generation. We use part-of-speech tagging along with identification of the most relevant tweet to generate a meaningful, user-readable headline.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <label>3</label>
      <title>Methodology</title>
      <p>We divide our process into four phases: 1) Data Preparation, 2) Data Clustering, 3) Cluster Ranking, and 4) Tweet Ranking and Headline Generation. We now describe our TwiBiNG system phase by phase:</p>
      <sec id="sec-3-1">
        <label>3.1</label>
        <title>Data Preparation</title>
        <p>Once the dataset for a given timeslot is ready, obtained by extracting tweets related to a given set of seeds and keywords, we tag entities in tweets using Stanford's Part-of-Speech Tagger and extract nouns, hashtags and users. We ignore other parts of speech, thereby concentrating more on the subject than on the predicate. This is because, in a given timeslot, the predicate is unlikely to change rapidly for the same subject, while the reverse may not be true. These tagged words are referred to as key phrases (KP) from now on. We now decide on trending keywords.</p>
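        <p>As a minimal sketch of this filtering step, assuming tweets have already been tokenized and tagged (the Stanford tagger itself is not invoked here), the key phrases could be collected as follows. The function name extract_key_phrases, the Penn Treebank style noun tags (NN, NNS, NNP, NNPS) and the regular expression for hashtags and user mentions are illustrative assumptions and not part of TwiBiNG.</p>
        <preformat><![CDATA[
```python
import re

# Assumption: hashtags and user mentions are single tokens starting with # or @.
HASHTAG_OR_MENTION = re.compile(r"[#@]\w+")


def extract_key_phrases(tagged_tokens):
    """Keep nouns, hashtags and user mentions from (token, tag) pairs.

    Every other part of speech is ignored, so the result describes the
    subject of the tweet rather than its predicate.
    """
    key_phrases = []
    for token, tag in tagged_tokens:
        is_entity = HASHTAG_OR_MENTION.fullmatch(token) is not None
        is_noun = tag.startswith("NN")  # assumed Penn Treebank noun tags
        if is_entity or is_noun:
            key_phrases.append(token)
    return key_phrases


# Hypothetical tagger output for one tweet; the HT and USR tags are placeholders.
tagged = [
    ("#Ukraine", "HT"), ("riot", "NN"), ("police", "NN"), ("asking", "VBG"),
    ("forgiveness", "NN"), ("from", "IN"), ("protesters", "NNS"),
    ("@tv3News", "USR"),
]
key_phrases = extract_key_phrases(tagged)
```
]]></preformat>
        <p>Verbs and prepositions such as "asking" and "from" are dropped, while the hashtag, the mention and the nouns are kept as key phrases.</p>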
        <p>We rank keywords using a modified df-idf<sub>t</sub> [Aie13] score that incorporates retweets:</p>
        <disp-formula><tex-math>R(k_i) = \frac{R_i - R_{i-1}}{\max(R_i, R_{i-1})}</tex-math></disp-formula>
        <p><disp-formula><tex-math>\mathrm{Score}(k_i) = t_i \log\left(1 + \frac{R(k_i) + 1}{t_{i-1} + 1}\right)</tex-math></disp-formula>
Here R<sub>i</sub> represents the number of retweets for keyword k in timeslot i and t<sub>i</sub> represents the number of tweets for keyword k in the same timeslot. Since a keyword may be related to an unbounded number of tweets and retweets in a timeslot, deciding on a threshold is difficult. Therefore, we normalize the score of each keyword using min-max normalization. Let &lt;K&gt; be the set of keywords in slot i; then the normalized score is given by:</p>
        <p><disp-formula><tex-math>\mathrm{NormalizedScore}(NK_i) = \frac{\mathrm{Score}(k_i) - \min(\mathrm{Score}(\langle K \rangle))}{\max(\mathrm{Score}(\langle K \rangle)) - \min(\mathrm{Score}(\langle K \rangle))}</tex-math></disp-formula>
The threshold for the normalized scores was set to 0.0075 through experiments. We select the keywords above this threshold and store them in a set (S<sub>i</sub>). We observed that at this threshold each timeslot yields around 800-875 trending keywords. Once this set was ready, we assigned tweets to each keyword, i.e. we reversed the bipartite graph of Figure 1. We then filter the tweets based on user details, specifically the number of followers and the status count. This step is necessary to increase authenticity and to reduce tweets containing spam. Since clustering is based on tweet intention, skipping this step may hamper clustering performance, and the generated stories may not be considered quality news. Our experiments, based on Hutto et al. [Hut13], indicated that users with a follower count &gt; 600 and a tweet count &gt; 6000 may be considered authentic, and that considering tweets by these users alone significantly improves system performance.</p>
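        <p>The normalization, thresholding and user filtering described above can be sketched as follows. This is a minimal illustration that assumes the raw keyword scores have already been computed and are passed in as a dictionary; the function names and the user record fields followers and statuses are hypothetical, while the thresholds 0.0075, 600 and 6000 are the ones reported in the text.</p>
        <preformat><![CDATA[
```python
def retweet_change(r_now, r_prev):
    """Relative change in retweets of a keyword between two timeslots."""
    denominator = max(r_now, r_prev)
    if denominator == 0:
        return 0.0  # assumption: no retweets in either slot means no change
    return (r_now - r_prev) / denominator


def min_max_normalize(scores):
    """Map raw keyword scores onto the interval [0, 1]."""
    lowest, highest = min(scores.values()), max(scores.values())
    span = highest - lowest
    if span == 0:
        return {keyword: 0.0 for keyword in scores}
    return {keyword: (s - lowest) / span for keyword, s in scores.items()}


def trending_keywords(scores, threshold=0.0075):
    """Keywords whose normalized score lies above the threshold."""
    normalized = min_max_normalize(scores)
    return {keyword for keyword, s in normalized.items() if s > threshold}


def is_authentic(user, min_followers=600, min_statuses=6000):
    """Follower and status count filter; the field names are hypothetical."""
    return user["followers"] > min_followers and user["statuses"] > min_statuses
```
]]></preformat>
        <p>Because min-max normalization always maps the lowest raw score to 0, the keyword with the minimum score in a timeslot is never selected as trending.</p>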
        <p>Since we are building a user-centered news generator, we want tweets related to the keywords defined by the user, in order to improve relevancy. For this purpose we scan all keywords in (S<sub>i</sub>) and compute their similarity with the user-defined keywords (U<sub>i</sub>).</p>
        <p>If any LCS(S<sub>i</sub>, U<sub>i</sub>) contains U<sub>i</sub>, then we include all the tweets related to S<sub>i</sub> in the set &lt;TU<sub>i</sub>&gt;, which contains the ids of tweets related to user-centered keywords. We scan the database for the timeslot again and remove those tweets which are not contained in &lt;TU<sub>i</sub>&gt; (user-centric tweets). At the end of this stage we are left with a set of tweets and related keywords that can be considered authentic for a news story.</p>
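        <p>A minimal sketch of this matching step is given below. It treats keywords as character strings and interprets "LCS contains the user keyword" as the LCS being as long as the user keyword itself, i.e. the user keyword is a subsequence of the trending keyword. The function names, the case folding and the dictionary layout are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def matches_user_keyword(trending_keyword, user_keyword):
    """True when the LCS of both keywords is the whole user keyword."""
    trending = trending_keyword.lower()  # assumption: matching ignores case
    wanted = user_keyword.lower()
    return lcs_length(trending, wanted) == len(wanted)


def user_centric_tweets(keyword_to_tweets, user_keywords):
    """Collect the ids of tweets attached to keywords matching a user keyword."""
    selected = set()
    for keyword, tweet_ids in keyword_to_tweets.items():
        if any(matches_user_keyword(keyword, u) for u in user_keywords):
            selected |= set(tweet_ids)
    return selected
```
]]></preformat>
        <p>With this reading, informal variants such as "#Ukraine" or "ukraines" still match the user keyword "ukraine", which is the effect the LCS-based comparison is meant to achieve.</p>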
      </sec>
      <sec id="sec-3-2">
        <label>3.2</label>
        <title>Intention based Tweet Clustering</title>
        <p>We follow the approach of [Goy13] and cluster tweets using a bipartite graph. The basic aim here is to capture the real intention of the tweets in each cluster. Algorithm 1 presents an incremental bipartite algorithm to cluster tweets and keywords. Once we have a set of clusters, we know the intention of the tweets. As can be seen, the similarity threshold is kept at 0.5, which signifies that merged keywords should have an intention similarity of more than 50%. Readers requiring more specific tweets to be clustered together may increase the threshold, but this comes at the cost of duplicate tweets being merged together. As can be observed in Algorithm 1, since the clustering is based on Intersection(T<sub>i</sub>, T<sub>j</sub>), there will be duplicate tweets in a cluster, and a news story containing a lot of duplicate tweets would be considered of poor quality. So removing duplicate content becomes a prime task.</p>
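        <p>The merging loop of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading rather than the authors' implementation: keyword groups are merged whenever the Jaccard similarity of their tweet sets exceeds the threshold, and passes are repeated until a full pass produces no merge. The names jaccard and bipartite_cluster and the dictionary input format are assumptions.</p>
        <preformat><![CDATA[
```python
def jaccard(tweets_a, tweets_b):
    """Intersection over union of two sets of tweet ids."""
    union = tweets_a | tweets_b
    return len(tweets_a & tweets_b) / len(union) if union else 0.0


def bipartite_cluster(keyword_to_tweets, threshold=0.5):
    """Merge keyword groups whose tweet sets overlap by more than threshold.

    Returns a list of (keywords, tweet_ids) pairs, both given as sets.
    """
    clusters = [({kw}, set(ids)) for kw, ids in keyword_to_tweets.items()]
    merged = True
    while merged:  # repeat until a full pass makes no merge
        merged = False
        i = 0
        while i < len(clusters):
            j = i + 1
            while j < len(clusters):
                if jaccard(clusters[i][1], clusters[j][1]) > threshold:
                    keywords = clusters[i][0] | clusters[j][0]
                    tweets = clusters[i][1] | clusters[j][1]
                    clusters[i] = (keywords, tweets)
                    del clusters[j]  # s_j is absorbed into s_i
                    merged = True
                else:
                    j += 1
            i += 1
    return clusters


example = {
    "riot": {1, 2, 3},
    "police": {1, 2, 3, 4},
    "bitcoin": {9, 10},
    "theft": {9, 10, 11},
    "kiev": {1, 2},
}
clusters = bipartite_cluster(example)
```
]]></preformat>
        <p>In the example, "riot" and "police" share three of four tweets and are merged, as are "bitcoin" and "theft". The keyword "kiev" overlaps the merged riot/police group by exactly 0.5, which does not exceed the threshold, so it stays in its own cluster.</p>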
        <preformat>Data: I&lt;S_i, &lt;TS_i&gt;&gt;, where S_i and TS_i denote a set of keywords and the related tweets
Result: O&lt;CS_i, &lt;CTS_i&gt;&gt;, the clustered set of tweets
Let S represent the set of unique keywords
while clusters exist with similarity &gt; threshold do
    flag = 0
    for each s_i in S do
        for each s_j in S with j &gt; i do
            Sim(s_i, s_j) = |Intersection(Ts_i, Ts_j)| / |Union(Ts_i, Ts_j)|
            if Sim(s_i, s_j) &gt; 0.5 then
                I&lt;s_i, &lt;Ts_i&gt;&gt; = I&lt;s_i ∪ s_j, &lt;Union(Ts_i, Ts_j)&gt;&gt;
                remove s_j from I
                flag = 1
            end
        end
    end
    if flag = 0 then
        break
    end
end</preformat>
        <p>Algorithm 1: Bipartite Clustering of Tweets using Keywords</p>
        <p>LCS(S<sub>i</sub>, U<sub>i</sub>) = LongestCommonSubsequence(S<sub>i</sub>, U<sub>i</sub>)</p>
        <p>In Algorithm 2 we present an algorithm to remove duplicate tweets from a cluster:</p>
        <preformat>Data: &lt;CS_i, &lt;CTS_i&gt;&gt;, the set of tweets CTS_i in a cluster of keywords CS_i
Result: &lt;CS_i, &lt;FTS_i&gt;&gt;, the final set of tweets and clusters
for each cluster cs_i in CS do
    D_i = empty set
    for each t_i in CTS_i do
        if D_i.contains(t_i) = false then
            for each t_j in CTS_i with j &gt; i do
                sim(t_i, t_j) = LCS(t_i, t_j).length / Min(t_i.length, t_j.length)
                if sim(t_i, t_j) &gt; 0.65 then
                    D_i.add(t_j)
                end
            end
        end
    end
    FTS_i = CTS_i - D_i
    &lt;CS_i, &lt;CTS_i&gt;&gt; = &lt;CS_i, &lt;FTS_i&gt;&gt;
end</preformat>
        <p>Algorithm 2: To remove Duplicate Tweets from Cluster</p>
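        <p>A Python sketch of Algorithm 2 is shown below. It assumes that tweets are compared character by character and that the first occurrence of a group of near-identical tweets is the one that is kept; both choices, as well as the function names, are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def remove_duplicates(tweets, threshold=0.65):
    """Drop tweets that are near-copies of an earlier tweet in the cluster."""
    duplicates = set()  # indices of tweets marked as duplicates (the set D_i)
    for i, tweet_i in enumerate(tweets):
        if i in duplicates:
            continue
        for j in range(i + 1, len(tweets)):
            if j in duplicates:
                continue
            tweet_j = tweets[j]
            shorter = min(len(tweet_i), len(tweet_j))
            if shorter == 0:
                continue
            if lcs_length(tweet_i, tweet_j) / shorter > threshold:
                duplicates.add(j)
    return [t for index, t in enumerate(tweets) if index not in duplicates]
```
]]></preformat>
        <p>Dividing by the length of the shorter tweet means that a retweet which only adds a prefix such as "RT:" to the original text reaches a similarity of 1 and is removed.</p>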
        <p>The motivation behind the threshold of 0.65 in Algorithm 2 can be found in O'Connor et al. [Ocon10]. We end this phase with clusters of keywords and their relevant sets of tweets. We now know the intention of our keywords and are ready to rank them.</p>
      </sec>
      <sec id="sec-3-3">
        <label>3.3</label>
        <title>Cluster Ranking</title>
        <p>Up to this phase we have obtained the required set of clusters. We now need to rank them. Although different authors [Yaj12][Hav03][Shu11] have proposed efficient topic ranking methods, they share a common feature: relevance to the considered keywords is treated as an important factor. We make use of this fact and of the normalized trend score to generate a ranking score for clusters. Since we are aiming for a user-centric tool, our clusters should be as relevant as possible to the user's intention. Also, since we have to generate headlines, trend needs special attention. Keeping these two facts in mind, we present our cluster ranking methodology. Using &lt;U<sub>i</sub>&gt; we collected tweets for relevant keywords in Section 3.1 as the set &lt;TU<sub>i</sub>&gt;. We calculate the relevancy of cluster CS<sub>i</sub> having tweets &lt;FS<sub>i</sub>&gt; as:</p>
        <disp-formula><tex-math>R_{CS_i} = \mathrm{Relevancy}(CS_i) = \max\left(\frac{\mathrm{Intersection}(U_i, FS_i)}{\mathrm{Union}(U_i, FS_i)}\right)</tex-math></disp-formula>
        <p>This relevancy score indicates how closely the cluster is related to the user's intention.</p>
        <disp-formula><tex-math>T_{CS_i} = \mathrm{Trend}(CS_i) = e^{\max(\mathrm{NormalizedScore\ of\ } CS_i)}</tex-math></disp-formula>
        <p>This factor indicates how strongly a cluster is trending. The idea of taking Max(Normalized Score of CS<sub>i</sub>) is motivated by the BNgram clustering approach used in [Aie13]. Readers can think of T<sub>CS<sub>i</sub></sub> as a boost factor for relevance.</p>
        <disp-formula><tex-math>\mathrm{ClusterScore}(CScr_i) = R_{CS_i} \times T_{CS_i}</tex-math></disp-formula>
        <p>We now rank the clusters based on CScr<sub>i</sub>. At the end of this phase we have ranked our clusters and, to avoid any confusion, we refer to them from now on as &lt;CS<sub>ir</sub>, &lt;FTS<sub>ir</sub>&gt;&gt;.</p>
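        <p>The cluster score can be sketched as below. The sketch assumes that relevancy is the Jaccard overlap between the user-centric tweet ids and the tweet ids of the cluster, and that the trend factor is the exponential of the highest normalized keyword score in the cluster. All function names and the (keywords, tweet ids) tuple layout are hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def relevancy(user_tweet_ids, cluster_tweet_ids):
    """Overlap between the user-centric tweets and the tweets of a cluster."""
    union = user_tweet_ids | cluster_tweet_ids
    if not union:
        return 0.0
    return len(user_tweet_ids & cluster_tweet_ids) / len(union)


def trend(cluster_keywords, normalized_scores):
    """Boost factor: e raised to the best normalized score in the cluster."""
    return math.exp(max(normalized_scores[k] for k in cluster_keywords))


def rank_clusters(clusters, user_tweet_ids, normalized_scores):
    """Sort (keywords, tweet_ids) clusters by relevancy times trend."""
    def cluster_score(cluster):
        keywords, tweet_ids = cluster
        return relevancy(user_tweet_ids, tweet_ids) * trend(keywords, normalized_scores)

    return sorted(clusters, key=cluster_score, reverse=True)
```
]]></preformat>
        <p>Since normalized scores lie in [0, 1], the trend factor lies between 1 and e, so it can raise the rank of a trending cluster but never reduces its relevancy.</p>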
      </sec>
      <sec id="sec-3-4">
        <label>3.4</label>
        <title>Tweet Ranking in Clusters</title>
        <p>Once clusters are ranked, we need to rank the tweets they contain in order to present them in the most relevant order. Before introducing the ranking calculation we need to introduce the expanded keyword set, which can be seen as a prerequisite for headline formation. This step is necessary because some of the clusters may contain only a small number of keywords and need sufficient information to generate a story. We represent the expanded cluster set as &lt;ECS<sub>i</sub>&gt;. Let the set &lt;K<sub>t</sub>&gt; represent the set of keywords for tweet T<sub>i</sub>. Then the relevance score for T<sub>i</sub> is calculated as</p>
        <disp-formula><tex-math>\mathrm{Score}(T_i) = \frac{\mathrm{Intersection}(\langle K_t \rangle, \langle ECS_i \rangle)}{\mathrm{Union}(\langle K_t \rangle, \langle ECS_i \rangle)}</tex-math></disp-formula>
        <p>We now rank our tweets based on Score(T<sub>i</sub>). At the end of this phase, we filter out tweets which have Score(T<sub>i</sub>) &lt; 0.3. The threshold 0.3 is based on the results of our experiments, as described in Table 2. Increasing the threshold provides better quality stories but reduces the number of stories at a high rate. Hence, readers requiring more focused stories may increase the threshold.</p>
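        <p>The tweet ranking and filtering step can be sketched as follows, assuming each tweet is represented by the set of its key phrases and the expanded cluster keyword set is already available. The function names and the mapping from tweet id to keyword set are illustrative assumptions.</p>
        <preformat><![CDATA[
```python
def tweet_score(tweet_keywords, expanded_cluster_keywords):
    """Jaccard overlap between a tweet's keywords and the expanded cluster set."""
    union = tweet_keywords | expanded_cluster_keywords
    if not union:
        return 0.0
    return len(tweet_keywords & expanded_cluster_keywords) / len(union)


def rank_tweets(tweets, expanded_cluster_keywords, threshold=0.3):
    """Return tweet ids ordered by score, dropping scores below threshold.

    tweets maps a tweet id to the set of key phrases found in that tweet.
    """
    scored = [
        (tweet_score(keywords, expanded_cluster_keywords), tweet_id)
        for tweet_id, keywords in tweets.items()
    ]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet_id for _, tweet_id in kept]
```
]]></preformat>
        <p>Raising the threshold argument above 0.3 keeps only tweets that share most of their key phrases with the cluster, which corresponds to the more focused stories mentioned in the text.</p>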
      </sec>
      <sec id="sec-3-5">
        <label>3.5</label>
        <title>Cluster Selection and Headline Generation</title>
        <p>In this phase we provide an approach to decide which clusters can form news. Since not all clusters form a story, we must judiciously decide which clusters to use. Through experiments, we observed that the following heuristic may be used to select quality clusters. H3.5.1: Clusters tend to form quality stories when they contain at least four keywords and at least one hashtag keyword, and are related to at least three tweets. Further, the number of non-hashtag keywords should be greater than the number of hashtag keywords.</p>
        <p>The rationale behind this approach is as follows. Clusters having an excessive number of hashtags as keywords are usually related to tweets with almost identical content. Having a hashtag allows users to easily identify events, and more than three distinct tweets allow us to form a sequence of events. Since we are required to identify a fixed number of topics, we follow H3.5.1 and scan the clusters in &lt;CS<sub>ir</sub>&gt; until the specified number of clusters is reached in each timeslot. Hence, we follow a dynamic approach that is independent of the cluster count.</p>
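        <p>Heuristic H3.5.1 and the scan over the ranked clusters can be sketched as follows. The sketch assumes that hashtag keywords are recognized by a leading "#" and that clusters arrive already sorted by cluster score; the function names and the limit parameter are hypothetical.</p>
        <preformat><![CDATA[
```python
def is_story_cluster(keywords, tweet_ids):
    """Check heuristic H3.5.1 for a single cluster."""
    hashtags = [k for k in keywords if k.startswith("#")]
    others = [k for k in keywords if not k.startswith("#")]
    return (
        len(keywords) >= 4
        and len(hashtags) >= 1
        and len(tweet_ids) >= 3
        and len(others) > len(hashtags)
    )


def select_story_clusters(ranked_clusters, limit):
    """Scan clusters in rank order and keep the first `limit` that qualify."""
    selected = []
    for keywords, tweet_ids in ranked_clusters:
        if len(selected) >= limit:
            break
        if is_story_cluster(keywords, tweet_ids):
            selected.append((keywords, tweet_ids))
    return selected
```
]]></preformat>
        <p>Because the scan stops as soon as the requested number of stories is found, its cost depends on how many clusters must be inspected rather than on the total cluster count.</p>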
        <p>For headline generation we order the keywords according to the top-ranked tweet in the cluster and use the POS tagger to connect the keywords. We believe that better approaches to forming headlines exist, but since we were dealing with informal language we needed support from tweet intent to form them. Readers may improve upon this aspect by considering the statistical techniques mentioned in Section 2.2.</p>
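        <p>The keyword ordering part of headline generation can be sketched as below; connecting the ordered keywords with the POS tagger is not covered. The sketch assumes a case-insensitive substring match and simply omits keywords that do not occur in the top-ranked tweet. The function name order_keywords is hypothetical.</p>
        <preformat><![CDATA[
```python
def order_keywords(keywords, top_tweet):
    """Order cluster keywords by their first appearance in the top tweet."""
    text = top_tweet.lower()
    positioned = []
    for keyword in keywords:
        index = text.find(keyword.lower())
        if index >= 0:  # keywords missing from the tweet are skipped
            positioned.append((index, keyword))
    return [keyword for _, keyword in sorted(positioned)]


ordered = order_keywords(
    ["police", "#Ukraine", "protesters", "riot", "Kiev"],
    "#Ukraine riot police asking forgiveness from protesters",
)
```
]]></preformat>
        <p>The resulting sequence follows the word order of the tweet, so the headline inherits its reading order from text written by a human author.</p>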
      </sec>
    </sec>
    <sec id="sec-4">
      <label>4</label>
      <title>Results and discussion</title>
      <p>Table 1 depicts the human evaluation of results as carried out by the authors. The official evaluation results of our method in the Data Challenge are included in snow2014dc [Pap14]. The language content shows that our topics were evenly distributed between English and non-English tweets. This is probably due to the selection of keywords related to Syria and Ukraine, which allowed foreign phrases to enter the dataset. News headline readability, being a highly subjective attribute, needs to be evaluated manually. A news headline is considered readable if the majority of users accessing the system can comprehend it without the use of other resources. We observed that 81.60% of our topics were labeled readable by language experts. The images related to the extracted tweets were found to symbolize the news story with 97.67% accuracy.</p>
      <p>Table 2 reports the number of topical clusters with increasing Score(T<sub>i</sub>) threshold. As can be observed, the number of clusters decreases at a high rate as the threshold increases, which led us to select 0.3 as our base threshold.</p>
      <p>… are covered, but only the most relevant are shown for clarity.</p>
      <p>These results show improved performance over previously existing systems. A limitation of this system is that it does not include the user's community, which might have allowed us to form tripartite clusters, thereby improving clustering quality at a low cost. The use of better-known string matching algorithms may also improve cluster quality. Our use of a bipartite clustering algorithm can allow future researchers to explore this field further.</p>
    </sec>
    <sec id="sec-5">
      <label>5</label>
      <title>Acknowledgement</title>
      <p>The authors owe a debt of gratitude to Dr. P. Goyal and
Dr. N. Mehala for their constructive criticism and
innovative ideas that formed the foundation of this
study. We would like to extend special thanks to Birla
Institute of Technology and Science for providing
resources without which this work would never have been
completed. We would like to thank the SNOW'14
organizers for giving us a chance to work on the social sensor
project and for their immediate follow-up in cases of
difficulty.
</p>
      <p>HEADLINE</p>
      <p>Syria alQaeda leader gives rivals ultimatum.</p>
      <p>Ukraine parliament wants Yanukovich tried international court (25-02-14 18:45)</p>
      <p>Russian President Vladimir Putin ordered test combat readiness for troops stationed region that touches Ukraines northern border (26-02-14 17:30)</p>
      <p>Ukraine leaders disband riot police who kneel down ask forgiveness from the people (26-02-14 17:45)</p>
      <p>Bitcoin turmoil rumoured 375m theft closes major exchange. (26-02-14 03:30)</p>
      <p>Rivals, alQaeda, #Syria, group, ultimatum</p>
      <p>riot, Ukraine, police, unit, crackdown, Kiev, protesters</p>
      <p>time, website, transactions, being, Bitcoin</p>
      <p>TWEETS</p>
      <p>1) #Syria #Homs #Aleppo Leader of Syrian militant group challenges rivals</p>
      <p>2) RT: Top al-qaeda leader abu khalid alSuri was reportedly killed by a rival.#Syria</p>
      <p>#ukraine Rada says try Yanukovich before Int Crime Court. Should be tried by Ukrainians for crimes against Ukrainians!</p>
      <p>2) Yanukovich papers:Snipers who killed dozens of protesters came from Ukraine's "omega" special forces.#euromaiden</p>
      <p>1) Putin orders troops to prepare in case of 'a crisis' in Ukraine as tensions step up. Report on The 530 now @tv3News</p>
      <p>2) Russia puts troops on alert amid Ukraine tension.</p>
      <p>Not in my wildest dreams I'd imagine Arab police doing so #Ukraine riot police asking forgiveness from protesters</p>
      <p>The equivalent of war when states are in danger.</p>
      <p>Bitcoin exchange fears $400m theft #bitcoin</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Ocon10]
          <string-name>
            <surname>O'Connor</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balasubramanyan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Routledge</surname>
            ,
            <given-names>B. R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N. A.</given-names>
          </string-name>
          (
          <year>2010</year>
          ),
          <article-title>From tweets to polls: Linking text sentiment to public opinion time series</article-title>
          , ICWSM,
          <volume>11</volume>
          ,
          <fpage>122</fpage>
          -
          <lpage>129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Zan11]
          <string-name>
            <surname>Zangerle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gassler</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Recommending #-tags in Twitter</article-title>
          .
          In
          <source>Proceedings of the Workshop on Semantic Adaptive Social Web (SASWeb 2011)</source>
          .
          <source>CEUR Workshop Proceedings</source>
          (Vol.
          <volume>730</volume>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>78</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Dom05]
          <string-name>
            <surname>Domingos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Mining social networks for viral marketing</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ),
          <fpage>80</fpage>
          -
          <lpage>82</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Sol10]
          <string-name>
            <surname>Solis</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>The information divide between traditional and new media</article-title>
          ,
          <uri>http://www.briansolis.com/2010/02/the-informationdivide-the-socialization-of-news-and-dissemination/</uri>
          , Internet Draft (last accessed March 16, 2014).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Aie13]
          <string-name>
            <surname>Aiello</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petkos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corney</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papadopoulos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skraba</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kompatsiaris</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaimes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>Sensing trending topics in Twitter</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          ,
          <volume>15</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1268</fpage>
          -
          <lpage>1282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Bif10]
          <string-name>
            <surname>Bifet</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Sentiment knowledge discovery in Twitter streaming data</article-title>
          .
          <source>In Discovery Science</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          ). Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Zha11]
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>W. X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>E. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Comparing Twitter and traditional media using topic models</article-title>
          .
          <source>In Advances in Information Retrieval</source>
          (pp.
          <fpage>338</fpage>
          -
          <lpage>349</lpage>
          ). Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Hon10]
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Davison</surname>
            ,
            <given-names>B. D.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Empirical study of topic modeling in Twitter</article-title>
          .
          <source>In Proceedings of the First Workshop on Social Media Analytics</source>
          (pp.
          <fpage>80</fpage>
          -
          <lpage>88</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Bol11]
          <string-name>
            <surname>Bollen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pepe</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena</article-title>
          .
          <source>In ICWSM.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Goy13]
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehala</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>A robust approach for finding conceptually related queries using feature selection and tripartite graph structure</article-title>
          .
          <source>Journal of Information Science</source>
          ,
          <volume>39</volume>
          (
          <issue>5</issue>
          ),
          <fpage>575</fpage>
          -
          <lpage>592</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Phu10]
          <string-name>
            <surname>Phuvipadawat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Murata</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Breaking news detection and tracking in Twitter</article-title>
          .
          <source>In 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)</source>
          (Vol.
          <volume>3</volume>
          , pp.
          <fpage>120</fpage>
          -
          <lpage>123</lpage>
          ). IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [San09]
          <string-name>
            <surname>Sankaranarayanan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samet</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teitler</surname>
            ,
            <given-names>B. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lieberman</surname>
            ,
            <given-names>M. D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sperling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>TwitterStand: news in tweets</article-title>
          .
          <source>In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems</source>
          (pp.
          <fpage>42</fpage>
          -
          <lpage>51</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Ban00]
          <string-name>
            <surname>Banko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mittal</surname>
            ,
            <given-names>V. O.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Witbrock</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Headline generation based on statistical translation</article-title>
          .
          <source>In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics</source>
          (pp.
          <fpage>318</fpage>
          -
          <lpage>325</lpage>
          ). Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Jin01]
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hauptmann</surname>
            ,
            <given-names>A. G.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Title Generation Using a Training Corpus</article-title>
          .
          <source>In Computational Linguistics and Intelligent Text Processing</source>
          (pp.
          <fpage>208</fpage>
          -
          <lpage>215</lpage>
          ). Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Yaj12]
          <string-name>
            <surname>Duan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Shum</surname>
            ,
            <given-names>H.-Y.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Twitter topic summarization by ranking tweets using social influence and content quality</article-title>
          .
          <source>In Proceedings of the 24th International Conference on Computational Linguistics</source>
          (pp.
          <fpage>763</fpage>
          -
          <lpage>780</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Hav03]
          <string-name>
            <surname>Haveliwala</surname>
            ,
            <given-names>T. H.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>15</volume>
          (
          <issue>4</issue>
          ),
          <fpage>784</fpage>
          -
          <lpage>796</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Shu11]
          <string-name>
            <surname>Shubhankar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>A. P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pudi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>An efficient algorithm for topic ranking and modeling topic evolution</article-title>
          .
          <source>In Database and Expert Systems Applications</source>
          (pp.
          <fpage>320</fpage>
          -
          <lpage>330</lpage>
          ). Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Pap14]
          <string-name>
            <surname>Papadopoulos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corney</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Aiello</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>SNOW 2014 Data Challenge: Assessing the Performance of News Topic Detection Methods in Social Media</article-title>
          .
          <source>In Proceedings of the SNOW 2014 Data Challenge.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Hut13]
          <string-name>
            <surname>Hutto</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yardi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gilbert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>A longitudinal study of follow predictors on Twitter</article-title>
          .
          <source>In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>
          (pp.
          <fpage>821</fpage>
          -
          <lpage>830</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Oco10]
          <string-name>
            <surname>O'Connor</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krieger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ahn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>TweetMotif: Exploratory Search and Topic Summarization for Twitter</article-title>
          .
          <source>In Proceedings of the 4th Int'l AAAI Conference on Weblogs and Social Media.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>