<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Attention Dominating Moments Across Media Types</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Igor Brigadir</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Derek Greene</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Padraig Cunningham</string-name>
          <email>padraig.cunninghamg@insight-centre.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Copyright c 2016 for the individual papers by the paper's authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: M. Martinez, U. Kruschwitz</institution>
          ,
          <addr-line>G. Kazai, D. Corney, F. Hopfgartner, R.</addr-line>
          <institution>Campos and D. Albakour (eds.): Proceedings of the NewsIR'16 Workshop at ECIR</institution>
          ,
          <addr-line>Padua, Italy, 20-March- 2016, published at</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we address the problem of identifying attention dominating moments in online media. We are interested in discovering moments when everyone seems to be talking about the same thing. We investigate one particular aspect of breaking news: the tendency of multiple sources to concentrate attention on a single topic, leading to a collapse in diversity of content for a period of time. In this work we show that diversity at a topic level is effective for capturing this effect in blogs, in news articles, and on Twitter. The phenomenon is present in three distinctly different media types, each with their own unique features. We describe the phenomenon using case studies relating to major news stories from September 2015.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The problem of detecting breaking news events has
inspired a host of approaches, extracting useful
signals from activity on social networks, newswire, and
other types of media. The online communication
platforms that have been adopted allow these events to
persist in some form. These digital traces can never
fully capture the original experience, but offer us an
opportunity to revisit significant phenomena with
different points of view, or help us to characterise and
learn something about the processes involved. Many
different forms of news media attempt to record and
disseminate information deemed important enough to
communicate, and as the barriers to broadcasting and
sharing information are removed, attention becomes a
scarce commodity.</p>
      <p>
        We define the problem of detecting attention
dominating moments across different media types as a
collapse in diversity in the content generated by a set
of online sources in a topic during a given time
period. Media types here include mainstream news
articles, blog posts, and tweets. These media types differ
in both the category of topics covered [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and their
use of language [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In the context of Twitter, we
define sources as unique user accounts. For mainstream
news and blogs, sources refer to individual
publications or outlets. Publications may have different
numbers of authors, but as unique author information is
not available, we treat each unique blog or news outlet
as a single source.
      </p>
      <p>In Section 3, we describe the two stages of our
proposed event detection procedure. In the first stage,
content generated by the news, blog and tweet sources
is grouped into broad topical categories, through the
application of matrix factorization to the content
generated by these sources. In the second stage, we
examine the variation in similarity between content
generated by sources within a given topic during a given
time period, in order to identify a collapse in
diversity within a topic which corresponds to an attention
dominating moment. In Section 5, we evaluate this
procedure on a collection of one million news articles
and blog posts from September 2015, along with a
parallel corpus of tweets collected during the same time
period.</p>
      <p>
        Rather than formulating the problem as tracking
the evolution of topics themselves, we consider the
diversity of content within a speci c topic over time. The
motivation is that, for instance, a collapse in diversity
around a major sporting event will be strongly
evident in certain news sources, but not evident in others.
The distinction is important, as this approach is more
suited to retrospective analysis, when the entire
collection of documents of interest is available. The topics
do not change over time, as opposed to a real-time
setting where topics must be updated as new documents
arrive [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The information need is guided by two
major questions. Firstly, when have significant collapses
in diversity occurred in a topic of interest? Secondly,
are there differences between media types when these
events occur?
      </p>
      <p>Our main contributions here are: 1) a
diversity-based approach to detecting attention dominating
news events; 2) a comparison between traditional news
sources, blogs, and Twitter during these events; 3) a
parallel corpus of newsworthy tweets for the NewsIR
dataset.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In previous work, attention dominating news
stories have been described as media explosions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or
firestorms [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The idea of combining signals from
multiple sources for detecting or tracking the evolution of
events has proved effective in the past. Osborne et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
used signals from Wikipedia page views, together with
Twitter, to improve "first story detection". Concurrent
Wikipedia edits were used as a signal for breaking news
detection in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>
        Topic modeling applied to parallel corpora of news
and tweets has been previously explored by a number
of researchers [
        <xref ref-type="bibr" rid="ref11 ref6 ref9">6, 9, 11</xref>
        ]. Extensions to LDA to
account for tweet-specific features have been proposed
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. A comparison between Twitter and content from
newswires was explored in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. A Non-negative
Matrix Factorization (NMF) approach is used for topic
detection in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        How offline phenomena link to bursty behaviour
online is discussed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] Shannon's
Diversity Index was used to detect a "contraction of
attention" in a tweet stream by measuring diversity of
hashtags. In contrast, we employ a different measure
of diversity based on document similarity, applying it
to streams from different media types segmented by
topic. Methods for automatically detecting anomalies
or significant changes in a time series are discussed in
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] a change-point detection approach is
applied to time series constructed from Tweet keyword
frequencies.
      </p>
      <p>As a broad overview, the common components
involved in detecting high impact, attention
dominating news stories include: selecting relevant subsets
of documents; representation and feature extraction;
constructing time series from features; event detection
and analysis. In this paper we concentrate on a
single key feature of breaking news: a collapse in content
diversity within a fixed time window.</p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Method</title>
      <p>Our objective is to detect when multiple articles in
a topical stream become less diverse, signalling the
emergence of an attention dominating news story. We
consider attention to a phenomenon as the main
driving force behind the decision to produce or broadcast a
communication. Using the diversity of content within
a time window, we attempt to characterise instances
where a particular piece of information becomes
dominant. Concretely, for each type of media, NMF is
used to assign topics to documents; for documents in
a topic, we calculate diversity between documents in
a time window. This type of analysis allows us to
examine the extent to which the onset of an important
breaking news event is accompanied by a collapse in
textual content diversity, both within a group of news
sources and across different media types.</p>
      <sec id="sec-3-1">
        <title>Finding Topics</title>
        <p>We apply a Non-negative Matrix Factorization (NMF)
topic modeling approach to extract potentially
interesting topics from a stream of tweets or set of articles.
For each media source, we build a tf-idf weighted
term-document matrix and use this as input to NMF.</p>
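        <p>The factorisation step can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a small dense documents-by-terms matrix, uses the standard multiplicative update rules with random initialisation rather than NNDSVD, and the names <monospace>nmf</monospace>, <monospace>assign_topics</monospace> and <monospace>squared_error</monospace> are hypothetical helpers introduced only for illustration.</p>
        <preformat>
```python
import random


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorise non-negative v (docs x terms) into w (docs x k) and h (k x terms).

    Uses multiplicative updates with random initialisation
    (an assumption; the text mentions NNDSVD initialisation instead).
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n_docs)]
    h = [[rng.random() for _ in range(n_terms)] for _ in range(k)]
    for _ in range(iterations):
        # Update the topic-term matrix h.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_terms)]
             for i in range(k)]
        # Update the document-topic matrix w.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_docs)]
    return w, h


def assign_topics(w):
    """Assign each document to the topic with the largest weight in its row of w."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


def squared_error(v, w, h):
    """Squared Frobenius reconstruction error between v and the product of w and h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx) for a, b in zip(ra, rb))
```
        </preformat>
        <p>Assigning each document to its highest-weighted topic is one simple way to segment a stream by topic; a pure-Python version like this is only practical for toy inputs.</p>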
        <p>
          We also considered LDA to infer topics in these
datasets. The choice of NMF over LDA was primarily
due to computation time. LDA was significantly more
computationally expensive than NMF with NNDSVD
[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] initialisation. NMF also tends to produce more
coherent topics [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Measuring Diversity</title>
        <p>The same tf-idf representation used for topic modeling
is used in diversity calculations. Each article, blog
post or tweet is a tf-idf vector. A separate
document-term matrix is built for each media type. Stopwords
and words occurring in fewer than 10 documents are
removed.</p>
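        <p>As a rough illustration of this representation, the sketch below builds sparse tf-idf vectors while dropping stopwords and rare terms. Whitespace tokenisation, the smoothed idf variant, and the function name <monospace>tfidf_vectors</monospace> are assumptions made for the example; the exact weighting used in this work is not specified here.</p>
        <preformat>
```python
import math
from collections import Counter


def tfidf_vectors(docs, stopwords=frozenset(), min_df=10):
    """Return one sparse tf-idf vector (dict term -> weight) per document.

    Terms in `stopwords` or appearing in fewer than `min_df` documents are dropped.
    The smoothed idf, ln((1 + N) / (1 + df)) + 1, is an assumption.
    """
    tokenised = [[t for t in doc.lower().split() if t not in stopwords]
                 for doc in docs]
    # Document frequency: number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(docs)
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0
           for term, df in doc_freq.items() if df >= min_df}
    vectors = []
    for tokens in tokenised:
        counts = Counter(t for t in tokens if t in idf)
        vectors.append({term: tf * idf[term] for term, tf in counts.items()})
    return vectors
```
        </preformat>
        <p>Storing each document as a dictionary keeps the representation sparse, which suits short texts such as tweets.</p>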
        <p>To measure diversity, we calculate the mean cosine
similarity between all unique pairs of articles within a
topic for a fixed time window. Given a set of
documents D in a time window, the diversity is:</p>
        <p>
          <disp-formula>
            <tex-math>\mathrm{diversity}(D) = \frac{\sum_{i,j \in D,\, i \neq j} \mathrm{cosSim}(D_i, D_j)}{\sum_{i=1}^{|D|-1} i}</tex-math>
          </disp-formula>
        </p>
        <p>where cosSim(D<sub>i</sub>, D<sub>j</sub>) is the cosine similarity of
the tf-idf vectors of documents i and j in a time window, and the
denominator equals the number of unique document pairs.
In practice, calculating similarities between all pairs
of documents can be efficiently performed in parallel,
and can be calculated in a matter of seconds.</p>
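        <p>The measure can be sketched in a few lines. This is an illustrative version only, assuming each document is a sparse tf-idf vector stored as a dictionary; the names <monospace>cosine_similarity</monospace> and <monospace>window_diversity</monospace> are hypothetical, and returning 0 for empty vectors is a convention chosen for the example.</p>
        <preformat>
```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (term -> weight)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for empty vectors (an assumption)
    return dot / (norm_a * norm_b)


def window_diversity(vectors):
    """Mean cosine similarity over all unique document pairs in one time window."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return None  # undefined for fewer than two documents
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```
        </preformat>
        <p>Because every pair is independent, the pairwise loop is straightforward to parallelise or to replace with a single matrix product when the vectors are held in a matrix.</p>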
        <p>Longer time windows consider more document
pairs, which naturally result in smoother trends. In
contrast, shorter time windows are more sensitive to
brief attention dominating events, but also to false
positive spikes, where a small number of articles happen
to be similar in content, but do not constitute an
attention dominating story.</p>
        <p>An alternative to content diversity is also
considered. Ignoring document content, and just
considering the sources of articles, diversity is calculated with
Shannon's Diversity Index:</p>
        <p>
          <disp-formula>
            <tex-math>H' = -\sum_{i=1}^{R} p_i \ln p_i</tex-math>
          </disp-formula>
        </p>
        <p>where p<sub>i</sub> is the proportion of documents produced
by the i-th source in a time window of interest, and R is
the total number of sources in a given media type.</p>
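        <p>A minimal sketch of the source-based measure follows. It assumes the input is simply a list with one source identifier per document published in the window; the function name <monospace>shannon_diversity</monospace> is hypothetical. Sources with no documents in the window contribute nothing to the sum.</p>
        <preformat>
```python
import math
from collections import Counter


def shannon_diversity(source_ids):
    """Shannon index over the sources that published documents in one window."""
    counts = Counter(source_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no documents in the window
    # p_i is the share of the window's documents produced by source i.
    return -sum((n / total) * math.log(n / total) for n in counts.values())
```
        </preformat>
        <p>The value is 0 when a single source produces every document and grows as documents are spread more evenly over more sources.</p>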
        <p>
          Both diversity measures produce a single diversity
value per time window, generating a univariate time
series. Changes in diversity that are 2 standard
deviations away from the mean are naively considered to
be important enough to warrant attention. Exploring
more robust and well established methods for change
point detection such as [
          <xref ref-type="bibr" rid="ref15 ref4">15, 4</xref>
          ] is left for future work.
        </p>
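        <p>The naive thresholding rule can be sketched as follows. The use of the population standard deviation and the function name <monospace>flag_windows</monospace> are assumptions made for the example; the sketch flags deviations in either direction.</p>
        <preformat>
```python
from statistics import mean, pstdev


def flag_windows(series, threshold=2.0):
    """Return indices of windows at least `threshold` standard deviations from the mean."""
    mu = mean(series)
    sigma = pstdev(series)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # a constant series has no outlying windows
    return [i for i, value in enumerate(series)
            if abs(value - mu) / sigma >= threshold]
```
        </preformat>
        <p>A single extreme window inflates both the mean and the standard deviation, which is one reason more robust change point detection methods may be preferable.</p>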
        <p>For the case studies described in Section 5, the
window length was set to 8 hours. While the fast-paced
"24/7 news cycle" is described as a constant flood of
information, we find that all three media types largely
follow a more traditional publishing cycle, with
prominent spikes in the number of published articles on weekday
mornings, and low numbers of articles published
outside of normal office hours. A more detailed analysis
of publishing times and characteristics will be explored
in future work.</p>
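        <p>Grouping documents into fixed 8-hour windows can be sketched as below. The input format (pairs of timestamp and document), the choice of window origin, and the function name <monospace>bucket_by_window</monospace> are assumptions made for illustration.</p>
        <preformat>
```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_by_window(timestamped_docs, start, hours=8):
    """Group (timestamp, doc) pairs into consecutive fixed-length windows from `start`."""
    width = timedelta(hours=hours)
    windows = defaultdict(list)
    for timestamp, doc in timestamped_docs:
        index = (timestamp - start) // width  # timedelta // timedelta gives an int
        windows[index].append(doc)
    return dict(windows)
```
        </preformat>
        <p>Each window index then maps to the set of documents over which one diversity value is computed, giving three values per day at this window length.</p>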
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Datasets</title>
      <p>
        To explore attention dominating news stories, we
apply the method described above to three media
sources: mainstream news, blogs, and tweets. For the
first two sources, the NewsIR dataset<sup>1</sup> is used. For
the final source, we use our own parallel corpus
collected from Twitter<sup>2</sup>. In contrast to previous work
[
        <xref ref-type="bibr" rid="ref11 ref6">6, 11</xref>
        ] where tweets are retrieved based on keywords
extracted from news articles, the parallel corpus was
derived from a large set of newsworthy sources,
curated by journalists [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Journalists on Twitter curate
lists<sup>3</sup> of useful sources by location or general topic of
interest. For example, "US Politics" may contain
accounts of US politicians and other journalists who tend
to cover US politics related stories.
        <fn><label>1</label><p>Available from: http://research.signalmedia.co/newsir16/signal-dataset.html</p></fn>
        <fn><label>2</label><p>Data: https://dx.doi.org/10.6084/m9.figshare.2074105</p></fn>
        <fn><label>3</label><p>Examples of such lists are available at https://twitter.com/storyful/lists/ and https://twitter.com/syflmid/lists</p></fn>
      </p>
      <p>
        Gathering all members of such lists covering
different countries and topics follows the expert-digest
strategy from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. A tweet dataset collected independently
of news and blog articles preserves Twitter-specific
features and topics. Source and document counts are
summarised in Table 1.
      </p>
      <sec id="sec-4-1">
        <title>Media Type</title>
        <p>News
Blogs
Tweets</p>
      </sec>
      <sec id="sec-4-2">
        <title>Sources</title>
        <p>18,948
73,403
30,448</p>
        <p>Of the original 1 million articles provided, 15,878
were filtered out as non-English<sup>4</sup> or as falling outside the date range
of interest (created between 2015-09-01 and
2015-09-31). Tweet language filtering was performed using
meta-data provided in the tweet.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Attention Dominating Events</title>
      <p>In order to compare the same topics across
different media types, we compare the top 10 terms
representing the topics from different models. Specifically,
when topics from two different models have strongly
overlapping (using Jaccard similarity) top term lists,
this indicates that similar events were discussed in
both media types.</p>
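      <p>The topic matching step can be sketched as follows. The overlap threshold <monospace>min_overlap</monospace> and the function names are hypothetical; no specific cut-off for "strongly overlapping" is stated here, so the default value is purely illustrative.</p>
      <preformat>
```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity between two lists of top topic terms."""
    a, b = set(terms_a), set(terms_b)
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def match_topics(topics_a, topics_b, min_overlap=0.3):
    """Pair topics (dict name -> top terms) from two models when overlap is strong.

    `min_overlap` is an illustrative threshold, not a value taken from the paper.
    """
    matches = []
    for name_a, terms_a in topics_a.items():
        for name_b, terms_b in topics_b.items():
            score = jaccard(terms_a, terms_b)
            if score >= min_overlap:
                matches.append((name_a, name_b, score))
    # Strongest matches first.
    return sorted(matches, key=lambda m: m[2], reverse=True)
```
      </preformat>
      <p>Topics that end up in no pair under such a rule are candidates for platform-specific content.</p>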
      <p>Topics in a model that do not have any overlapping
terms with topics in other models suggest that
content unique to a platform is prominent. For example,
the "live, periscope, follow, stream, updates" topic in
the tweet corpus has no equivalent among the news or
blog topics. This reflects the fact that the Periscope
app became popular with journalists for broadcasting
short live video streams and Twitter is the main
platform where these streams are announced. The "music,
album, song, video, band" topic is prominent in the
blogs and Twitter, but is not present in news. This
may reflect the fact that most Twitter accounts and
blogs are far more personal in nature.</p>
      <p>An indicative, but not necessary, feature of attention
dominating news is the presence of a similar topic on
multiple platforms. To illustrate the phenomenon of
topical diversity collapse, we now describe three case
studies.</p>
      <p><sup>4</sup> https://github.com/optimaize/language-detector was
used for language detection. Interestingly, language detection
proved effective for filtering "spammy" articles containing
obfuscated text, large numbers of URLs, or tabular
data.</p>
      <p>For each case study, we present the following: Top
10 topic terms for a topic in a media type, and a plot
of diversity over time, where:</p>
      <list list-type="bullet">
        <list-item><p>Solid lines show diversity of documents over time.</p></list-item>
        <list-item><p>Dashed lines show Shannon Diversity of sources.</p></list-item>
        <list-item><p>Highlighted time periods are when major developments occurred, based on the Wikipedia Current Events Portal<sup>5</sup> for September 2015.</p></list-item>
        <list-item><p>Dot and triangle markers indicate periods when diversity drops 2 standard deviations below the mean.</p></list-item>
      </list>
      <sec id="sec-5-1">
        <title>European Refugee Crisis</title>
        <p>
          The European crisis began in 2015, as increasing
numbers of refugees from areas in Syria, Afghanistan, and
Western Balkans [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] sought asylum in the EU. Figure 1
shows a plot of diversity for the documents assigned
to this topic in each 8 hour time window, for the three
media types. To help with visualisation, raw diversity
values are standardised with z-scores on the y axis,
while the x axis grid separates days.
        </p>
        <sec id="sec-5-1-1">
          <title>Media</title>
          <p>The downward trend in diversity between
September 3rd and 5th in the refugee crisis topic can be
explained by the death of Aylan Kurdi. News of his
drowning quickly spread online and made global
headlines. This was a particularly far-reaching story,
dominating news coverage until an announcement on
relaxing controls on the Austro-Hungarian border by
Chancellors Faymann of Austria and Merkel of
Germany. Both Twitter and mainstream news streams
experienced a diversity collapse, while blogs maintained
a more diverse set of articles. Between the 19th and 21st,
smaller drops in diversity coincide with Pope Francis'
visit, where the issue of refugees was a prominent topic
of discussion.
            <fn><label>5</label><p>https://en.wikipedia.org/wiki/Portal:Current_events/September_2015</p></fn></p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Donald Trump Presidential Campaign</title>
        <p>Donald Trump's presidential campaign has attracted
considerable attention across all types of media<sup>6</sup>.
Positions on issues of immigration and religion are
particularly polarising, frequently causing controversies in
mainstream media.</p>
        <sec id="sec-5-2-1">
          <title>Media</title>
          <table-wrap>
            <table>
              <thead>
                <tr><th>Media</th><th>Top 10 Topic Terms</th></tr>
              </thead>
              <tbody>
                <tr><td>Blogs</td><td>trump, donald, republican, presidential, debate, gop, president, candidates, candidate, bush</td></tr>
                <tr><td>News</td><td>trump, republican, presidential, donald, debate, clinton, bush, fiorina, candidates, campaign</td></tr>
                <tr><td>Tweets</td><td>trump, im, love, donald, going, debate, happy, gop, president, think</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>Figure 2: Standardised diversity scores for Donald
Trump Presidential Campaign topic (panels: Blogs, News, Tweets; x axis: day of September 2015; y axis: standardised diversity).</p>
          <p>Significant events marked around the 12th, 17th, and 21st
in Figure 2 relate to the following. Around the 12th, Trump's
comments on Twitter about Senator Rand Paul were discussed in
mainstream news, but not as prominently on
blogs. On the 16th-17th, there was coverage of a Republican
presidential debate hosted by CNN. On the 21st, mainstream
news covered reactions to events on the 17th: during
a town hall meeting in Rochester, Donald Trump
declined to correct a man who said that President Obama
is a Muslim.
            <fn><label>6</label><p>https://en.wikipedia.org/wiki/Donald_Trump_presidential_campaign,_2016</p></fn></p>
          <p>The statement prompted a significant drop in the
diversity of stories across all platforms. On the 25th,
during a speech given to conservative voters in
Washington, Trump called fellow Republican presidential
candidate Marco Rubio "a clown". Based on the data,
it appears that the reaction to the latter on Twitter
was not as pronounced as among journalists and
bloggers.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>Pope Francis visits North America</title>
        <p>The visit of Pope Francis spanned 19 to 27
September 2015, where the itinerary included venues in both
Cuba and the United States. This event is a good
illustrative example as it was widely documented<sup>7</sup>, and
highlights a case where a collapse in diversity did not
occur at the same time on different media platforms.</p>
        <sec id="sec-5-3-1">
          <title>Media</title>
          <p>In the case of news publishers, the largest drop in
diversity coincided with the beginning of the Pope's
visit to Havana. Twitter users and bloggers reacted
more on September 23rd and 24th, when the Pope
met with Barack Obama and became the first Pope to
address a joint session of US Congress.</p>
          <p><sup>7</sup> https://en.wikipedia.org/wiki/Pope_Francis'_2015_visit_to_North_America</p>
          <p>In the Twitter stream, the notable event around the
16th-17th is due to large numbers of similar tweets as
preparations for the visit were being discussed, and
#TellThePope trended briefly.</p>
          <p>Earlier in the month, we see evidence of
overlapping attention dominating events. Between the 6th and
7th of September, the Pope announced that the Vatican's
churches would welcome families of refugees. This
announcement followed a significant development in the
ongoing European refugee crisis: around 6,500 refugees
arrived in Vienna following Austria's and Germany's
decision to waive asylum system rules. This suggests
that an attention dominating news event in one topic
can trigger events in other topics, especially where
prominent public figures are involved.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Discussion</title>
      <p>
        While the diversity measure we propose is relatively
simple, it can be easily augmented to account for other
factors. In the simplest form, every similarity value
between a unique pair of articles within a time window
carries an equal weight in the diversity calculation,
implying that a strong similarity between two highly
influential publishers is just as important as between
two inconsequential publishers with a small audience.
However, this weight could be tuned, either manually
or automatically using external information (e.g. Alexa
rankings). Accounting for social context [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] could also
be achieved by augmenting the topic modeling stage
of the process. Instead of using a classic tf-idf vector
space model, alternative representations that capture
more semantic similarity between documents can be
used. We aim to explore extensions to this measure in
future work.
      </p>
      <p>
        The sequence of events in the European refugee
crisis and papal visit case studies suggests that it may be
possible to identify and track major developments with
global impact by linking attention dominating
moments across multiple topics, as well as across sources
on different platforms. Social media communities both
influence and are influenced by traditional news media
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Stories break both on Twitter and through
traditional news publishers. Tracking or linking instances of
diversity collapse to explain the direction of influence
between the different media types is also a potential
avenue for future work.
      </p>
      <p>Acknowledgments: This publication has emanated
from research conducted with the support of
Science Foundation Ireland (SFI) under Grant Number
SFI/12/RC/2289.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Boutsidis</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Gallopoulos</surname>
          </string-name>
          .
          <article-title>Svd based initialization: A head start for nonnegative matrix factorization</article-title>
          .
          <source>Pattern Recognition</source>
          ,
          <volume>41</volume>
          (
          <issue>4</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Boydstun</surname>
          </string-name>
          .
          <article-title>Making the news: Politics, the media, and agenda setting</article-title>
          . University of Chicago Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Brigadir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Greene</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          .
          <article-title>Adaptive representations for tracking breaking news on twitter</article-title>
          .
          <source>CoRR, abs/1403.2923</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Esling</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Agon</surname>
          </string-name>
          .
          <article-title>Time-series data mining</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>12</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gandica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. S. D.</given-names>
            <surname>Aidos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lambiotte</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Carletti</surname>
          </string-name>
          .
          <article-title>On the origin of burstiness in human behavior: The wikipedia edits case</article-title>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Darwish</surname>
          </string-name>
          .
          <article-title>Joint topic modeling for event summarization across news and social media streams</article-title>
          .
          <source>In Proc. 21st ACM international conference on Information and knowledge management</source>
          , pages
          <fpage>1173</fpage>
          -
          <lpage>1182</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Zafar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Gummadi</surname>
          </string-name>
          .
          <article-title>On sampling the wisdom of crowds: Random vs. expert sampling of the twitter stream</article-title>
          .
          <source>In Proceedings of the 22nd ACM international conference on Conference on information &amp; knowledge management</source>
          , pages
          <fpage>1739</fpage>
          -
          <lpage>1744</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E</given-names>
            <surname>.-M. P. Giulio Sabbati</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Saliba</surname>
          </string-name>
          .
          <article-title>Asylum in the EU: Facts and figures</article-title>
          .
          <source>European Parliamentary Research Service, (PE 551.332)</source>
          , mar
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Kambhampati.</surname>
          </string-name>
          Et-lda:
          <article-title>Joint topic modeling for aligning events and their twitter feedback</article-title>
          .
          <source>In AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Talamadupula</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Kambhampati</surname>
          </string-name>
          .
          <article-title>Dude, srsly?: The surprisingly formal nature of Twitter's language</article-title>
          , pages
          <fpage>244</fpage>
          –
          <lpage>253</lpage>
          . AAAI press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-T.</given-names>
            <surname>Lu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          .
          <article-title>Topical analysis of interactions between news and social media</article-title>
          .
          <source>In Proceedings of the 30th AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jungherr</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Jürgens</surname>
          </string-name>
          .
          <article-title>Forecasting the pulse: How deviations from regular patterns in online data can identify offline phenomena</article-title>
          .
          <source>Internet Research</source>
          ,
          <volume>23</volume>
          (
          <issue>5</issue>
          ):
          <fpage>589</fpage>
          –
          <lpage>607</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalyanam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mantrach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Saez-Trumper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Vahabi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lanckriet</surname>
          </string-name>
          .
          <article-title>Leveraging social context for modeling topic evolution</article-title>
          .
          <source>In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>517</fpage>
          –
          <lpage>526</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lamba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Malik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pfeffer</surname>
          </string-name>
          .
          <article-title>A tempest in a teacup? Analyzing firestorms on Twitter</article-title>
          .
          <source>In Proc. International Conference on Advances in Social Networks Analysis and Mining</source>
          , pages
          <fpage>17</fpage>
          –
          <lpage>24</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yamada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Collier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sugiyama</surname>
          </string-name>
          .
          <article-title>Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation</article-title>
          .
          <source>ArXiv e-prints</source>
          , Mar.
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Petrovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McCreadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>Bieber no more: First story detection using Twitter and Wikipedia</article-title>
          .
          <source>In SIGIR Workshop on Time-aware Information Access</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>O'Callaghan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carthy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          .
          <article-title>An analysis of the coherence of descriptors in topic modeling</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>42</volume>
          (
          <issue>13</issue>
          ):
          <fpage>5645</fpage>
          –
          <lpage>5657</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Petrovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McCreadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Shrimpton</surname>
          </string-name>
          .
          <article-title>Can Twitter replace newswire for breaking news?</article-title>
          <source>In Proc. 7th International Conference on Weblogs and Social Media (ICWSM)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>van Hooland</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Summers</surname>
          </string-name>
          .
          <article-title>MJ no more: Using concurrent Wikipedia edit spikes with social network plausibility checks for breaking news detection</article-title>
          .
          <source>In Proc. 22nd International Conference on World Wide Web</source>
          , pages
          <fpage>791</fpage>
          –
          <lpage>794</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Vaca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mantrach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaimes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Saerens</surname>
          </string-name>
          .
          <article-title>A time-based collective factorization for topic discovery and monitoring in news</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on World Wide Web</source>
          , pages
          <fpage>527</fpage>
          –
          <lpage>538</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhai</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Boyd-Graber</surname>
          </string-name>
          .
          <article-title>Online latent Dirichlet allocation with infinite vocabulary</article-title>
          .
          <source>In Proc. 30th International Conference on Machine Learning</source>
          , pages
          <fpage>561</fpage>
          –
          <lpage>569</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.-P.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Comparing Twitter and traditional media using topic models</article-title>
          .
          <source>In Advances in Information Retrieval</source>
          , pages
          <fpage>338</fpage>
          –
          <lpage>349</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>