Detecting Attention Dominating Moments
                          Across Media Types

                     Igor Brigadir              Derek Greene              Pádraig Cunningham
                        {igor.brigadir, derek.greene, padraig.cunningham}@insight-centre.org
                                           Insight Centre for Data Analytics
                                          University College Dublin, Ireland


                                                                different forms of news media attempt to record and
                                                                disseminate information deemed important enough to
                       Abstract                                 communicate, and as the barriers to broadcasting and
                                                                sharing information are removed, attention becomes a
    In this paper we address the problem of iden-               scarce commodity.
    tifying attention dominating moments in on-                    We define the problem of detecting attention domi-
    line media. We are interested in discovering                nating moments across different media types, as a col-
    moments when everyone seems to be talking                   lapse in diversity in the content generated by a set
    about the same thing. We investigate one par-               of online sources in a topic during a given time pe-
    ticular aspect of breaking news: the tendency               riod. Media types here include mainstream news arti-
    of multiple sources to concentrate attention on             cles, blog posts, and tweets. These media types differ
    a single topic, leading to a collapse in diver-             in both the category of topics covered [22], and their
    sity of content for a period of time. In this               use of language [10]. In the context of Twitter, we de-
    work we show that diversity at a topic level                fine sources as unique user accounts. For mainstream
    is effective for capturing this effect in blogs,            news and blogs, sources refer to individual publica-
    in news articles, and on Twitter. The phe-                  tions or outlets. Publications may have different num-
    nomenon is present in three distinctly differ-              bers of authors, but as unique author information is
    ent media types, each with their own unique                 not available, we treat each unique blog or news outlet
    features. We describe the phenomenon us-                    as a single source.
    ing case studies relating to major news stories                In Section 3, we describe the two stages of our pro-
    from September 2015.                                        posed event detection procedure. In the first stage,
                                                                content generated by the news, blog and tweet sources
1    Introduction                                               is grouped into broad topical categories, through the
The problem of detecting breaking news events has               application of matrix factorization to the content gen-
inspired a host of approaches, extracting useful sig-           erated by these sources. In the second stage, we ex-
nals from activity on social networks, newswire, and            amine the variation in similarity between content gen-
other types of media. The online communication plat-            erated by sources within a given topic during a given
forms that have been adopted allow these events to              time period, in order to identify a collapse in diver-
persist in some form. These digital traces can never            sity within a topic which corresponds to an attention
fully capture the original experience, but offer us an          dominating moment. In Section 5, we evaluate this
opportunity to revisit significant phenomena with dif-          procedure on a collection of one million news articles
ferent points of view, or help us to characterise and           and blog posts from September 2015, along with a par-
learn something about the processes involved. Many              allel corpus of tweets collected during the same time
                                                                period.
    Copyright © 2016 for the individual papers by the paper’s      Rather than formulating the problem as tracking
authors. Copying permitted for private and academic purposes.   the evolution of topics themselves, we consider the di-
This volume is published and copyrighted by its editors.
                                                                versity of content within a specific topic over time. The
    In: M. Martinez, U. Kruschwitz, G. Kazai, D. Corney, F.
Hopfgartner, R. Campos and D. Albakour (eds.): Proceedings
                                                                motivation is that, for instance, a collapse in diversity
of the NewsIR’16 Workshop at ECIR, Padua, Italy, 20-March-      around a major sporting event will be strongly evi-
2016, published at http://ceur-ws.org                           dent in certain news sources, but not evident in others.
The distinction is important, as this approach is more      diversity within a fixed time window.
suited to retrospective analysis, when the entire collec-
tion of documents of interest is available. The topics      3     Proposed Method
do not change over time, as opposed to a real-time set-
ting where topics must be updated as new documents          Our objective is to detect when multiple articles in
arrive [21]. The information need is guided by two ma-      a topical stream become less diverse, signalling the
jor questions. Firstly, when have significant collapses     emergence of an attention dominating news story. We
in diversity occurred in a topic of interest? Secondly,     consider attention to a phenomenon as the main driv-
are there differences between media types when these        ing force behind the decision to produce or broadcast a
events occur?                                               communication. Using the diversity of content within
   Our main contributions here are: 1) a diversity-         a time window, we attempt to characterise instances
based approach to detecting attention dominating            where a particular piece of information becomes dom-
news events; 2) a comparison between traditional news       inant. Concretely, for each type of media, NMF is
sources, blogs, and Twitter during these events; 3) a       used to assign topics to documents; for documents in
parallel corpus of newsworthy tweets for the NewsIR         a topic, we calculate diversity between documents in
dataset.                                                    a time window. This type of analysis allows us to ex-
                                                            amine the extent to which the onset of an important
                                                            breaking news event is accompanied by a collapse in
2   Related Work                                            textual content diversity, both within a group of news
In previous work, attention dominating news sto-            sources and across different media types.
ries have been described as media explosions [2] or
firestorms [14]. The idea of combining signals from         3.1    Finding Topics
multiple sources for detecting or tracking evolution of     We apply a Non-negative Matrix Factorization (NMF)
events proved effective in the past. Osborne et al. [16]    topic modeling approach to extract potentially inter-
used signals from Wikipedia page views, together with       esting topics from a stream of tweets or set of articles.
Twitter to improve “first story detection”. Concurrent      For each media source, we build a tf-idf weighted term-
Wikipedia edits were used as a signal for breaking news     document matrix and use this as input to NMF.
detection in [19].                                             We also considered LDA to infer topics in these
    Topic modeling applied to parallel corpora of news      datasets. The choice of NMF over LDA was primarily
and tweets has been previously explored by a number         due to computation time. LDA was significantly more
of researchers [6, 9, 11]. Extensions to LDA to ac-         computationally expensive than NMF with NNDSVD
count for tweet specific features have been proposed        [1] initialisation. NMF also tends to produce more co-
[22]. A comparison between Twitter and content from         herent topics [17].
newswires was explored in [18]. A Non-negative Ma-
trix Factorization (NMF) approach is used for topic         3.2    Measuring Diversity
detection in [20].
                                                            The same tf-idf representation used for topic modeling
    How offline phenomena link to bursty behaviour on-
                                                            is used in diversity calculations. Each article, blog
line is discussed in [5] and [12]. In [12] Shannon’s Di-
                                                            post or tweet is a tf-idf vector. A separate document-
versity Index was used to detect a “contraction of at-
                                                            term matrix is built for each media type. Stopwords
tention” in a tweet stream by measuring diversity of
                                                            and words occurring in fewer than 10 documents are
hashtags. In contrast, we employ a different measure
                                                            removed.
of diversity based on document similarity, applying it
to streams from different media types segmented by             To measure diversity, we calculate the mean cosine
topic. Methods for automatically detecting anomalies        similarity between all unique pairs of articles within a
or significant changes in a time series are discussed in    topic for a fixed time window. Given a set of docu-
[4]. In [15] a change-point detection approach is ap-       ments D in a time window, the diversity is:
plied to time series constructed from Tweet keyword
frequencies.                                                \[ \mathrm{diversity}(D) = -\frac{\sum_{i,j \in D,\; i \neq j} \mathrm{cosSim}(D_i, D_j)}{\sum_{i=1}^{|D|-1} i} \]

    As a broad overview, the common components in-
volved in detecting high impact, attention dominat-
ing news stories include: selecting relevant subsets            Where cosSim(Di , Dj ) is the cosine similarity of
of documents; representation and feature extraction;        tf-idf vectors of documents i and j in a time window.
constructing time series from features; event detection     In practice, calculating similarities between all pairs
and analysis. In this paper we concentrate on a sin-        of documents can be efficiently performed in parallel,
gle key feature of breaking news: a collapse in content     and can be calculated in a matter of seconds.
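The diversity measure of Section 3.2 can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' implementation. It assumes documents are sparse tf-idf vectors stored as `{term: weight}` dictionaries, that timestamps are naive UTC datetimes, and that windows are aligned to multiples of the window length counted from the Unix epoch. The helper names are hypothetical. Each unique pair is counted once, so the number of pairs is |D|(|D|-1)/2, which equals the sum of the integers from 1 to |D|-1.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

EPOCH = datetime(1970, 1, 1)  # assumption: timestamps are naive UTC


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def diversity(vectors):
    """Negated mean cosine similarity over all unique document pairs."""
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two documents in the window")
    n_pairs = n * (n - 1) // 2  # equals 1 + 2 + ... + (n - 1)
    total = sum(cosine_similarity(a, b) for a, b in combinations(vectors, 2))
    return -total / n_pairs


def diversity_by_window(timed_docs, window_hours=8):
    """Group (timestamp, vector) pairs into fixed windows and score each."""
    buckets = defaultdict(list)
    for timestamp, vector in timed_docs:
        hours = int((timestamp - EPOCH).total_seconds() // 3600)
        buckets[hours // window_hours].append(vector)
    # Windows with fewer than two documents have no pairs and are skipped.
    return {
        EPOCH + timedelta(hours=index * window_hours): diversity(vectors)
        for index, vectors in sorted(buckets.items())
        if len(vectors) >= 2
    }
```

A window of near-identical documents scores close to -1, while a window of documents with no shared terms scores 0, so a collapse in diversity appears as a sharp drop in the series.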
   Longer time windows consider more document                    counts of US politicians and other journalists who tend
pairs, which naturally result in smoother trends. In             to cover US politics related stories.
contrast, shorter time windows are more sensitive to                Gathering all members of such lists covering differ-
brief attention dominating events, but also false posi-          ent countries and topics follows the expert-digest strat-
tive spikes—where a small number of articles happen              egy from [7]. A tweet dataset collected independently
to be similar in content, but do not constitute an at-           of news and blog articles preserves Twitter-specific fea-
tention dominating story.                                        tures and topics. Source and document counts are
   An alternative to content diversity is also consid-           summarised in Table 1.
ered. Ignoring document content, and just consider-
ing the sources of articles, diversity is calculated with         Media Type      Sources    Documents      Docs. per 24h
Shannon’s Diversity Index:                                        News             18,948       730,634             8,177
                                                                  Blogs            73,403       253,488            23,568
     \[ H' = -\sum_{i=1}^{R} p_i \ln p_i \]                       Tweets           30,448     3,274,089           125,568
                                                                 Table 1: Summary of overall source and document
                                                                 counts by media type after filtering, and average num-
   Where pi is the proportion of documents produced              ber of documents in a 24 hour window.
by the ith source in a time window of interest, and R is
the total number of sources in a given media type.                  Of the original 1 million articles provided, 15,878
   Both diversity measures produce a single diversity            were filtered as non-English4 or outside the date range
value per time window, generating a univariate time              of interest (i.e. created between 2015-09-01 and 2015-
series. Changes in diversity that are 2 standard devi-           09-30). Tweet language filtering was performed using
ations away from the mean are naively considered to              meta-data provided in the tweet.
be important enough to warrant attention. Exploring
more robust and well established methods for change              5    Attention Dominating Events
point detection such as [15, 4] is left for future work.
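The source-based alternative and the naive thresholding rule from Section 3 can be illustrated as follows. This is a hedged sketch using only the Python standard library. The function names are hypothetical, and using the population standard deviation is an assumption, since the text does not say which estimator was used. Sources that publish nothing in a window contribute zero to the sum, so iterating over the observed sources is equivalent to summing over all R sources.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_diversity(sources):
    """Shannon's Diversity Index over the source of each document in a window.

    `sources` holds one source identifier per document.
    """
    counts = Counter(sources)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def flag_collapses(series, n_std=2.0):
    """Indices of windows whose diversity is n_std deviations below the mean."""
    mu = mean(series)
    sigma = pstdev(series)  # assumption: population standard deviation
    if sigma == 0:
        return []
    return [i for i, x in enumerate(series) if (x - mu) / sigma <= -n_std]
```

The same `flag_collapses` helper applies to either diversity series, since both produce one value per time window.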
   For the case studies described in Section 5, the win-         In order to compare the same topics across differ-
dow length was set to 8 hours. While the fast-paced              ent media types, we compare the top 10 terms repre-
“24/7 news cycle” is described as a constant flood of            senting the topics from different models. Specifically,
information, we find that all three mediums largely              when topics from two different models have strongly-
follow a more traditional publishing cycle, with promi-          overlapping (using Jaccard similarity) top term lists,
nent spikes in number of published articles on weekday           this indicates that similar events were discussed in
mornings, and low numbers of articles published out-             both media types.
side of normal office hours. A more detailed analysis               Topics in a model that do not have any overlapping
of publishing times and characteristics will be explored         terms with topics in other models, suggest that con-
in future work.                                                  tent unique to a platform is prominent. For example:
                                                                 the “live, periscope, follow, stream, updates” topic in
4    Datasets                                                    the tweet corpus has no equivalent among the news or
                                                                 blog topics. This reflects the fact that the Periscope
To explore attention dominating news stories, we                 app became popular with journalists for broadcasting
apply the method described above to three media                  short live video streams and Twitter is the main plat-
sources: mainstream news, blogs, and tweets. For the             form where these streams are announced. The “music,
first two sources, the NewsIR dataset1 is used. For              album, song, video, band ” topic is prominent in the
the final source, we use our own parallel corpus col-            blogs and Twitter, but is not present in news. This
lected from Twitter2 . In contrast to previous work              may reflect the fact that most Twitter accounts and
[6, 11] where tweets are retrieved based on keywords             blogs are far more personal in nature.
extracted from news articles, the parallel corpus was               An indicative, but not necessary, feature of attention
derived from a large set of newsworthy sources, cu-              dominating news is the presence of a similar topic on
rated by journalists [3]. Journalists on Twitter curate          multiple platforms. To illustrate the phenomenon of
lists3 of useful sources by location or general topic of         topical diversity collapse, we now describe three case
interest—for example “US Politics” may contain ac-               studies.
   1 Available  from:        http://research.signalmedia.co/        4 https://github.com/optimaize/language-detector       was
newsir16/signal-dataset.html                                     used for language detection. Interestingly, language detection
   2 Data: https://dx.doi.org/10.6084/m9.figshare.2074105        proved effective for filtering “spammy” articles containing
   3 Examples of such lists are available https://twitter.com/   obfuscated text, large numbers of urls, or containing tabular
storyful/lists/ and https://twitter.com/syflmid/lists            data.
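The topic-matching step described at the start of Section 5 compares top-term lists with Jaccard similarity. A minimal sketch is given below. The function names and the `threshold` value are hypothetical, as the text does not state what counts as "strongly overlapping". The example lists are the News and Tweets top terms reported for the refugee crisis topic.

```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity between two lists of top topic terms."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_topics(topics_a, topics_b, threshold=0.3):
    """Pairs (i, j, score) of topics whose term lists overlap strongly.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    matches = []
    for i, terms_a in enumerate(topics_a):
        for j, terms_b in enumerate(topics_b):
            score = jaccard(terms_a, terms_b)
            if score >= threshold:
                matches.append((i, j, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)


news_refugees = ["refugees", "migrants", "border", "hungary", "eu",
                 "europe", "european", "refugee", "asylum", "germany"]
tweet_refugees = ["refugees", "syrian", "hungary", "help", "migrants",
                  "europe", "border", "germany", "austria", "asylum"]
tweet_periscope = ["live", "periscope", "follow", "stream", "updates"]

matches = match_topics([news_refugees], [tweet_refugees, tweet_periscope])
```

Here the two refugee topics share 7 of 13 distinct terms, while the Periscope topic shares none with the news topic and is therefore not matched.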
   For each case study, we present the following: Top                                          drowning quickly spread online and made global head-
10 topic terms for a topic in a media type, and a plot                                         lines. This was a particularly far-reaching story, dom-
of diversity over time, where:                                                                 inating news coverage until an announcement on re-
                                                                                               laxing controls on the Austro-Hungarian border by
       • Solid lines show diversity of documents over time.
                                                                                               Chancellors Faymann of Austria and Merkel of Ger-
       • Dashed lines show Shannon Diversity of sources.                                       many. Both Twitter and mainstream news streams ex-
                                                                                               perienced a diversity collapse, while Blogs maintained
       • Highlighted time periods are when major devel-                                        a more diverse set of articles. Between the 19th and 21st,
         opments occurred, based on the Wikipedia Current                                      smaller drops in diversity coincide with Pope Francis’
         Events Portal5 for September 2015.                                                    visit, where the issue of refugees was a prominent topic
       • Dot and Triangle markers indicate periods when                                        of discussion.
         diversity drops 2 standard deviations below the
         mean.                                                                                 5.2        Donald Trump Presidential Campaign

5.1        European Refugee Crisis                                                             Donald Trump’s presidential campaign has attracted
                                                                                               considerable attention across all types of media6 . Po-
The European refugee crisis began in 2015, as increasing num-                                  sitions on issues of immigration and religion are par-
bers of refugees from areas in Syria, Afghanistan, and the                                     ticularly polarising, frequently causing controversies in
Western Balkans [8] sought asylum in the EU. Figure 1                                          mainstream media.
shows a plot of diversity for the documents assigned
to this topic in each 8 hour time window, for the three                                        Media            Top 10 Topic Terms
media types. To help with visualisation, raw diversity                                         Blogs            trump, donald, republican, presidential, debate,
values are standardised with z-scores on the y axis,                                                            gop, president, candidates, candidate, bush
while the x axis grid separates days.                                                          News             trump, republican, presidential, donald, debate,
                                                                                                                clinton, bush, fiorina, candidates, campaign
Media            Top 10 Topic Terms                                                            Tweets           trump, im, love, donald, going, debate, happy,
Blogs            refugees, syria, syrian, war, president, govern-                                               gop, president, think
                 ment, military, europe, russia, iran
News             refugees, migrants, border, hungary, eu, europe,
                 european, refugee, asylum, germany
Tweets           refugees, syrian, hungary, help, migrants, europe,
                 border, germany, austria, asylum

[Figure 1 plot: panels for Blogs, News and Tweets; y axis: standardised diversity (z-score); x axis: days 01 to 30 of September 2015]
                                                                                               [Figure 2 plot: panels for Blogs, News and Tweets; y axis: standardised diversity (z-score); x axis: days 01 to 30 of September 2015]
                                                                                               Figure 2: Standardised diversity scores for Donald
                                                                                               Trump Presidential Campaign topic
                                                                                                  Significant events marked around 12th, 17th, 21st
Figure 1: Standardised diversity scores for the Euro-                                          in Figure 2 relate to: Trump’s comments on Senator
pean refugee crisis topic during September 2015, across                                        Rand Paul on Twitter, which were discussed in main-
three media types.                                                                             stream news around the 12th, but not as prominently on
   The downward trend in diversity between Septem-                                             blogs; coverage on the 16th-17th of a Republican pres-
ber 3rd and 5th in the refugee crisis topic can be ex-                                         idential debate hosted by CNN; and, on the 21st, mainstream
plained by the death of Aylan Kurdi. News of his                                               news coverage of reactions to events on the 17th: during
   5 https://en.wikipedia.org/wiki/Portal:Current_                                                6 https://en.wikipedia.org/wiki/Donald_Trump_

events/September_2015                                                                          presidential_campaign,_2016
a town hall meeting in Rochester, Donald Trump de-                                                In the Twitter stream, the notable event around
clined to correct a man who said that President Obama                                          16th-17th is due to large numbers of similar tweets as
is a Muslim.                                                                                   preparations for the visit were being discussed, and
   The statement prompted a significant drop in the                                            #TellThePope trended briefly.
diversity of stories across all platforms. On the 25th,                                           Earlier in the month, we see evidence of overlap-
during a speech given to conservative voters in Wash-                                          ping attention dominating events. Between 6th and
ington, Trump called fellow Republican presidential                                            7th September, the Pope announced the Vatican’s
candidate Marco Rubio “a clown”. Based on the data,                                            churches will welcome families of refugees. This an-
it appears that the reaction to the latter on Twitter                                          nouncement followed a significant development in the
was not as pronounced as among journalists and blog-                                           ongoing European refugee crisis: around 6,500 refugees
gers.                                                                                          arrived in Vienna following Austria’s and Germany’s
                                                                                               decision to waive asylum system rules. This suggests
5.3        Pope Francis visits North America                                                   that an attention dominating news event in one topic
The visit of Pope Francis spanned 19 to 27 Septem-                                             can trigger events in other topics, especially where
ber 2015, where the itinerary included venues in both                                          prominent public figures are involved.
Cuba and the United States. This event is a good il-
lustrative example as it was widely documented7 , and
highlights a case where a collapse in diversity did not
occur at the same time on different media platforms.                                           6   Discussion

Media            Top 10 Topic Terms                                                            While the diversity measure we propose is relatively
Blogs            pope, francis, church, catholic, visit, cuba, popes,                          simple, it can be easily augmented to account for other
                 climate, philadelphia, vatican                                                factors. In the simplest form, every similarity value
News             pope, francis, catholic, church, philadelphia,                                between a unique pair of articles within a time window
                 popes, cuba, united, vatican, visit                                           carries an equal weight in the diversity calculation,
Tweets           pope, francis, visit, house, congress, popeindc,
                                                                                               implying that a strong similarity between two highly
                 cuba, white, popeinphilly, philadelphia
                                                                                               influential publishers is just as important as between
                                                                                               two inconsequential publishers with a small audience.
[Figure 3 plot: panels for Blogs, News and Tweets; y axis: standardised diversity (z-score); x axis: days 01 to 30 of September 2015]
                                                                                               However, this weight could be tuned, either manually
                                                                                               or automatically using external information (e.g. Alexa
                                                                                               rankings). Accounting for social context [13] could also
                                                                                               be achieved by augmenting the topic modeling stage
                                                                                               of the process. Instead of using a classic tf-idf vector
                                                                                               space model, alternative representations that capture
                                                                                               more semantic similarity between documents can be
                                                                                               used. We aim to explore extensions to this measure in
                                                                                               future work.
                                                                                                  The sequence of events in the European refugee cri-
                                                                                               sis and papal visit case studies suggest that it may be
                                                                                               possible to identify and track major developments with
                                                                                               global impact by linking attention dominating mo-
                                                                                               ments across multiple topics, as well as across sources
                                                                                               on different platforms. Social media communities both
Figure 3: Standardised diversity scores for the Papal                                          influence and are influenced by traditional news media
visit topic during September 2015.                                                             [11]. Stories break both on Twitter and through tradi-
                                                                                               tional news publishers. Tracking or linking instances of
   In the case of news publishers, the largest drop in                                         diversity collapse to explain the direction of influence
diversity coincided with the beginning of the Pope’s                                           between the different media types is also a potential
visit to Havana. Twitter users and bloggers reacted                                            avenue for future work.
more on September 23rd and 24th, when the Pope
met with Barack Obama and became the first Pope to                                             Acknowledgments: This publication has emanated
address a joint session of US Congress.                                                        from research conducted with the support of Sci-
   7 https://en.wikipedia.org/wiki/Pope_Francis’_2015_                                         ence Foundation Ireland (SFI) under Grant Number
visit_to_North_America                                                                         SFI/12/RC/2289.
References

 [1] C. Boutsidis and E. Gallopoulos. SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognition, 41(4), 2008.

 [2] A. E. Boydstun. Making the news: Politics, the media, and agenda setting. University of Chicago Press, 2013.

 [3] I. Brigadir, D. Greene, and P. Cunningham. Adaptive representations for tracking breaking news on Twitter. CoRR, abs/1403.2923, 2014.

 [4] P. Esling and C. Agon. Time-series data mining. ACM Computing Surveys (CSUR), 45(1):12, 2012.

 [5] Y. Gandica, J. Carvalho, F. S. D. Aidos, R. Lambiotte, and T. Carletti. On the origin of burstiness in human behavior: The Wikipedia edits case, 2016.

 [6] W. Gao, P. Li, and K. Darwish. Joint topic modeling for event summarization across news and social media streams. In Proc. 21st ACM International Conference on Information and Knowledge Management, pages 1173–1182. ACM, 2012.

 [7] S. Ghosh, M. B. Zafar, P. Bhattacharya, N. Sharma, N. Ganguly, and K. Gummadi. On sampling the wisdom of crowds: Random vs. expert sampling of the Twitter stream. In Proc. 22nd ACM International Conference on Information & Knowledge Management, pages 1739–1744. ACM, 2013.

 [8] E.-M. P. Giulio Sabbati and S. Saliba. Asylum in the EU: Facts and figures. European Parliamentary Research Service, (PE 551.332), Mar. 2015.

 [9] Y. Hu, A. John, F. Wang, and S. Kambhampati. ET-LDA: Joint topic modeling for aligning events and their Twitter feedback. In AAAI Conference on Artificial Intelligence, 2012.

[10] Y. Hu, K. Talamadupula, and S. Kambhampati. Dude, srsly?: The surprisingly formal nature of Twitter’s language, pages 244–253. AAAI Press, 2013.

[11] T. Hua, F. Chen, C.-T. Lu, and N. Ramakrishnan. Topical analysis of interactions between news and social media. In Proc. 30th AAAI Conference on Artificial Intelligence, 2016.

[12] A. Jungherr and J. Pascal. Forecasting the pulse: How deviations from regular patterns in online data can identify offline phenomena. Internet Research, 23(5):589–607, 2013.

[13] J. Kalyanam, A. Mantrach, D. Saez-Trumper, H. Vahabi, and G. Lanckriet. Leveraging social context for modeling topic evolution. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 517–526, 2015.

[14] H. Lamba, M. M. Malik, and J. Pfeffer. A tempest in a teacup? Analyzing firestorms on Twitter. In Proc. International Conference on Advances in Social Networks Analysis and Mining, pages 17–24, 2015.

[15] S. Liu, M. Yamada, N. Collier, and M. Sugiyama. Change-point detection in time-series data by relative density-ratio estimation. ArXiv e-prints, Mar. 2012.

[16] M. Osborne, S. Petrovic, R. McCreadie, C. Macdonald, and I. Ounis. Bieber no more: First story detection using Twitter and Wikipedia. In SIGIR Workshop on Time-aware Information Access, 2012.

[17] D. O'Callaghan, D. Greene, J. Carthy, and P. Cunningham. An analysis of the coherence of descriptors in topic modeling. Expert Systems with Applications, 42(13):5645–5657, 2015.

[18] S. Petrovic, M. Osborne, R. McCreadie, C. Macdonald, I. Ounis, and L. Shrimpton. Can Twitter replace newswire for breaking news? In Proc. 7th International Conference on Weblogs and Social Media, ICWSM, 2013.

[19] T. Steiner, S. van Hooland, and E. Summers. MJ no more: Using concurrent Wikipedia edit spikes with social network plausibility checks for breaking news detection. In Proc. 22nd International Conference on World Wide Web, pages 791–794, 2013.

[20] C. K. Vaca, A. Mantrach, A. Jaimes, and M. Saerens. A time-based collective factorization for topic discovery and monitoring in news. In Proc. 23rd International Conference on World Wide Web, pages 527–538. ACM, 2014.

[21] K. Zhai and J. Boyd-Graber. Online latent Dirichlet allocation with infinite vocabulary. In Proc. 30th International Conference on Machine Learning, pages 561–569, 2013.

[22] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li. Comparing Twitter and traditional media using topic models. In Advances in Information Retrieval, pages 338–349. Springer, 2011.