=Paper=
{{Paper
|id=Vol-1960/paper5
|storemode=property
|title=Online Topic Modeling: Keeping Track of News Topics for Social Good
|pdfUrl=https://ceur-ws.org/Vol-1960/paper5.pdf
|volume=Vol-1960
|authors=Zahra Ahmadi,Sophie Burkhardt,Stefan Kramer
}}
==Online Topic Modeling: Keeping Track of News Topics for Social Good==
<pdf width="1500px">https://ceur-ws.org/Vol-1960/paper5.pdf</pdf>
<pre>
 Online Topic Modeling: Keeping Track of News
             Topics for Social Good

             Zahra Ahmadi, Sophie Burkhardt, and Stefan Kramer

            Institut für Informatik, Johannes Gutenberg-Universität,
                    Staudingerweg 9, 55128, Mainz, Germany
     zaahmadi@uni-mainz.de, {burkhardt,kramer}@informatik.uni-mainz.de


      Abstract. The refugee crisis has become an important, albeit
      controversial, issue for European countries. There have been many
      debates in social media in favor of or against accepting refugees;
      however, there is little work on interpreting the data in this regard.
      In this paper, we propose an online topic modeling approach that evolves
      over time and finds the most important topics in each time slot. Our
      study shows that outside events have a visible impact on the media and
      that this perception can change and evolve over time.


1   Introduction

Europe has witnessed a large movement of migrants and refugees from Africa and
the Middle East in recent years. The arrival wave started in August 2015, and
since then it has been in the media spotlight, with reports of an increasing
number of events and heated, polarized debates about this phenomenon. The
implications of this crisis are complex and wide-ranging; however, data mining
experts have only recently begun to interpret the media coverage: Coletto et
al. [2] proposed an adaptive framework to analyze the spatial, temporal, and
sentiment aspects of a polarized topic discussed in online social media; the
GDELT data¹ was used to answer the question of whether the Arab Spring sparked
a wave of global protests²; and Data For Democracy built a tool capable of
tracking and analyzing refugees and other people forced to evacuate their
homes³.
    In this work, our goal is to analyze media from Germany, one of the most
affected European countries, to address the following questions: “What are the
main concerns of each party or news source? How does the perception evolve
over time? How is the perception influenced by events? How similar are different
parties and sources in this respect?” To this end, we propose an online topic
modeling method to keep track of the topics appearing over time and evaluate
the results on a relatively large dataset from German media.
¹ http://gdeltproject.org/data.html#rawdatafiles
² https://foreignpolicy.com/2014/05/30/did-the-arab-spring-really-spark-a-wave-of-global-protests/
³ https://www.un.org/press/en/2017/pi2207.doc.htm


2   Online Topic Modeling

The generative process for Latent Dirichlet Allocation (LDA) is given as follows:

      \phi \sim \mathrm{Dir}(\beta), \quad \theta \sim \mathrm{Dir}(\alpha), \quad z \sim \mathrm{Mult}(\theta), \quad w \sim \mathrm{Mult}(\phi_z)     (1)

    For each topic k, a multinomial distribution φ_k over words is drawn from
a Dirichlet distribution with parameter β. For each document d, a distribution
over topics θ is drawn from a Dirichlet with parameter α. For each word w_{di} in
document d, a topic indicator z_{di} is drawn from the multinomial distribution θ.
Finally, the word w_{di} is drawn from the multinomial distribution φ_{z_{di}}
associated with the chosen topic.
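As an illustration only, the generative process of Eq. (1) can be sketched with the Python standard library. The function names, the toy vocabulary, and the use of symmetric scalar hyperparameters α and β are assumptions made for brevity; Dirichlet samples are drawn by normalizing Gamma variates.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(a, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, doc_len, n_topics, vocab, alpha, beta, seed=0):
    """Sample a toy corpus from the LDA generative process of Eq. (1).

    alpha and beta are symmetric scalar hyperparameters (a simplifying
    assumption); vocab is a list of word strings.
    """
    rng = random.Random(seed)
    # phi_k ~ Dir(beta): one word distribution per topic k
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    docs, assignments = [], []
    for _ in range(n_docs):
        # theta_d ~ Dir(alpha): topic proportions of document d
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words, topics = [], []
        for _ in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]  # z_di ~ Mult(theta_d)
            w = rng.choices(vocab, weights=phi[z])[0]            # w_di ~ Mult(phi_{z_di})
            topics.append(z)
            words.append(w)
        docs.append(words)
        assignments.append(topics)
    return phi, docs, assignments


# Hypothetical toy vocabulary, for illustration only.
vocab = ["party", "election", "refugee", "integration", "job", "school"]
phi, docs, z = generate_corpus(n_docs=3, doc_len=5, n_topics=2,
                               vocab=vocab, alpha=0.5, beta=0.5)
print(docs[0])
```

Inference (Gibbs sampling in the method described here) runs this process in reverse: it recovers φ and θ from observed words only.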
    To track the topics online, we split the data into time slices
D = {D_1, ..., D_{t−1}, D_t}. Following the method proposed by AlSumait et al. [1],
for each time slot t our method learns a topic model by Gibbs sampling [3], where
the prior parameters β are a weighted mixture of the matrices φ^1, ..., φ^{t−1}
from the previous time slots:

      \beta_k^{t} = \sum_{t'=1}^{t-1} \omega^{t'} \, \phi_k^{t'}     (2)

where ω^{t'} is the weight associated with time slot t'.
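A direct reading of Eq. (2) as code is sketched below, assuming each φ^{t'} is stored as a K × V list of lists and the weights ω^{t'} are given. The function name and toy values are hypothetical. Note that the loop has to visit every stored matrix from earlier slots.

```python
def mixture_prior(past_phis, weights):
    """Eq. (2): beta_k^t = sum over t' < t of omega^{t'} * phi_k^{t'}.

    past_phis[j] is the K x V topic-word matrix of time slot t' = j + 1
    (a list of lists); weights[j] is the corresponding omega^{t'}.
    """
    if not past_phis or len(past_phis) != len(weights):
        raise ValueError("need at least one past slot and one weight per slot")
    n_topics, n_words = len(past_phis[0]), len(past_phis[0][0])
    beta = [[0.0] * n_words for _ in range(n_topics)]
    for phi, omega in zip(past_phis, weights):  # every stored matrix is visited
        for k in range(n_topics):
            for v in range(n_words):
                beta[k][v] += omega * phi[k][v]
    return beta


# Toy example: one topic, two words, two earlier slots (hypothetical values).
phi_1 = [[0.9, 0.1]]
phi_2 = [[0.5, 0.5]]
print(mixture_prior([phi_1, phi_2], weights=[0.25, 0.75]))  # approximately [[0.6, 0.4]]
```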
    In practice, this means that one has to keep the matrices φ^{t'} of all
previous time slots in memory to compute the weighted sum for the current time
slot. This is inefficient in terms of memory and runtime and not in the spirit
of a true online method. In their experiments, AlSumait et al. [1] therefore
only use the previous time slot, meaning that ω^{t'} is zero for all other time
slots. This makes the method more practically relevant; however, it introduces
a problem: consider the case where a certain topic occurs in one time slot, is
absent in the next, and reappears in the slot after that. In this case, the
model forgets everything from the earlier occurrence of the topic, since it
only takes the previous time slot into account. This makes the results highly
dependent on the size of the data slices and on the content of the data.
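The forgetting problem can be reproduced with a toy example. This is a sketch under simplifying assumptions: a single topic over a hypothetical three-word vocabulary, ω = 1 for the previous slot, and a topic that is absent from a slot modeled as a flat word distribution.

```python
def previous_slot_prior(past_phis, omega=1.0):
    """Prior for slot t built from phi^{t-1} only (omega^{t'} = 0 for t' < t - 1)."""
    latest = past_phis[-1]
    return [[omega * p for p in row] for row in latest]


# One topic over a hypothetical vocabulary ["afd", "wahl", "schule"].
phi_slot1 = [[0.8, 0.1, 0.1]]        # topic clearly present in slot 1
phi_slot2 = [[1 / 3, 1 / 3, 1 / 3]]  # topic absent in slot 2: flat estimate (assumption)

beta_slot3 = previous_slot_prior([phi_slot1, phi_slot2])
print(beta_slot3)  # every entry is 1/3; the peak on "afd" from slot 1 is gone
```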
    Our solution is based on the definition of variational Bayes online topic mod-
els [4]. In online variational Bayes, instead of taking samples, a natural gradient
is calculated. After each batch, the model is then updated as

      \phi^{t} = (1 - \rho)\,\phi^{t-1} + \rho\,\hat{\phi}^{t}     (3)
where ρ ∈ [0, 1] is a real-valued update parameter and φ̂^t is the estimate of φ
based on the current batch. In our model, we adopt this strategy but use it only
to update the prior parameter β:

      \beta^{t} = (1 - \rho)\,\beta^{t-1} + \rho\,\phi^{t-1}     (4)
    This means that we let the prior parameter β converge to a stationary point,
whereas φ^t is specific to time slot t. Thus, we can analyze the data in a
given time slot while encouraging the model to keep the topics stable over
time, without having to store any of the previous matrices φ. In contrast to
the online method of Hoffman et al. [4], our method learns topics based only
on the data from the current time slot, which makes it easier to track changes
or detect specific events.
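A minimal sketch of the prior update in Eq. (4) follows. It assumes one topic over a hypothetical three-word vocabulary, a flat starting prior β^1, a topic that is absent from a slot modeled as a flat word distribution, and ρ = 0.5 (chosen for readability; it is not the value used in the experiments). The prior is kept on the probability scale, which is also an assumption.

```python
def update_prior(beta_prev, phi_prev, rho):
    """Eq. (4): beta^t = (1 - rho) * beta^{t-1} + rho * phi^{t-1}.

    Only the previous prior and the previous topic-word matrix are used,
    so memory does not grow with the number of processed slots.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return [
        [(1.0 - rho) * b + rho * p for b, p in zip(b_row, p_row)]
        for b_row, p_row in zip(beta_prev, phi_prev)
    ]


# One topic over a hypothetical vocabulary ["afd", "wahl", "schule"].
rho = 0.5                         # illustrative value, not the experimental setting
beta_1 = [[1 / 3, 1 / 3, 1 / 3]]  # flat starting prior (assumption)
phi_1 = [[0.8, 0.1, 0.1]]         # topic clearly present in slot 1
phi_2 = [[1 / 3, 1 / 3, 1 / 3]]   # topic absent in slot 2: flat estimate (assumption)

beta_2 = update_prior(beta_1, phi_1, rho)
beta_3 = update_prior(beta_2, phi_2, rho)
print([round(x, 3) for x in beta_3[0]])  # [0.45, 0.275, 0.275]
```

Even though the topic was absent in slot 2, the word that dominated it in slot 1 still carries the largest prior mass for slot 3, because β^{t−1} accumulates a decaying summary of all earlier slots.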


3   Experiments
We extracted a set of news articles from German media, covering the period from
January 2016 to May 2017, that contain a keyword related to refugees. We
preprocessed the data by removing numbers, stop words, and single-letter words,
and by converting all letters to lower case. Instances that became empty after
preprocessing were removed, which reduced the dataset to 208,683 articles with
71,633 features. For both offline LDA and the online topic modeling method, we
set the number of topics to 100. Each time slot contains 10,000 instances, and
the method is repeated 100 times for each slot. The update parameter ρ is set to
1/100^{0.9} (≈ 0.016), following the guidelines of Hoffman et al. [4].
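The preprocessing and slicing steps can be sketched as follows. This is a hypothetical illustration: the stop-word list, the tokenization regular expression, the function names, and the assumption that articles arrive as date-sorted (date, text) pairs are ours, not the authors'.

```python
import re

# Hypothetical German stop-word list; the paper does not name the list it used.
STOP_WORDS = {"der", "die", "das", "und", "in", "zu", "den", "ist", "von", "mit"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, then keep letter-only tokens longer than one character
    that are not stop words (numbers are dropped by the regex)."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [tok for tok in tokens if len(tok) > 1 and tok not in stop_words]


def build_time_slots(articles, slot_size=10_000):
    """Preprocess (date, text) pairs, drop empty documents, cut fixed-size slots.

    articles is assumed to be sorted by date already.
    """
    docs = [(date, preprocess(text)) for date, text in articles]
    docs = [(date, tokens) for date, tokens in docs if tokens]  # drop empty instances
    return [docs[i:i + slot_size] for i in range(0, len(docs), slot_size)]


# Tiny made-up example with slot_size=2 instead of 10,000.
articles = [
    ("2016-01-18", "Die AfD und 2 Parteien in Berlin"),
    ("2016-01-19", "123 456"),
    ("2016-01-20", "Flüchtlinge in Mainz"),
    ("2016-01-21", "Wahl zu Landtag"),
]
slots = build_time_slots(articles, slot_size=2)
for slot in slots:
    print(slot)
```

The second article consists only of numbers, becomes empty, and is removed before slicing, so the three remaining articles fill one full slot and one partial slot.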
     Figure 1 illustrates the results of offline LDA and the evolution of one topic
under the online topic modeling method; the topic is mainly about the AFD party
(a right-wing populist political party in Germany) and its news related to
refugees. The word cloud shows the offline LDA result, and each box presents
the English translation of the top 20 most frequent words of the topic in the
corresponding time slot. The dates give the start of each time slot, and only
every third slot is shown. Although the topic changes over time, interestingly,
it remains entirely about news concerning the AFD party. The advantage of this
model over previous online LDA models (e.g., [4]) is that it puts more emphasis
on the current batch, rather than incrementally updating the previous topic
model so that each new batch has only a marginal effect.
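To produce word lists like the ones in the text boxes, the top words of a topic can be read off its word distribution φ_k. The sketch below assumes that "most frequent" means highest probability under φ_k; the helper name and toy values are hypothetical.

```python
import heapq


def top_words(phi_k, vocab, n=20):
    """Return the n words with the highest probability under topic distribution phi_k."""
    best = heapq.nlargest(n, range(len(vocab)), key=lambda v: phi_k[v])
    return [vocab[v] for v in best]


# Hypothetical topic over a five-word vocabulary.
vocab = ["afd", "petry", "wahl", "schule", "euro"]
phi_k = [0.40, 0.25, 0.20, 0.05, 0.10]
print(top_words(phi_k, vocab, n=3))  # ['afd', 'petry', 'wahl']
```

`heapq.nlargest` avoids sorting the whole vocabulary, which matters when V is in the tens of thousands and only 20 words are needed per topic.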
     Looking into the most frequent words of each temporal topic, we observe
that in each period some topics are highlighted depending on upcoming events.
For example, Helmut Seifen (an AFD politician) was elected in the Landtag
election of the state of North Rhine-Westphalia, held on 14 May 2017, and we can
already see him at the top of the news related to refugee politics in the slot
starting on 2017-03-31. Similarly, because of the importance of the Bundestag
election for a young party like the AFD, we observe many discussions related to
it from the beginning of 2017 onward, although the election was not held until
September 2017. This topic is one of the 100 topics found by our method. We
also observe topics about other parties (e.g., SPD and CDU), topics related to
refugee integration, and topics about job markets or even about women and
children. However, not all of the topics are this well-defined.


[Fig. 1 image: word cloud from batch LDA and text boxes of top words for the
time slots starting 2016-01-18, 2016-07-05, 2017-01-11, 2017-02-06, 2017-03-08,
2017-03-31, 2017-04-16, and 2017-05-01]

Fig. 1: An example of topic evolution for a topic related to the AFD party. The
word cloud illustrates the output of batch LDA on the data, while the text boxes
show the output of our proposed method.


4   Conclusion, Challenges, and Future Work
We proposed an online topic modeling method to find the topics related to
refugees in German media. During our experiments, we faced several challenges.
Our first goal was to categorize the reasons for opposing or supporting the
acceptance of refugees across different opinions. Although the model finds
interesting topics in the data, this goal remains unsolved. Another unsuccessful
attempt was to find an unsupervised method that clusters different sources based
on the opinions expressed in their articles, in the hope of identifying their
political views. As future work, one could develop a semi-supervised approach to
build a topic model that can reach these goals.

References
1. AlSumait, L., Barbará, D., Domeniconi, C.: On-line LDA: Adaptive topic models
   for mining text streams with applications to topic detection and tracking.
   In: Eighth IEEE International Conference on Data Mining. pp. 3–12 (2008)
2. Coletto, M., Esuli, A., Lucchese, C., Muntean, C.I., Nardini, F.M., Perego, R.,
   Renso, C.: Sentiment-enhanced multidimensional analysis of online social
   networks: Perception of the Mediterranean refugees crisis. In: IEEE/ACM
   International Conference on Advances in Social Networks Analysis and Mining.
   pp. 1270–1277 (2016)
3. Griffiths, T.L., Steyvers, M.: Finding scientific topics. In: Proceedings of the
   National Academy of Sciences of the United States of America. vol. 101,
   pp. 5228–5235. National Academy of Sciences (2004)
4. Hoffman, M., Bach, F.R., Blei, D.M.: Online learning for Latent Dirichlet
   Allocation. In: Advances in Neural Information Processing Systems.
   pp. 856–864 (2010)



</pre>