<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Dynamic change detection in topics based on rolling LDAs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jonas Rieger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kai-Robin Lange</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonathan Flossdorf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten Jentsch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Statistics, TU Dortmund University</institution>
          ,
          <addr-line>44221 Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>Topic modeling methods such as Latent Dirichlet Allocation (LDA) are popular techniques for analyzing large text corpora. With huge amounts of textual data collected over time in various fields of applied research, it also becomes relevant to automatically monitor the evolution of topics identified by some sort of dynamic topic modeling approach. For this purpose, we propose a dynamic change detection method that relies on a rolling version of the classical LDA, which yields coherently modeled topics over time that are able to adapt to changing vocabulary. Changes are detected by comparing the intensity of word change in the LDA's topics over time to the expected intensity of word change under stable conditions, using resampling techniques. We apply our method to topics obtained by applying RollingLDA to Covid-19 related news data from CNN and illustrate that the detected changes in these topics are well interpretable.</p>
      </abstract>
      <kwd-group>
        <kwd>change point</kwd>
        <kwd>event</kwd>
        <kwd>shift</kwd>
        <kwd>narrative</kwd>
        <kwd>story</kwd>
        <kwd>evolution</kwd>
        <kwd>monitoring</kwd>
        <kwd>Latent Dirichlet Allocation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        There are two perspectives on this issue: offline and online applications. Our approach
is applicable to both tasks but, for each time point, it relies exclusively on the text data that has
already been observed. Hence, we focus on the usually more relevant task of online monitoring.
In traditional schemes for change detection [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], control charts are applied to visualize the
monitoring procedure using a control statistic that is successively calculated for each time
point. An alarm is triggered whenever the statistic lies outside of some control limits. In practice,
there is a variety of different control charts, including memory-free setups (e.g. Shewhart
charts) and memory-based charts (e.g. EWMA, CUSUM). However, these traditional procedures
cannot be applied to textual data off the shelf because of the high dimensionality of large text
corpora. In addition, an in-control state to reliably calculate the control limits is frequently
not available due to the strong dynamics in text data, e.g. newspaper articles. To overcome
these issues, we propose to use a control statistic based on a similarity metric that represents
the resemblance of topics' word distributions over consecutive time points. Control limits
are derived by a resampling procedure using word count vectors based on time-variant topics
modeled by RollingLDA [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>In a similar context, the usage of LDA was proposed for change point detection in topic
distributions of texts [4], based on a modified version of the wild binary segmentation
algorithm [5] designed for offline detection setups. There is also work considering Bayesian
online monitoring [6] for textual data using a document-based model [7], and an approach based
on similarity metrics that aims to detect global events in topics in offline settings [8]. Further
work analyzes the transitions of narratives between topics [9]. In contrast, the
rolling window approach of RollingLDA constructs coherently interpretable topics modeled
over time and allows the resulting dynamic change detection method to become applicable
in online settings. Compared to the mentioned related methods, our method is designed to
detect changes in word distributions of topics over time rather than global changes in topic
distributions of (sets of) documents [e.g. 4, 7, 8], sentiments in topics [e.g. 10], or
changes in topic distributions of words [e.g. 11]. This results in a more refined monitoring
procedure that allows for the detection of narrative shifts that change the word usage
within a certain topic, instead of measuring the frequency of a topic over time within the whole
corpus. Building on this, we aim for our proposed method to provide groundwork for the
extraction and temporal localization of narratives in texts.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodological framework</title>
      <p>For the proposed change detection algorithm, we make use of the existing method of a rolling
version of the classical LDA (RollingLDA) to construct coherent topics over time and measure
similarities of topics for consecutive time points using the well-established cosine similarity.</p>
      <sec id="sec-2-1">
        <title>2.1. Latent Dirichlet Allocation</title>
        <p>The classical LDA [12] models distributions of K latent topics for each text. Let W_n^(m) be a single word token at position n = 1, …, N^(m) in text m = 1, …, M of a corpus of M texts. Then, a single text is given by W^(m) = (W_1^(m), …, W_(N^(m))^(m)), and the corresponding topic assignments for each text are given by T^(m) = (T_1^(m), …, T_(N^(m))^(m)), with T_n^(m) ∈ {T_1, …, T_K}.</p>
        <p>From this, let n_k^(m,v), k = 1, …, K, v = 1, …, V, denote the number of assignments of word v in text m to topic k. Then, we define the cumulative count of word v in topic k over all texts by n_k^(•,v) and denote the total count of assignments to topic k by n_k^(••). Using these definitions, the underlying probability model [13] can be written as</p>
        <p>W_n^(m) ∣ T_n^(m), φ ∼ Discr(φ_(T_n^(m))), φ_k ∼ Dir(η),</p>
        <p>T_n^(m) ∣ θ_m ∼ Discr(θ_m), θ_m ∼ Dir(α).</p>
        <p>For a given parameter set {K, α, η}, with the Dirichlet priors α and η defining the type of mixture of topics in every text and the type of mixture of words in every topic, LDA assigns one of the K topics to each token. A word distribution estimator per topic, φ_k = (φ_(k,1), …, φ_(k,V)) ∈ (0, 1)^V, can be derived through the collapsed Gibbs sampler procedure [13] by</p>
        <p>φ̂_(k,v) = (n_k^(•,v) + η) / (n_k^(••) + V·η).    (1)</p>
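<p>As an illustration, the word distribution estimator per topic described above can be sketched in a few lines (a minimal Python sketch with a toy count matrix; the function name and counts are our own):</p>

```python
import numpy as np

def estimate_phi(counts, eta):
    """Collapsed-Gibbs word distribution estimate per topic.

    counts: (K, V) array with counts[k, v] = n_k^(.,v), the cumulative
    assignments of word v to topic k; eta is the symmetric Dirichlet prior.
    Returns phi_hat with phi_hat[k, v] = (n_k^(.,v) + eta) / (n_k^(..) + V * eta).
    """
    counts = np.asarray(counts, dtype=float)
    K, V = counts.shape
    return (counts + eta) / (counts.sum(axis=1, keepdims=True) + V * eta)

phi = estimate_phi([[3, 1, 0], [0, 2, 2]], eta=0.1)
# each row of phi is a proper probability distribution over the vocabulary
```

<p>The prior η smooths the estimate, so words never observed in a topic still receive small positive probability.</p>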
      </sec>
      <sec id="sec-2-2">
        <title>2.2. RollingLDA</title>
        <p>
          RollingLDA [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is a rolling version of classical LDA. New texts are modeled based on existing
topics of the previous model. Thereby, not the whole knowledge of the entire past of the model
is used, but only the information of the topics from more recent texts based on a user-chosen
memory parameter. For each time point, based on the topic assignments within this memory
period, the topics are initialized and modeled forward. This form of modeling preserves the topic
structure of the model so that topics remain coherently interpretable over time. At the same
time, constraining the knowledge of the model to the user-chosen memory period allows for
changes in topics based on new vocabulary or word choices. There are other dynamic variants
of the LDA approach [14, 15, 16, 17, 18] deliberately designed to model gradual changes, and
therefore not as well suited to detect abrupt changes. We use the update algorithm RollingLDA
to make our proposed change detection method applicable in an online manner. Thereby, a text
is assigned to a time point on the basis of its publication date. In the present case, the step size
of the time chunks is chosen on a weekly basis, as this seems natural for journalistic texts.
        </p>
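<p>The chunking and memory mechanism can be sketched as follows (a strongly simplified Python illustration; the actual RollingLDA implementation additionally initializes the topic assignments of new texts from those of the memory chunks):</p>

```python
from datetime import date

def weekly_chunks(docs, start):
    """Assign texts to weekly time points by publication date.
    docs: iterable of (publication_date, text) pairs."""
    chunks = {}
    for d, text in docs:
        week = (d - start).days // 7
        chunks.setdefault(week, []).append(text)
    return chunks

def memory_window(chunks, t, memory=1):
    """Texts whose topic assignments serve as initialization at time t:
    only the last `memory` chunks are used, not the entire past."""
    return [doc for w in range(max(0, t - memory), t) for doc in chunks.get(w, [])]

chunks = weekly_chunks(
    [(date(2020, 2, 1), "a"), (date(2020, 2, 3), "b"),
     (date(2020, 2, 10), "c"), (date(2020, 2, 17), "d")],
    start=date(2020, 2, 1),
)
# chunks -> {0: ['a', 'b'], 1: ['c'], 2: ['d']}
```

<p>With a memory of one week, the texts of chunk 1 alone would initialize the model for chunk 2.</p>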
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Similarity</title>
        <p>Our change detection algorithm builds on a similarity measure for word count vectors. Following up on the notation from Section 2.1, the word count vector for topic k ∈ {1, …, K} at one time point t ∈ {0, …, T} is given by</p>
        <p>n_(k|t) = (n_(k|t)^(•,1), …, n_(k|t)^(•,V)) ∈ ℕ_0^V, ℕ_0 = {0, 1, 2, …}.</p>
        <p>Then, monitoring the similarity of topics over time for (consecutive) time points t_1 and t_2 is done using the cosine similarity</p>
        <p>cos(n_(k|t_1), n_(k|t_2)) = Σ_v n_(k|t_1)^(•,v) n_(k|t_2)^(•,v) / ( √(Σ_v (n_(k|t_1)^(•,v))²) · √(Σ_v (n_(k|t_2)^(•,v))²) ).    (2)</p>
        <p>
          The choice of cosine similarity is common in the context of change point detection for text
data [e.g. 8, 19]. Compared to other similarity measures such as the Jaccard coefficient,
Jensen-Shannon divergence, χ²-, Hellinger and Manhattan distance, the cosine similarity fulfills some
typical conditions required for monitoring a similarity measure [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Change detection</title>
      <p>In combination with the existing RollingLDA method and the cosine similarity, our contributed
method for change detection relies on classical resampling approaches to identify changes
within topics. We estimate the realized change in a topic based on the similarity between the
current and previous count vectors of word assignments, and compare the resulting similarity
score to resampling-based similarity scores generated under stable conditions, i.e. such
that no extraordinary changes occurred in the topic.</p>
      <sec id="sec-3-1">
        <title>3.1. Set of changes</title>
        <p>−   , … ,  − 1 , given by
Suppose we consider  topics over  time points to be monitored. If the actual observed
similarity of the word vector of some topic  ∈ {1, … ,  }
at some time  ∈ {0, 1, … ,  }
given
by  | , compared to the frequency vector of the topic over a predefined reference time period
 |(−
 )∶(−1) = ∑  |− ,



=1
(2)
(3)
(4)
is smaller than a threshold   which is calibrated based on similarities under stable conditions
(see Section 3.2), then we identify a change within topic  at time  . The set of identified changes
in topic  up to time point  can then be defined as</p>
        <p>= { ∣ 0 &lt;  ≤  ≤  ∶
cos ( |
,  |(−
 )∶(−1) ) &lt;  
 } ∪ 0,
where the time point  = 0 is always included for technical reasons, to compute the current run
length without a change   = min { max,  − max  −1 }. Thus, the reference period spans the
last  max time points if no change was detected during that time, and spans the time that has
passed since the last change, otherwise. The parameter  max is to be chosen by the user and is
intended to smooth the similarities to prevent from detecting false positives.</p>
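<p>The run-length logic described above can be sketched as follows (a simplified Python sketch with toy count vectors and precomputed thresholds; in the actual method, the thresholds are derived by the resampling procedure of Section 3.2):</p>

```python
import numpy as np

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def monitor_topic(count_vectors, thresholds, r_max=4):
    """Online monitoring of one topic: flag time t as a change when the
    similarity to the aggregated reference window falls below the threshold.

    count_vectors: word count vectors of the topic, one per time point.
    thresholds: one threshold per time point (here assumed precomputed).
    """
    vecs = [np.asarray(v, dtype=float) for v in count_vectors]
    changes = {0}  # t = 0 is always included for technical reasons
    for t in range(1, len(vecs)):
        r_t = min(r_max, t - max(changes))               # current run length
        reference = sum(vecs[t - i] for i in range(1, r_t + 1))
        if cosine(vecs[t], reference) < thresholds[t]:
            changes.add(t)
    return changes
```

<p>A stable topic keeps a long reference window of up to r_max time points, while a freshly changed topic is compared only to the counts observed since the change.</p>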
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dynamic thresholds</title>
        <p>For the calculation of the threshold q_(k|t), the estimated word distribution of a topic k at some time point t, as well as over the corresponding reference period t − r_t, …, t − 1, are needed. For this, let φ̂_(k|t) and φ̂_(k|(t−r_t):(t−1)) be defined by</p>
        <p>φ̂_(k|t,v) = (n_(k|t)^(•,v) + η) / (n_(k|t)^(••) + V·η),    (5)</p>
        <p>φ̂_(k|(t−r_t):(t−1),v) = (n_(k|(t−r_t):(t−1))^(•,v) + η) / (n_(k|(t−r_t):(t−1))^(••) + V·η),    (6)</p>
        <p>analogously to Equation (1).</p>
        <p>
          The application of the change point detection algorithm is designed for text data, more
precisely for empirical word distributions of  topics modeled by LDA in a given text corpus.
Since word choice - especially in journalistic texts - varies considerably over time, a situation in
which there is no change in the word distribution within topics across consecutive time points
does not reflect the expected situation. Rather, it is to be expected that topics change gradually
on an ongoing basis. Accordingly, our method aims to identify not the numerous customary

changes in the topics, but the unexpectedly large ones. To do so, we define an expected word
distribution φ̃_(k|t)^(γ) for time point t under stable conditions that includes the customary changes,
as a convex combination of the two estimators of the word distribution of topic k, one for the
reference time period t − r_t, …, t − 1 and one for the current time point t. Using the mixture
parameter γ ∈ [0, 1], which can be tuned based on how substantial the detected changes should
be, the intensity of the expected change is considered in the determination of this estimator by
        </p>
        <p>̃() = (1 − )  ̂
,
(−  )∶(−1)
+   ̂,( ) .</p>
        <p>Our method uses the estimator φ̃_(k|t)^(γ) to simulate B expected word count vectors ñ_(k|t)^(b), b = 1, …, B, based on a parametric bootstrap approach. In this process, each word is drawn according to its estimated probability of occurrence regarding φ̃_(k|t)^(γ), and each sample b consists of n_(k|t)^(••) draws, the number of words assigned to topic k at time point t. Then, we calculate the cosine similarity</p>
        <p>cos(ñ_(k|t)^(b), n_(k|(t−r_t):(t−1)))</p>
        <p>for each of the b = 1, …, B bootstrap samples and set the threshold q_(k|t) equal to the 0.01 quantile of these simulated similarity values generated under stable conditions. Combinations of topics and time points for which the observed similarity is smaller than the corresponding quantile are classified as change points according to Equation (4).</p>
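<p>The threshold calibration above can be sketched as a parametric bootstrap in a few lines (an illustrative Python sketch; the variable names and the symbol gamma for the mixture parameter are our own, and the authors' actual implementation is provided as R scripts in their repository):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_threshold(phi_ref, phi_now, n_draws, reference_counts,
                        gamma=0.85, B=500, level=0.01):
    """Parametric bootstrap for a dynamic similarity threshold.

    phi_ref, phi_now: estimated word distributions over the reference period
    and the current time point; n_draws: number of words assigned to the
    topic at time t; reference_counts: aggregated reference count vector.
    """
    # expected word distribution under stable conditions, Equation-(7)-style mixture
    phi_mix = (1.0 - gamma) * np.asarray(phi_ref) + gamma * np.asarray(phi_now)
    ref = np.asarray(reference_counts, dtype=float)
    sims = np.empty(B)
    for b in range(B):
        sample = rng.multinomial(n_draws, phi_mix)  # simulated count vector
        sims[b] = sample @ ref / (np.linalg.norm(sample) * np.linalg.norm(ref) + 1e-12)
    # threshold: low quantile of the similarities expected under stability
    return float(np.quantile(sims, level))
```

<p>An observed similarity below this quantile is then more extreme than almost all similarities simulated under stable conditions, and the time point is flagged as a change.</p>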
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Analysis</title>
      <p>For the real data analysis, the data set under study was created with Python,
whereas the preprocessing, the modeling, and all postprocessing steps and analyses are performed
using R. The scripts for all analysis steps can be found in the associated GitHub repository
github.com/JonasRieger/topicalchanges.</p>
      <sec id="sec-4-1">
        <title>4.1. Data and study design</title>
        <p>To assess the quality of our change point algorithm, we use the TLS-Covid19 data set [20]. It is
generated using Covid-19 related liveblog articles of CNN, collected from January 22nd, 2020
until December 12th, 2021. Each liveblog is interpreted as belonging to a topic and comprises texts
and key moments. The texts form a timeline containing events, which are summarized by their
key moments. The resulting corpus consists of 27,432 texts and 1,462 key moments. Although
the data set contains multiple key moments per day on average, we do not consider all of them
change points, as our aim is to detect larger changes based on aggregated weekly texts. However,
these key moments serve well as indicators that enable us to check whether the detected
changes are actually related to real events or whether they are false positives.</p>
        <p>We use common NLP preprocessing steps for the texts, i.e. characters are formatted to
lowercase and numbers and punctuation are removed. Moreover, a trusted stopword list is applied
to remove words that do not help in classifying texts into topics, we use a lemmatization dictionary
(github.com/michmech/lemmatization-lists), and we neglect words with fewer than two characters.</p>
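<p>The preprocessing steps above can be sketched as follows (a minimal Python illustration; the stopword list and lemmatization dictionary here are tiny stand-ins for the actual resources used):</p>

```python
import re

LEMMA = {"cases": "case", "vaccines": "vaccine"}   # stand-in lemmatization dictionary
STOPWORDS = {"the", "and", "of", "to", "a", "in"}  # stand-in stopword list

def preprocess(text):
    """Lowercase, drop numbers and punctuation, remove stopwords,
    lemmatize, and drop words shorter than two characters."""
    text = text.lower()
    text = re.sub(r"[0-9]+", " ", text)            # remove numbers
    tokens = re.findall(r"[a-z]+", text)           # keep alphabetic tokens only
    tokens = [LEMMA.get(t, t) for t in tokens if t not in STOPWORDS]
    return [t for t in tokens if len(t) >= 2]

preprocess("The 27,432 texts report new Covid-19 cases in 2020.")
# -> ['texts', 'report', 'new', 'covid', 'case']
```
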
        <p>We model the CNN data set using RollingLDA on a weekly basis, starting on Saturday of
each week, and we consider the previous week as initialization for the model’s topics. The first
10 days of modeling, Wednesday, January 22nd 2020 until Friday, January 31st 2020, serve as
the initial chunk corresponding to  = 0 . During this period, 605 texts were published. In the
data set, there are weeks that do not contain any texts. In this case, the corresponding time
point is omitted. Then, to model the texts of the following chunk, at least the last 10 texts
are used, as well as all other texts published on the same date as the oldest of these 10 texts.
As parameters, we assume K = 12 topics, define the reference period of the topics to be the last
r_max = 4 weeks, and choose γ = 0.85, since these values are accountable by plausibility and
seem to yield reasonable results. For other parameter choices, i.e. K = 8, …, 20, r_max = 1, …, 20,
γ = 0.5, …, 0.8, 0.81, …, 0.90, results can be found in our associated repository.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Findings</title>
        <p>The results of our chosen model are displayed in Figure 1. Fig. 1a shows the detected changes
by vertical gray lines, which are the weeks in which the observed similarity (blue curve) is
lower than the expected one (red curve). Furthermore, for two changes we show which words
are mainly causing the detection of the change. The score of a word in a topic at a given time
point is calculated by the topic’s similarity without considering this word and subtracting it
from the actual realized similarity. These leave-one-out cosine impact scores for the words with
the five most negative scores are shown in Fig. 1b and 1c. In general, most of the changes we
detect occur within the first four months of 2020. This is because the wording was constantly
changing, as the Covid-19 epidemic turned into a pandemic over the course of these months.
New people and organizations were associated with Covid-19, which is why we detect a number
of consecutive changes in every topic. As the pandemic spread to further countries,
the detected changes became less frequent for most topics. In the following, we share our
interpretation of some exemplary detected changes.</p>
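<p>The described leave-one-out cosine impact score can be sketched as follows (an illustrative Python sketch with toy count vectors; function and variable names are our own):</p>

```python
import numpy as np

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def loo_impacts(current, reference):
    """Leave-one-out cosine impact per word: the similarity computed without
    that word, subtracted from the actually realized similarity. Strongly
    negative scores mark the words driving a detected change."""
    current = np.asarray(current, dtype=float)
    reference = np.asarray(reference, dtype=float)
    full = cosine(current, reference)
    scores = np.empty(len(current))
    for v in range(len(current)):
        mask = np.ones(len(current), dtype=bool)
        mask[v] = False                      # drop word v from both vectors
        scores[v] = full - cosine(current[mask], reference[mask])
    return scores
```

<p>A word used only at the current time point (or only in the reference period) lowers the similarity, so removing it raises the score and its impact becomes negative.</p>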
        <p>The third topic, containing information about vaccination and testing procedures, shows
a change in the week starting on the 13th of March 2021. In this week, the AstraZeneca
vaccination process in several EU states was stopped due to the risk of causing blood clots.1
The sixth topic, a topic about medical studies and research, shows a change in the following
week, in which AstraZeneca presented a study about the effectiveness of its vaccine. 2 Another
interesting detection is the change in the vaccination-related topic 10 in December 2020, just as
the vaccination process started in the US.3</p>
        <p>Political changes are also detected in several topics, such as the start of Joe Biden’s presidential
era in late January 2021 in topic 11, the return of Donald Trump to office after his Covid-19
infection in October 20204 in topic 9 (cf. Fig. 1c), or the discussion about the origin of the virus
after a WHO report in late March 2021 in topic 9. 5 A Covid-19 outbreak in the South Korean
Sarang-jeil church in August 20206 is detected in topic 2 (cf. Fig. 1b).</p>
        <p>While these topics detect changes across the entire time span, the twelfth topic, representing
the report of the current number of Covid cases, does not detect a single change after March
2020. This is most likely because, after the pandemic had reached the US and Europe in early
2020, the number of cases was consistently reported, and the interpretations and implications of
those case numbers are detected as changes in other topics. Even in the last months of the data
set, in which the number of texts decreased and the results thus show a lower similarity, the
twelfth topic retained a rather high similarity of above 0.75.</p>
        <p>1CNN online, 2021-03-15 3:03 p.m. ET, “Spain joins Germany, France and Italy in halting AstraZeneca Covid-19 vaccinations”, https://edition.cnn.com/world/live-news/coronavirus-pandemic-vaccine-updates-03-15-21/h_d938057f2ef588f74565bdbb01f12387, visited on 2022-01-20.</p>
        <p>2CNN online, 2021-03-25 2:48 a.m. ET, “New AstraZeneca report says vaccine was 76% effective in preventing Covid-19 symptoms”, https://edition.cnn.com/world/live-news/coronavirus-pandemic-vaccine-updates-03-25-21/h_9f01e2e53b62873f1c742254d27fbf5f, visited on 2022-01-20.</p>
        <p>3CNN online, 2020-12-14 10:08 p.m. ET, “The first doses of FDA-authorized Covid-19 vaccine were administered in the US. Here’s what we know”, https://edition.cnn.com/world/live-news/coronavirus-pandemic-vaccine-updates-12-15-20/h_32be1a72dc05f874eda167c95c8f1bba, visited on 2022-01-20.</p>
        <p>4CNN online, 2020-10-12 12:01 a.m. ET, “Trump says he tested ‘totally negative’ for Covid-19”, https://edition.cnn.com/world/live-news/coronavirus-pandemic-10-12-20-intl/h_7570d53b184a5b1d6ec97ce67330e4c9, visited on 2022-01-20.</p>
        <p>5CNN online, 2021-03-29 11:22 a.m. ET, “Upcoming WHO report will deem Covid-19 lab leak extremely unlikely, source says”, https://www.cnn.com/world/live-news/coronavirus-pandemic-vaccine-updates-03-29-21/h_1f239fee1b0584ca9a5b6085357ac907, visited on 2022-01-20.</p>
        <p>6CNN online, 2020-08-20 12:55 a.m. ET, “South Korea’s latest church-linked coronavirus outbreak is turning into a battle over religious freedom”, https://edition.cnn.com/world/live-news/coronavirus-pandemic-08-20-20-intl/h_288a15acd1b29e732c4e10693641088a, visited on 2022-01-20.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>In this paper, we presented a novel change detection method for text data. To construct
coherently interpretable topics, we used RollingLDA to model a time series of textual data and
compared the model's word distribution vectors with those of texts resampled under stable
conditions. We applied our model to the TLS-Covid19 data set consisting of Covid-19 related
news articles from CNN between January 2020 and December 2021.</p>
      <p>Our method detects several meaningful changes in the evolving news coverage during the
pandemic, including e.g. the start of vaccinations and several controversies over the course of
the vaccination campaign as well as political changes such as the start of Joe Biden’s presidential
era. Out of 78 detected changes, we were instantly able to judge 55 (71%) as plausible ones based
on manual labeling using the leave-one-out cosine impacts (cf. Fig. 1b, 1c and repository). The
share increases to 78% if we exclude the turbulent initial phase of the Covid-19 pandemic and
only consider changes since April 2020. While we cannot tell how many changes that could be
considered as important as the ones mentioned above were missed, our model contains a
mixture parameter to calibrate the detection against the general change of topics within a usual news
week. If more but less substantial, or fewer but more substantial changes are to be detected,
this parameter γ can be tuned accordingly. In combination with the maximum length of the
reference period r_max, the set {γ, r_max} forms the model's hyperparameters to be optimized.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The present study is part of a project of the Dortmund Center for data-based Media Analysis
(DoCMA) at TU Dortmund University. The work was supported by the Mercator Research
Center Ruhr (MERCUR) with project number PR-2019-0019. In addition, the authors gratefully
acknowledge the computing time provided on the Linux HPC cluster at TU Dortmund University
(LiDO3), partially funded in the course of the Large-Scale Equipment Initiative by the German
Research Foundation (DFG) as project 271512359.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[4] A. Bose, S. S. Mukherjee, Changepoint analysis of topic proportions in temporal text data, 2021. arXiv:2112.00827.</p>
      <p>[5] P. Fryzlewicz, Wild binary segmentation for multiple change-point detection, The Annals of Statistics 42 (2014) 2243–2281. doi:10.1214/14-AOS1245.</p>
      <p>[6] R. P. Adams, D. J. MacKay, Bayesian online changepoint detection, 2007. arXiv:0710.3742.</p>
      <p>[7] T. Kim, J. Choi, Reading documents for Bayesian online change point detection, in: Proceedings of the 2015 EMNLP-Conference, ACL, 2015, pp. 1610–1619. doi:10.18653/v1/D15-1184.</p>
      <p>[8] N. Keane, C. Yee, L. Zhou, Using topic modeling and similarity thresholds to detect events, in: Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, ACL, 2015, pp. 34–42. doi:10.3115/v1/W15-0805.</p>
      <p>[9] Q. Mei, C. Zhai, Discovering evolutionary theme patterns from text: An exploration of temporal text mining, in: Proceedings of the 11th SIGKDD-Conference, ACM, 2005, pp. 198–207. doi:10.1145/1081870.1081895.</p>
      <p>[10] Q. Liang, K. Wang, Monitoring of user-generated reviews via a sequential reverse joint sentiment-topic model, Quality and Reliability Engineering International 35 (2019) 1180–1199. doi:10.1002/qre.2452.</p>
      <p>[11] L. Frermann, M. Lapata, A Bayesian model of diachronic meaning change, Transactions of the Association for Computational Linguistics 4 (2016) 31–45. doi:10.1162/tacl_a_00081.</p>
      <p>[12] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent Dirichlet Allocation, Journal of Machine Learning Research 3 (2003) 993–1022. doi:10.1162/jmlr.2003.3.4-5.993.</p>
      <p>[13] T. L. Griffiths, M. Steyvers, Finding scientific topics, Proceedings of the National Academy of Sciences 101 (2004) 5228–5235. doi:10.1073/pnas.0307752101.</p>
      <p>[14] X. Song, C.-Y. Lin, B. L. Tseng, M.-T. Sun, Modeling and predicting personal information dissemination behavior, in: Proceedings of the 11th SIGKDD-Conference, ACM, 2005, pp. 479–488. doi:10.1145/1081870.1081925.</p>
      <p>[15] D. M. Blei, T. L. Griffiths, M. I. Jordan, J. B. Tenenbaum, Hierarchical topic models and the nested Chinese restaurant process, in: Advances in Neural Information Processing Systems, volume 16, MIT Press, 2003, pp. 17–24. URL: https://proceedings.neurips.cc/paper/2003/hash/7b41bfa5085806dfa24b8c9de0ce567f-Abstract.html.</p>
      <p>[16] X. Wang, A. McCallum, Topics over time: A non-Markov continuous-time model of topical trends, in: Proceedings of the 12th SIGKDD-Conference, ACM, 2006, pp. 424–433. doi:10.1145/1150402.1150450.</p>
      <p>[17] D. M. Blei, J. D. Lafferty, Dynamic topic models, in: Proceedings of the 23rd ICML-Conference, ACM, 2006, pp. 113–120. doi:10.1145/1143844.1143859.</p>
      <p>[18] C. Wang, D. M. Blei, D. Heckerman, Continuous time dynamic topic models, in: Proceedings of the 24th UAI-Conference, AUAI Press, 2008, pp. 579–586. URL: https://dl.acm.org/doi/10.5555/3023476.3023545.</p>
      <p>[19] Y. Wang, C. Goutte, Real-time change point detection using on-line topic models, in: Proceedings of the 27th ACL-Conference, ACL, 2018, pp. 2505–2515. URL: https://www.aclweb.org/anthology/C18-1212.</p>
      <p>[20] A. Pasquali, R. Campos, A. Ribeiro, B. Santana, A. Jorge, A. Jatowt, TLS-Covid19: A new annotated corpus for timeline summarization, in: Advances in Information Retrieval, ECIR 2021, volume 12656 of LNCS, 2021, pp. 497–512. doi:10.1007/978-3-030-72113-8_33.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rieger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jentsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rahnenführer</surname>
          </string-name>
          ,
          <article-title>RollingLDA: An update algorithm of Latent Dirichlet Allocation to construct consistent time series from textual data</article-title>
          ,
          <source>in: Findings Proceedings of the 2021 EMNLP-Conference, ACL</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2337</fpage>
          -
          <lpage>2347</lpage>
          . doi:10.18653/v1/2021.findings-emnlp.201.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Montgomery</surname>
          </string-name>
          ,
          <article-title>Introduction to statistical quality control</article-title>
          , John Wiley &amp; Sons,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Oakland</surname>
          </string-name>
          , Statistical process control,
          <source>Routledge</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>