<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ross</forename><surname>Deans Kristensen-Mclachlan</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Center for Humanities Computing</orgName>
								<orgName type="institution">Aarhus University</orgName>
								<address>
									<country key="DK">Denmark</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of Linguistics, Cognitive Science, and Semiotics</orgName>
								<orgName type="institution">Aarhus University</orgName>
								<address>
									<country key="DK">Denmark</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rebecca</forename><forename type="middle">M M</forename><surname>Hicke</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Center for Humanities Computing</orgName>
								<orgName type="institution">Aarhus University</orgName>
								<address>
									<country key="DK">Denmark</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Cornell University</orgName>
								<address>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Márton</forename><surname>Kardos</surname></persName>
							<email>martonkardos@cas.au.dk</email>
							<affiliation key="aff0">
								<orgName type="department">Center for Humanities Computing</orgName>
								<orgName type="institution">Aarhus University</orgName>
								<address>
									<country key="DK">Denmark</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mette</forename><surname>Thunø</surname></persName>
							<email>mettethunoe@cas.au.dk</email>
							<affiliation key="aff3">
								<orgName type="department">Department of Global Studies</orgName>
								<orgName type="institution">Aarhus University</orgName>
								<address>
									<country key="DK">Denmark</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BE0368A03D70ACDEB2E6727B204C3ACD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>novelty</term>
					<term>contextual topic models</term>
					<term>Chinese</term>
					<term>information dynamics</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Does the People's Republic of China (PRC) interfere with European elections through ethnic Chinese diaspora media? This question forms the basis of an ongoing research project exploring how PRC narratives about European elections are represented in Chinese diaspora media, and thus the objectives of PRC news media manipulation. In order to study diaspora media efficiently and at scale, it is necessary to use techniques derived from quantitative text analysis, such as topic modelling. In this paper, we present a pipeline for studying information dynamics in Chinese media. Firstly, we present KeyNMF, a new approach to static and dynamic topic modelling using transformer-based contextual embedding models. We provide benchmark evaluations to demonstrate that our approach is competitive on a number of Chinese datasets and metrics. Secondly, we integrate KeyNMF with existing methods for describing information dynamics in complex systems. We apply this pipeline to data from five news sites, focusing on the period of time leading up to the 2024 European parliamentary elections. Our methods and results demonstrate the effectiveness of KeyNMF for studying information dynamics in Chinese media and lay groundwork for further work addressing the broader research questions.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>A number of major elections took place in the West over the course of 2024. Across Europe, citizens took to the polls in early June to elect members to the European Parliament. In France, the election of a new Assemblée nationale caused political turmoil, while the United Kingdom voted in a Labour government for the first time in 14 years. On the other side of the Atlantic Ocean, the United States of America will vote to determine their new President in November. The fallout of these elections remains to be seen but it seems clear that this year is one of political change and upheaval.</p><p>Much digital ink is spilled on these topics in Western media as the various electorates determine their preferences before elections and digest the fallout afterwards. Moreover, a significant part of this media coverage is fundamentally persuasive, aiming to convince voters to bet on the candidate who most closely aligns with the social and economic ideology of the media outlets and their owners <ref type="bibr" target="#b12">[13]</ref>. Likewise, coverage of these elections is not limited to European media institutions, with media outlets around the world updating their readership on how these elections impact them.</p><p>In this context, one particular type of media stands out as especially interesting: ethnic Chinese media targeting diaspora communities in Europe, a group which by some estimates comprises around 1.5-3 million individuals. These media outlets are potentially invaluable sources for understanding how the Chinese government and the Chinese Communist Party (CCP) attempt to influence the diaspora. Furthermore, studying these outlets potentially provides unique insights into how China views itself in relation to the West by showing how the PRC presents itself to its diaspora groups. 
A growing body of literature has already begun to address these questions in the context of social media <ref type="bibr" target="#b27">[28,</ref><ref type="bibr" target="#b28">29]</ref> or in terms of digital infrastructure more generally <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref>. In ongoing research, our aim is to assess whether Chinese diaspora news sources intend to impact opinions on elections in the West during 2024. We attempt to understand the control of information flow in Chinese diaspora media and how this control is used to set specific agendas during electoral periods: promoting certain political parties or individual candidates, polarizing citizens, and attacking or promoting specific political positions.</p><p>To pursue this research, we design a pipeline for analyzing large amounts of Chinese-language news data. First, we introduce KeyNMF, a novel approach to creating context-sensitive topic models via transformer-based encoder models. KeyNMF can be trivially applied across different languages and in data-scarce environments, and is shown here to create coherent, human-interpretable outputs when working with Chinese-language data. We then integrate KeyNMF with existing techniques for describing the information dynamics of complex systems, which measure the novelty and resonance of information present in a system over time. We use this pipeline to perform preliminary analysis on our dataset of Chinese diaspora media, finding clear trends in the novelty and resonance signals which correlate with significant political events. The results presented are thus intended to be both a proof of concept and a stepping stone towards more meaningful understanding of the dynamics underlying Chinese diaspora media.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Information Dynamics</head><p>The study of information dynamics in complex cultural systems has been a central aspect of research in computational humanities and cultural analytics in recent years. One of the most promising approaches to this problem was introduced in <ref type="bibr" target="#b2">[3]</ref>, which studied the shifting debates that took place during the French Revolution. In this approach, divergence in content between different time slices can be calculated using information-theoretic measures. These measures can then be used to quantify two interrelated values: the novelty of the system, or how much the new time slice diverges from preceding time slices; and the resonance of this information, which describes how information persists over time.</p><p>Novelty-resonance patterns have been studied in a number of different discourse domains. <ref type="bibr" target="#b20">[21]</ref> demonstrate their usefulness in identifying so-called trend reservoirs on Reddit. Similar interaction patterns between novelty and resonance have been successfully employed to study the manner in which online news media responded to catastrophic events <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b18">19]</ref>. In <ref type="bibr" target="#b30">[31]</ref>, the same fundamental method of analysis demonstrates that novelty-resonance patterns clearly track major social and historical events in the 20th century, using data taken from the front page of Dutch newspapers.</p><p>Calculating these underlying dynamics requires the creation of some kind of numerical representation of the data. Specifically, the difference between individual windows is computed by finding the windowed relative entropy, in this case calculated using Jensen-Shannon Divergence (JSD). 
Since JSD computes the distance between probability distributions, the numerical representations of the data are required to take that form. In <ref type="bibr" target="#b1">[2]</ref>, this was achieved by calculating the probabilities of a pre-trained, BERT-based emotion classification model, where the predicted probabilities for each label created a distribution over emotions for each document. However, for most purposes, novelty and resonance are calculated based on distributions generated by a probabilistic topic model.</p></div>
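To make the divergence measure concrete, here is a minimal sketch of base-2 Jensen-Shannon Divergence between two topic distributions; the function names are illustrative, not taken from the codebases cited above. With base-2 logarithms, JSD is symmetric and bounded in [0, 1].

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits, skipping zero-probability terms of p."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def jsd(p, q):
    """Jensen-Shannon Divergence between two probability distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution; m > 0 wherever p or q is positive
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

print(jsd([0.5, 0.5], [0.5, 0.5]))  # identical distributions -> 0.0
print(jsd([1.0, 0.0], [0.0, 1.0]))  # disjoint distributions -> 1.0
```

JSD is preferred over plain Kullback-Leibler divergence in this setting because it is symmetric and remains finite even when one distribution assigns zero probability to a topic the other does not.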
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Vanilla LDA</head><p>Typically, novelty and resonance are calculated from topic probability distributions extracted by Latent Dirichlet Allocation (LDA) <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b6">7]</ref>. Topic distributions in documents are a natural choice for information dynamics, as they are immediately usable with entropy-based measures. LDA is a generative bag-of-words model, which assumes that a document contains a mixture of topics and all words in the document are drawn from this mixture distribution.</p><p>However, LDA has a number of well-known shortcomings. Documents have to be heavily pre-processed for optimal results; otherwise, the topic descriptions produced by the model are often contaminated by noise and stop words <ref type="bibr" target="#b15">[16]</ref>. In addition, since LDA makes the bag-of-words assumption, it cannot utilize contextual and syntactic information, nor general properties of natural language learned from outside sources. Finally, LDA is sensitive to hyperparameter choices, and Wallach, Mimno, and McCallum <ref type="bibr" target="#b29">[30]</ref> demonstrate that using symmetric Dirichlet priors, which is the case in canonical implementations <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b21">22]</ref> and the majority of academic studies, can lead to sub-optimal performance.</p><p>There have also been challenges to the generalizability of LDA from the perspective of Chinese NLP, as the primary structural and semantic unit of Chinese is the character rather than the word <ref type="bibr" target="#b31">[32,</ref><ref type="bibr" target="#b23">24]</ref>. While these concerns might be overstated, working with Chinese-language data causes specific challenges in terms of tokenization and semantics which directly impact the efficacy of traditional LDA approaches to topic modelling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Alternatives to LDA</head><p>A major shortcoming of LDA when trying to model change over time is that topics are calculated over all documents, essentially flattening any temporal aspect of the data. This is undesirable, since topics themselves naturally evolve over time, meaning that LDA may not reflect the true dynamics of a system. This issue is partly rectified by dynamic topic models <ref type="bibr" target="#b7">[8]</ref>, which account for temporal changes in topics with a state-space model. However, Dynamic LDA models are even more parameter-rich than the vanilla implementation and thus amplify its limitations.</p><p>Recently, contemporary topic models have shown that it is possible to utilize embeddings from sentence transformers <ref type="bibr" target="#b26">[27]</ref> to infuse contextual information into topic models and to allow for transfer learning <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b0">1,</ref><ref type="bibr" target="#b15">16]</ref>. This contextual information can lead to more coherent and semantically interpretable topics. In addition, since these models draw on existing pre-trained language models, they do not require training a generative model from scratch. This means that it is possible to train topic models in data-scarce contexts where traditional LDA might perform poorly.</p><p>Among the most popular of these contemporary models is BERTopic <ref type="bibr" target="#b13">[14]</ref>, which also has dynamic modelling capabilities. In this model, topic-term importances are estimated post-hoc on pre-defined time slices based on one underlying topic model. However, as with LDA, BERTopic is sensitive to pre-processing <ref type="bibr" target="#b15">[16]</ref>. Additionally, because BERTopic is a clustering topic model, documents are only assigned a single topic label. 
This renders the model impractical in settings where documents are expected to contain multiple topics and means that BERTopic is not suitable for calculating novelty and resonance, since the entropy calculations assume probability distributions over documents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">KeyNMF</head><p>We propose KeyNMF, a novel topic modelling approach that utilizes neural text embeddings. KeyNMF builds on the reliability, stability <ref type="bibr" target="#b3">[4]</ref>, scalability <ref type="bibr" target="#b16">[17]</ref>, and interpretability of Nonnegative Matrix Factorization (NMF) <ref type="bibr" target="#b11">[12]</ref>, while mitigating its sensitivity to pre-processing and making use of contextual information in texts. This is achieved by: 1) computing keyword importances from documents with contextual embeddings (similar to KeyBERT <ref type="bibr" target="#b14">[15]</ref>); and 2) decomposing those importances with NMF.</p><p>We release an implementation of KeyNMF as part of the Turftopic Python package.<ref type="foot" target="#foot_0">1</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Model Description</head><p>KeyNMF operationalizes topic extraction as the following steps:</p><p>1. For each document 𝑑: a) Let 𝑥 𝑑 be the document's embedding produced with an encoder model. b) Let 𝑣 𝑤 be the word embedding of a word 𝑤 produced with the same encoder model. c) Let 𝐾 𝑑 be the set of 𝑁 keywords in 𝑑 with the highest cosine similarity to 𝑑:</p><formula xml:id="formula_0">𝐾 𝑑 = arg max 𝐾 * ∑ 𝑤∈𝐾 * sim(𝑥 𝑑 , 𝑣 𝑤 ), where |𝐾 𝑑 | = 𝑁 and 𝑤 ∈ 𝑑</formula><p>2. Arrange the keyword similarities into a non-negative keyword matrix 𝑀. Let 𝑀 𝑑𝑤 be the importance of keyword 𝑤 in document 𝑑:</p><p>𝑀 𝑑𝑤 = { sim(𝑥 𝑑 , 𝑣 𝑤 ), if 𝑤 ∈ 𝐾 𝑑 and sim(𝑥 𝑑 , 𝑣 𝑤 ) &gt; 0; 0, otherwise.</p><p>3. Decompose 𝑀 with non-negative matrix factorization: 𝑀 ≈ 𝑊 𝐻 , where 𝑊 is the document-topic matrix, and 𝐻 is the topic-term matrix. This is achieved with coordinate descent, minimizing the square loss 𝐿(𝑊 , 𝐻 ) = ||𝑀 − 𝑊 𝐻 ||².</p></div>
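Steps 1-3 can be sketched with off-the-shelf tools. The toy example below is our own simplified reconstruction, not the released Turftopic implementation: it scores every vocabulary word against every document (whereas KeyNMF only considers words occurring in the document), and it uses random vectors in place of real encoder embeddings.

```python
import numpy as np
from sklearn.decomposition import NMF

def keyword_matrix(doc_embs, word_embs, n_keywords=15):
    """Steps 1-2: M[d, w] = cos(x_d, v_w) if w is among the N words most
    similar to document d and the similarity is positive, else 0."""
    d_norm = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    w_norm = word_embs / np.linalg.norm(word_embs, axis=1, keepdims=True)
    sim = d_norm @ w_norm.T                      # cosine similarities
    M = np.zeros_like(sim)
    for d in range(sim.shape[0]):
        top = np.argsort(sim[d])[-n_keywords:]   # N best candidate keywords
        keep = top[sim[d, top] > 0]              # keep positive similarities only
        M[d, keep] = sim[d, keep]
    return M

rng = np.random.default_rng(42)
M = keyword_matrix(rng.normal(size=(20, 32)), rng.normal(size=(100, 32)))

# Step 3: decompose M ~ W @ H with non-negative matrix factorization.
model = NMF(n_components=5, init="nndsvda", max_iter=1000, random_state=0)
W = model.fit_transform(M)   # document-topic matrix
H = model.components_        # topic-term matrix
```

scikit-learn's NMF uses a coordinate-descent solver by default, matching the optimization described in step 3.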
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Dynamic KeyNMF</head><p>KeyNMF can be used for modelling topics' evolution in a corpus over time. This is done by first computing a global model over the entire corpus, then calculating time-specific topic-term importances in predefined time slices. Specifically:</p><p>1. Compute the keyword matrix 𝑀 for the whole corpus.</p><p>2. Decompose 𝑀 with non-negative matrix factorization: 𝑀 ≈ 𝑊 𝐻 . 3. For each time slice 𝑡: a) Let 𝑊 𝑡 be a subset of 𝑊 and 𝑀 𝑡 a subset of 𝑀 for the documents in time slice 𝑡. b) Obtain the topic-term-matrix for 𝑡 with NMF while fixing 𝑊 𝑡 :</p><formula xml:id="formula_1">𝐻 𝑡 = arg min 𝐻 * ||𝑀 𝑡 − 𝑊 𝑡 𝐻 * || 2</formula><p>c) The temporal importance of topic 𝑗 is then 𝐼 𝑡𝑗 = ∑ 𝑑∈𝑡 (𝑊 𝑡 ) 𝑑𝑗 , where all 𝑑 are documents in time slice 𝑡. We can obtain pseudo-topic distributions in the time-slices by L1-normalizing the temporal importances:</p><formula xml:id="formula_2">P 𝑡𝑗 = 𝐼 𝑡𝑗 ∑ 𝑖 𝐼 𝑡𝑖 .</formula><p>Since NMF is not a probabilistic model, we use temporal pseudo-probabilities as a proxy for topic distributions.</p></div>
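The fixed-𝑊 solve in step 3b decomposes into one non-negative least squares problem per vocabulary term, since the columns of 𝐻 𝑡 are independent given 𝑊 𝑡 . A minimal sketch using SciPy's NNLS solver follows; the helper names are ours, not from the released implementation.

```python
import numpy as np
from scipy.optimize import nnls

def timeslice_topic_terms(M_t, W_t):
    """Solve H_t = argmin_H ||M_t - W_t H||^2 subject to H >= 0,
    holding the global document-topic weights W_t fixed.
    Each column of H_t is an independent NNLS problem."""
    H_t = np.zeros((W_t.shape[1], M_t.shape[1]))
    for w in range(M_t.shape[1]):
        H_t[:, w], _ = nnls(W_t, M_t[:, w])
    return H_t

def topic_pseudo_distribution(W_t):
    """I_tj = sum of topic weights over the slice's documents,
    L1-normalized into pseudo-probabilities P_tj."""
    importance = W_t.sum(axis=0)
    return importance / importance.sum()

# Toy check: if M_t is exactly W_t @ H_true, NNLS recovers H_true.
rng = np.random.default_rng(0)
W_t = rng.uniform(size=(30, 4))      # 30 documents, 4 topics
H_true = rng.uniform(size=(4, 50))   # 4 topics, 50 terms
H_t = timeslice_topic_terms(W_t @ H_true, W_t)
```

In the toy check the keyword matrix is exactly low-rank, so NNLS recovers the true topic-term matrix; real keyword matrices are only approximately low-rank, and the solve yields the best non-negative fit for each slice.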
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Performance</head><p>To demonstrate KeyNMF's effectiveness as a topic model, we evaluate its performance using the topic-benchmark Python package and the paraphrase-multilingual-MiniLM-L12-v2 embedding model. Fifteen keywords are extracted for each document. Our evaluation procedure is based on that of Kardos, Kostkan, Vermillet, Nielbo, Enevoldsen, and Rocca <ref type="bibr" target="#b15">[16]</ref>, but, since our intended use case is Chinese news data, we ran the benchmark using the same corpora and pipeline as in our investigations (see Sections 4 and 5). Additionally, we utilized paraphrase-multilingual-MiniLM for measuring external word embedding coherence, instead of an English Word2Vec model.</p><p>Based on our evaluations, KeyNMF's performance is comparable with that of state-of-the-art contextual topic models; it performs especially well on external coherence, rivalled on most corpora only by Top2Vec, which explicitly selects words based on their proximity in semantic space (see Table <ref type="table" target="#tab_1">1</ref>). The model represents a drastic improvement over classical topic models, significantly outperforming both NMF and LDA, which indicates that the contextual information infused into the model enhances its performance in a meaningful way.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1.">Sensitivity to Number of Keywords</head><p>We additionally test whether the number of keywords extracted from a text influences the model's performance on different corpora, which allows us to determine KeyNMF's robustness to hyperparameter choices. We used the same news sources, pipeline, and quantitative metrics for evaluating this property of the model as for previous evaluations and analyses. The number of keywords was varied from 5 to 100 with a step size of 5 (see Figure <ref type="figure" target="#fig_0">1</ref>).</p><p>We observed that performance was relatively stable regardless of the number of keywords, and converged rather quickly. Only minimal fluctuations are observable with 𝑁 &gt; 25 on most corpora. However, on Xinouzhou and Yidali Huarenjie, lower values of 𝑁 (5-15) resulted in higher coherence scores. We thus deem 15 keywords a balanced choice of 𝑁 for further investigations. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Data</head><p>Having demonstrated the effectiveness of KeyNMF, we use it as the basis for our study of Chinese diaspora media. Our dataset comprises news articles from five sites aimed at Chinese diaspora populations in the EU: Chinanews, <ref type="foot" target="#foot_3">4</ref> Ihuawen, <ref type="foot" target="#foot_4">5</ref> Yidali Huarenjie, <ref type="foot" target="#foot_5">6</ref> Xinouzhou, <ref type="foot" target="#foot_6">7</ref> and Oushinet. <ref type="foot" target="#foot_7">8</ref> We select these sites because they represent a variety of formats, audiences, and perspectives. Oushinet has the largest target audience, with articles in several languages and local journalists writing specifically for the site. In contrast, Xinouzhou reports mostly on Chinese local news, Yidali Huarenjie and Chinanews are community media platforms based in Italy and Scandinavia respectively, and Ihuawen is a weekly magazine based in the United Kingdom.</p><p>Our data collection focuses on articles linked from each site's front page and a selection of subpages we deem likely to contain information on international relations, particularly with Europe (listed in Appendix A). We hypothesize that articles linked from these main pages will reflect the topics each news site is attempting to highlight, and will thus provide information on the priorities of the forces backing the media landscape. We scrape all articles linked from each front page and subpage every six hours using a custom web scraper. An article is only scraped once per time point, even if it was linked from multiple pages, but can be scraped multiple times if it appears at multiple time points. Data collection from four sites -Chinanews, Ihuawen, Xinouzhou, and Oushinet -began at 18:15 on April 30, 2024 and collection from the fifth site, Yidali Huarenjie, began at 12:15 on May 7, 2024. 
Our dataset includes all articles scraped until 6:15 on June 17, 2024, one week after the EU Parliamentary elections took place. Once scraped, we extract the body of each article from the corresponding HTML file. We attempt to minimize the amount of boilerplate text (e.g. bylines and publication dates) included in the extracted texts; although it is impossible to remove all such text from our dataset, a manual inspection of ten random articles from each news site indicates that the amount of 'junk' text included in the final dataset is minimal.</p><p>The total and unique number of articles collected from each site are reported in Figure <ref type="figure" target="#fig_1">2</ref>. It is clear that different sites follow different publication patterns. To further validate this, we examine the number of 'new' articles at each time point for each source, or the number of articles that were not included in the last scrape (Figure <ref type="figure" target="#fig_2">3</ref>). We see that some sites, like Xinouzhou and Yidali Huarenjie, frequently refresh the articles displayed on their main pages, leading to a larger number of unique articles. In contrast, sites like Ihuawen appear to keep several articles on the main pages for a long time, meaning that they display a very small number of unique articles overall. These differences likely affect the patterns we see in the information systems for each source.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experimental Design</head><p>Extracted article texts are embedded with a multilingual transformer-based model (paraphrase-multilingual-MiniLM-L12-v2) <ref type="bibr" target="#b25">[26]</ref> using the Sentence Transformers library (https://sbert.net). The embedding is done entirely on a 64-core CPU with 384GB RAM. Each document is embedded once for each time it appears in the dataset. In total, embedding all the documents takes ∼2 hours. The maximum sequence length of this embedding model is 128 tokens. Thus, any article longer than 128 tokens is truncated and information from later in the piece is not included in the embedding. Although this is a limitation, we do not consider it prohibitive, as previous research has shown that the bulk of the content in a news article is presented at the very beginning, a widely-practiced professional standard for journalistic writing known as the inverted pyramid <ref type="bibr" target="#b22">[23]</ref>.</p><p>Since our primary interest is understanding the evolution of information dynamics in each news site over time, we use Dynamic KeyNMF to find topic proportions for each timeslice. For keyword extraction, we utilize the jieba tokenizer and remove stop-words present in an authoritative list, <ref type="foot" target="#foot_8">11</ref> with the retained tokens then encoded using the same multilingual model as was used on the documents <ref type="bibr" target="#b25">[26]</ref>. We fit multiple models with 10, 25, and 50 topics respectively in order to investigate topical dynamics at multiple levels of granularity. Separate models are fit for each news site. 
The plotted topics over time, top keywords for each topic at each timeslice, and topic distributions at each timeslice are extracted from each model and saved for further analysis.</p><p>We then use the topic pseudo-distributions to measure the novelty and resonance signals for each news site and, following <ref type="bibr" target="#b19">[20]</ref> and <ref type="bibr" target="#b1">[2]</ref>, use windowed relative entropy with Jensen-Shannon divergence to calculate both metrics. For a window of size 𝑛, the novelty at time point 𝑡 is the mean entropy of the topic pseudo-distribution at 𝑡 ( P 𝑡 ) and the 𝑛 previous pseudo-distributions. The transience at time point 𝑡 is the mean entropy of the topic pseudo-distribution at 𝑡 and the 𝑛 subsequent pseudo-distributions. Then, the resonance of a time point is the novelty at that point minus the transience. We use a window of size 12 when calculating both signals, which is equivalent to three days of data.</p><p>We apply nonlinear adaptive filtering to smooth the extracted novelty and resonance, again following <ref type="bibr" target="#b19">[20]</ref> and <ref type="bibr" target="#b1">[2]</ref>. This removes noise from the signals by calculating the value at a given time point relative to the surrounding time points. We use a span of 56, the same as <ref type="bibr" target="#b1">[2]</ref>, for smoothing. The code we use for calculating novelty and resonance is adapted from that released alongside <ref type="bibr" target="#b1">[2]</ref> and <ref type="bibr" target="#b19">[20]</ref>.</p></div>
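The windowed novelty, transience, and resonance calculations just described can be sketched as follows. This is our own simplified rendering of the procedure, without the nonlinear adaptive filtering step; the released code accompanying the cited works differs in detail.

```python
import numpy as np

def _jsd(p, q):
    """Jensen-Shannon Divergence (base-2) between two probability vectors."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def novelty_transience_resonance(P, window=12):
    """P: (n_timeslices, n_topics) array of topic pseudo-distributions.
    novelty[t]    = mean JSD between P[t] and the `window` preceding slices,
    transience[t] = mean JSD between P[t] and the `window` following slices,
    resonance[t]  = novelty[t] - transience[t].
    Slices without a full window on both sides are left as NaN."""
    n = len(P)
    novelty = np.full(n, np.nan)
    transience = np.full(n, np.nan)
    for t in range(window, n - window):
        novelty[t] = np.mean([_jsd(P[t], P[t - i]) for i in range(1, window + 1)])
        transience[t] = np.mean([_jsd(P[t], P[t + i]) for i in range(1, window + 1)])
    return novelty, transience, novelty - transience

# A static system introduces no new information: zero novelty, zero resonance.
P = np.tile(np.array([0.4, 0.3, 0.2, 0.1]), (40, 1))
nov, tra, res = novelty_transience_resonance(P, window=12)
```

With a window of 12 and six-hourly scrapes, each window spans three days of data, matching the setup in our experiments.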
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Results and Discussion</head><p>We find clear trends in the novelty and resonance signals that correlate with significant events in the EU during the period studied: Xi Jinping's European Tour (May 5-10), Putin's state visit to China (May 16-17), and the EU parliamentary elections (June 6-9). Our analysis focuses on the novelty and resonance trends extracted from the KeyNMF models with ten topics, as these provide the clearest signals. The results for 25 and 50 topics are included in Appendix C.1. We additionally focus our in-depth discussion of the results on the two largest news sources, Xinouzhou and Oushinet, for this preliminary validation of the pipeline.</p><p>We see spikes in novelty of varying strengths for both Xinouzhou and Oushinet during Xi Jinping's European tour (Figure <ref type="figure" target="#fig_3">4</ref>). There are also corresponding dips in resonance before his tour for both sites, followed by increases in resonance during the tour. This indicates that novel information is introduced to the site ecosystems during the tour which replaces previous topics of interest, and which persists in the system for some time.</p><p>One of the most productive aspects of Dynamic KeyNMF is that it allows us to study topic fluctuations over time. Thus, we explore which topical shifts contribute to changes in the novelty and resonance signals. For example, on Oushinet, the time period during Xi Jinping's European tour is associated with high pseudo-probabilities for a topic defined by the keywords Paris, France and state visit and a topic defined by President, China, and Xi Jinping (Appendix C.2, Figure <ref type="figure">9</ref>). Towards the end of the tour, a topic on diplomacy and bilateral relations between China and France also gains prominence. 
For Xinouzhou, this time period contains a peak in the pseudo-probabilities for two topics on Hungary and Chinese relations with Hungary, one of the locations on the tour. Similarly, there is a noticeable spike in the novelty and resonance for Oushinet directly before Putin's state visit to China. This period is marked by relatively high pseudo-probabilities for a topic characterized by the terms China, Beijing, Chinese, and Chinese News Service and a topic with the keywords Russia, Ukraine, Putin, and Moscow (Appendix C.2, Figure <ref type="figure">7</ref>).</p><p>Most significantly for this study, there are fluctuations in novelty and resonance for both sites around the EU parliamentary elections. Specifically, there are peaks in the novelty and resonance signals for Xinouzhou and Oushinet before and after the elections, with troughs throughout much of the election period. We hypothesize that these trends reflect a focus on election-related news which begins in early June and continues through the elections and then an introduction of new topics after their end. Again examining the topic distributions, we see that for Oushinet the period before and during the election is marked by high pseudo-probabilities for two topics directly related to the parliamentary elections, one topic surrounding the Spanish prime minister, and two on Russia and Ukraine and the Israel-Palestine war (Appendix C.2, Figure <ref type="figure">8</ref>). Interestingly, pseudo-probabilities for the topic most directly focused on the elections continued to grow even after the election, suggesting that Oushinet was still discussing the election results during this time. 
Similarly, for Xinouzhou, three topics focused on the UK elections, Europe broadly, and the Spanish prime minister were comparatively prominent towards the end of May and beginning of June.</p><p>Overall, we find that this pipeline allows us to effectively locate changes in news ecosystems, correlate these changes to political and cultural events of interest, and further explore possible reasons for these changes via topic models. It reveals differences in media responses both between events and between sites, while also demonstrating the similarities in sites' news ecosystems, such as the increased discussion of the Spanish prime minister on both Xinouzhou and Oushinet before the EU parliamentary elections. We believe that the combination of the novelty and resonance metrics with the novel KeyNMF topic model will permit further in-depth analysis of these media sites and facilitate research on other Chinese-language domains.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>In this paper, we present a pipeline designed to facilitate research on the underlying information dynamics of Chinese diaspora media published in Europe. This pipeline combines existing information-theoretic methods that model how new information enters and persists in systems with a novel topic model, KeyNMF. KeyNMF overcomes some of the weaknesses of previous traditional and contextual topic models, demonstrating high performance on standard benchmarks. We validate this pipeline through preliminary experimentation on our dataset of Chinese diaspora media, finding that it reveals informational trends that correlate with major, newsworthy events in European politics and allows for further analysis of the topical changes that drive those trends. While further qualitative research is required to fully understand these dynamics, we believe that we have presented a major step forward in context-sensitive and interpretable topic modelling and information dynamics that can generalize to multilingual and data-scarce environments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Additional Experimental Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.1. Novelty and Resonance Ablations</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 8:</head><p>The distributions over time for eight topics with high pseudo-probabilities around the EU parliamentary elections. These topics are generated by the 10-topic KeyNMF models for Oushinet and Xinouzhou. Note that the y-axis scale differs for each subplot.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Sensitivity of KeyNMF to the choice of 𝑁 keywords on multiple metrics and news sources.</figDesc><graphic coords="6,89.28,246.91,416.72,138.91" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: The total and unique number of articles collected for each news site.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3:The number of new articles collected at each time point for each source. An article is 'new' if it did not appear in the collected set of articles from the previous time point.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: The novelty and resonance plots for each news site from KeyNMF with ten topics. The three shaded areas represent Xi Jinping's European tour (May 5-10, 2024), Putin's state visit to China (May 16-17, 2024), and the EU parliamentary elections (June 6-9, 2024). Note that the y-axis ranges differ for each chart.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: The novelty and resonance plots for each news site from KeyNMF with 25 topics. The three shaded areas represent Xi Jinping's European tour (May 5-10, 2024), Putin's state visit to China (May 16-17, 2024), and the EU parliamentary elections (June 6-9, 2024). Note that the y-axis ranges differ for each chart.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6:</head><label>6</label><figDesc>Figure 6: The novelty and resonance plots for each news site from KeyNMF with 50 topics. The three shaded areas represent Xi Jinping's European tour (May 5-10, 2024), Putin's state visit to China (May 16-17, 2024), and the EU parliamentary elections (June 6-9, 2024). Note that the y-axis ranges differ for each chart.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 7:</head><label>7</label><figDesc>Figure 7: The distributions over time for five topics with high pseudo-probabilities during Xi Jinping's European tour. These topics are generated by the 10-topic KeyNMF models for Oushinet and Xinouzhou. Note that the y-axis scale differs for each subplot.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>KeyNMF's performance on Chinese news data against a number of baselines. Topic descriptions were evaluated on diversity (𝑑), internal (𝐶 𝑖𝑛 ) and external (𝐶 𝑒𝑥 ) word embedding coherence.</figDesc><table><row><cell>Model</cell><cell cols="3">chinanews</cell><cell cols="3">ihuawen</cell><cell cols="3">oushinet</cell><cell cols="3">xinouzhou</cell><cell cols="3">yidali-huarenjie</cell></row><row><cell></cell><cell>𝑑</cell><cell>𝐶 in</cell><cell>𝐶 ex</cell><cell>𝑑</cell><cell>𝐶 in</cell><cell>𝐶 ex</cell><cell>𝑑</cell><cell>𝐶 in</cell><cell>𝐶 ex</cell><cell>𝑑</cell><cell>𝐶 in</cell><cell>𝐶 ex</cell><cell>𝑑</cell><cell>𝐶 in</cell><cell>𝐶 ex</cell></row><row><cell>KeyNMF</cell><cell>0.93</cell><cell>0.29</cell><cell>0.63</cell><cell>0.91</cell><cell>0.17</cell><cell>0.64</cell><cell>0.84</cell><cell>0.23</cell><cell>0.58</cell><cell>0.85</cell><cell>0.26</cell><cell>0.55</cell><cell>0.88</cell><cell>0.52</cell><cell>0.57</cell></row><row><cell>S³</cell><cell>0.91</cell><cell>0.16</cell><cell>0.47</cell><cell>0.91</cell><cell>0.11</cell><cell>0.47</cell><cell>0.83</cell><cell>0.12</cell><cell>0.54</cell><cell>0.96</cell><cell>0.17</cell><cell>0.55</cell><cell>0.93</cell><cell>0.46</cell><cell>0.52</cell></row><row><cell>Top2Vec</cell><cell>0.78</cell><cell>0.14</cell><cell>0.71</cell><cell>0.83</cell><cell>0.10</cell><cell>0.70</cell><cell>0.87</cell><cell>0.12</cell><cell>0.73</cell><cell>0.86</cell><cell>0.14</cell><cell>0.71</cell><cell>0.75</cell><cell>0.46</cell><cell>0.69</cell></row><row><cell>BERTopic</cell><cell>0.89</cell><cell>0.31</cell><cell>0.52</cell><cell>0.89</cell><cell>0.26</cell><cell>0.50</cell><cell>0.84</cell><cell>0.23</cell><cell>0.50</cell><cell>0.84</cell><cell>0.26</cell><cell>0.52</cell><cell>0.91</cell><cell>0.57</cell><cell>0.51</cell></row><row><cell>CTM combined</cell><cell>0.99</cell><cell>0.27</cell><cell>0.52</cell><cell>0.99</cell><cell>0.23</cell><cell>0.51</cell><cell>0.99</cell><cell>0.21</cell><cell>0.51</cell><cell>0.98</cell><cell>0.25</cell><cell>0.51</cell><cell>0.97</cell><cell>0.54</cell><cell>0.49</cell></row><row><cell>CTM zeroshot</cell><cell>0.99</cell><cell>0.28</cell><cell>0.53</cell><cell>0.99</cell><cell>0.23</cell><cell>0.50</cell><cell>0.99</cell><cell>0.22</cell><cell>0.50</cell><cell>1.00</cell><cell>0.26</cell><cell>0.51</cell><cell>0.97</cell><cell>0.54</cell><cell>0.51</cell></row><row><cell>NMF</cell><cell>0.74</cell><cell>0.27</cell><cell>0.57</cell><cell>0.60</cell><cell>0.18</cell><cell>0.53</cell><cell>0.64</cell><cell>0.18</cell><cell>0.54</cell><cell>0.66</cell><cell>0.18</cell><cell>0.56</cell><cell>0.71</cell><cell>0.49</cell><cell>0.54</cell></row><row><cell>LDA</cell><cell>0.61</cell><cell>0.19</cell><cell>0.57</cell><cell>0.53</cell><cell>0.16</cell><cell>0.54</cell><cell>0.41</cell><cell>0.13</cell><cell>0.54</cell><cell>0.48</cell><cell>0.14</cell><cell>0.58</cell><cell>0.57</cell><cell>0.34</cell><cell>0.54</cell></row></table></figure>
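The evaluation metrics in Table 1 can be illustrated concretely: diversity is commonly computed as the fraction of unique terms across all topics' top-word lists, and word-embedding coherence as the mean pairwise cosine similarity of each topic's top words under some word-vector model (corpus-trained vectors giving an internal score, pretrained vectors an external one). A hedged sketch of these common formulations (the paper's exact definitions may differ):

```python
import itertools
import numpy as np

def topic_diversity(topic_words):
    """d: fraction of unique terms across all topics' top-word lists
    (1.0 means no two topic descriptions share a word)."""
    all_words = [w for topic in topic_words for w in topic]
    return len(set(all_words)) / len(all_words)

def embedding_coherence(topic_words, word_vectors):
    """Mean pairwise cosine similarity of each topic's top words, averaged
    over topics; word_vectors maps word -> vector. Words missing from the
    vector model are skipped."""
    def cos(u, v):
        u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    per_topic = []
    for topic in topic_words:
        sims = [cos(word_vectors[a], word_vectors[b])
                for a, b in itertools.combinations(topic, 2)
                if a in word_vectors and b in word_vectors]
        if sims:
            per_topic.append(np.mean(sims))
    return float(np.mean(per_topic))
```

High diversity with low coherence suggests scattered, unrelated topic words; the combination of both metrics is what Table 1 uses to compare models.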
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://x-tabdeveloping.github.io/turftopic/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">This gives Top2Vec an unfair advantage on this metric as it selects descriptive words based on the same criteria as the metric. 𝐶 𝑒𝑥 scores on Top2Vec should thus be interpreted with caution.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://www.chinanews.se</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://ihuawen.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://yidali.huarenjie.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://www.xinouzhou.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">http://www.oushinet.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_8">https://github.com/stopwords-iso/stopwords-zh/blob/master/stopwords-zh.txt</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>Part of the computation done for this project was performed on the UCloud interactive HPC system, which is managed by the eScience Center at the University of Southern Denmark.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. News Site Subpages</head><p>The subpages scraped for each news site are listed below:</p><p>• Xinouzhou: France, Italy, Spain, UK, Germany, Hungary, International </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. NPMI Coherence</head><p>Since NPMI Coherence has historical significance in the topic modelling literature, we also evaluated topic descriptions with this metric. Due to theoretical and practical limitations <ref type="bibr" target="#b15">[16]</ref>, however, we do not consider NPMI Coherence a good metric for evaluating topic models. For the sake of completeness, we report 𝐶 NPMI scores in Table <ref type="table">2</ref>.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Top2Vec: Distributed Representations of Topics</title>
		<author>
			<persName><forename type="first">D</forename><surname>Angelov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2008.09470</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>cs.CL</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Emodynamics: Detecting and Characterizing Pandemic Sentiment Change Points on Danish Twitter</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Baglini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Østergaard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">N</forename><surname>Larsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourth Conference on Computational Humanities Research, CHR 2022</title>
				<meeting>the Fourth Conference on Computational Humanities Research, CHR 2022<address><addrLine>Antwerp, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="162" to="176" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Individuals, Institutions, and Innovation in the Debates of the French Revolution</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">T J</forename><surname>Barron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Spang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dedeo</surname></persName>
		</author>
		<idno type="DOI">10.1073/pnas.1717729115</idno>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the National Academy of Sciences</title>
		<imprint>
			<biblScope unit="volume">115</biblScope>
			<biblScope unit="issue">18</biblScope>
			<biblScope unit="page" from="4607" to="4612" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Stability of Topic Modeling via Matrix Factorization</title>
		<author>
			<persName><forename type="first">M</forename><surname>Belford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mac Namee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Greene</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.eswa.2017.08.047</idno>
	</analytic>
	<monogr>
		<title level="j">Expert Systems With Applications</title>
		<imprint>
			<biblScope unit="volume">91</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="159" to="169" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Pre-Training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence</title>
		<author>
			<persName><forename type="first">F</forename><surname>Bianchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Terragni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hovy</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.acl-short.96</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</title>
		<title level="s">Short Papers</title>
		<meeting>the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="759" to="766" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Cross-lingual Contextualized Topic Models with Zero-shot Learning</title>
		<author>
			<persName><forename type="first">F</forename><surname>Bianchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Terragni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hovy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nozza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.eacl-main.143</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume</title>
				<meeting>the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1676" to="1683" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Probabilistic Topic Models</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<idno type="DOI">10.1145/2133806.2133826</idno>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="77" to="84" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Dynamic Topic Models</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Lafferty</surname></persName>
		</author>
		<idno type="DOI">10.1145/1143844.1143859</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd International Conference on Machine Learning</title>
				<meeting>the 23rd International Conference on Machine Learning<address><addrLine>Pittsburgh, Pennsylvania, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="113" to="120" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Latent Dirichlet Allocation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
		<idno type="DOI">10.5555/944919.944937</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="993" to="1022" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Authoritarian Design: How the Digital Architecture on China&apos;s Sina Weibo Facilitate Information Control</title>
		<author>
			<persName><forename type="first">V</forename><surname>Brussee</surname></persName>
		</author>
		<idno type="DOI">10.1163/22142312-bja10033</idno>
	</analytic>
	<monogr>
		<title level="j">Asiascape: Digital Asia</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="207" to="241" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">&lt;Redirecting&gt; the Diaspora: China&apos;s United Front Work and the Hyperlink Networks of Diasporic Chinese Websites in Cyberspace</title>
		<author>
			<persName><forename type="first">K</forename><surname>Chan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Alden</surname></persName>
		</author>
		<idno type="DOI">10.1080/2474736x.2023.2179409</idno>
	</analytic>
	<monogr>
		<title level="j">Political Research Exchange</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="21" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Fast Local Algorithms for Large Scale Nonnegative Matrix and Tensor Factorizations</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cichocki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-H</forename><surname>Phan</surname></persName>
		</author>
		<idno type="DOI">10.1587/transfun.E92.A.708</idno>
	</analytic>
	<monogr>
		<title level="j">IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences</title>
		<imprint>
			<biblScope unit="volume">E92.A</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="708" to="721" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Who Won the Election? Explaining News Coverage of Election Results in Multi-Party Systems</title>
		<author>
			<persName><forename type="first">K</forename><surname>Gatterman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Meyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wurzer</surname></persName>
		</author>
		<idno type="DOI">10.1111/1475-6765.12498</idno>
	</analytic>
	<monogr>
		<title level="j">European Journal of Political Research</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="857" to="877" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure</title>
		<author>
			<persName><forename type="first">M</forename><surname>Grootendorst</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2203.05794</idno>
		<idno type="arXiv">arXiv:2203.05794</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>cs.CL</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">KeyBERT: Minimal Keyword Extraction with BERT</title>
		<author>
			<persName><forename type="first">M</forename><surname>Grootendorst</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.4461265</idno>
	</analytic>
	<monogr>
		<title level="j">Version</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">0</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">𝑆³ - Semantic Signal Separation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kardos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kostkan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-Q</forename><surname>Vermillet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nielbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Enevoldsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rocca</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2406.09556</idno>
		<idno type="arXiv">arXiv:2406.09556 [cs.LG]</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Online Algorithms for Nonnegative Matrix Factorization with the Itakura-Saito Divergence</title>
		<author>
			<persName><forename type="first">A</forename><surname>Lefèvre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Févotte</surname></persName>
		</author>
		<idno type="DOI">10.1109/aspaa.2011.6082314</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)</title>
				<meeting><address><addrLine>New Paltz, NY, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="313" to="316" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">News Information Decoupling: An Information Signature of Catastrophes in Legacy News Media</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Baglini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">B</forename><surname>Vahlstrup</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Enevoldsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bechmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roepstorff</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2101.02956</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 European Association for Digital Humanities Conference</title>
				<meeting>the 2020 European Association for Digital Humanities Conference<address><addrLine>Krasnoyarsk, Russia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Pandemic News Information Uncertainty -News Dynamics Mirror Differential Response Strategies to COVID-19</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Enevoldsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Baglini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roepstorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0278098</idno>
	</analytic>
	<monogr>
		<title level="j">Plos One</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">e0278098</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Haestrup</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Enevoldsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">B</forename><surname>Vahlstrup</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Baglini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roepstorff</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2102.06505</idno>
		<idno type="arXiv">arXiv:2102.06505</idno>
	</analytic>
	<monogr>
		<title level="m">When No News is Bad News -Detection of Negative Events from News Media Content</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note>cs.CY</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Trend Reservoir Detection: Minimal Persistence and Resonant Behavior of Trends in Social Media</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">B</forename><surname>Vahlstrup</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bechmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2109.08589</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Computational Humanities Research (CHR 2020)</title>
				<meeting>the Workshop on Computational Humanities Research (CHR 2020)<address><addrLine>Amsterdam, the Netherlands</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="290" to="297" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Scikit-Learn: Machine Learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">News and Its Communicative Quality: the Inverted Pyramid -When and Why Did It Appear?</title>
		<author>
			<persName><forename type="first">H</forename><surname>Pöttker</surname></persName>
		</author>
		<idno type="DOI">10.1080/1461670032000136596</idno>
	</analytic>
	<monogr>
		<title level="j">Journalism Studies</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="501" to="511" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Topic modeling of Chinese language beyond a bag-of-words</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wan</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.csl.2016.03.004</idno>
		<ptr target="https://doi.org/10.1016/j.csl.2016.03.004" />
	</analytic>
	<monogr>
		<title level="j">Computer Speech &amp; Language</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="page" from="60" to="78" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Software Framework for Topic Modelling with Large Corpora</title>
		<author>
			<persName><forename type="first">R</forename><surname>Řehůřek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sojka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</title>
				<meeting>the LREC 2010 Workshop on New Challenges for NLP Frameworks<address><addrLine>Malta</addrLine></address></meeting>
		<imprint>
			<publisher>Valletta</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="45" to="50" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation</title>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2004.09813</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="4512" to="4525" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1410</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3982" to="3992" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Schliebs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Bailey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Howard</surname></persName>
		</author>
		<title level="m">China&apos;s Public Diplomacy Operations: Understanding Engagement and Inauthentic Amplification of PRC Diplomats on Facebook and Twitter</title>
				<meeting><address><addrLine>Oxford, UK</addrLine></address></meeting>
		<imprint>
			<publisher>Programme on Democracy &amp; Technology</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">Tech. rep</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">The Initial Digitalization of Chinese Diplomacy (2019-2021): Establishing Global Communication Networks on Twitter</title>
		<author>
			<persName><forename type="first">M</forename><surname>Thunø</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
		<idno type="DOI">10.1080/10670564.2023.2195811</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Contemporary China</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="244" to="266" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Rethinking LDA: Why Priors Matter</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Wallach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mimno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>McCallum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<meeting><address><addrLine>Vancouver, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Event Flow - How Events Shaped the Flow of the News, 1950-1995</title>
		<author>
			<persName><forename type="first">M</forename><surname>Wevers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kostkan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Nielbo</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2109.08589</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third Conference on Computational Humanities Research, CHR 2021</title>
				<meeting>the Third Conference on Computational Humanities Research, CHR 2021<address><addrLine>Amsterdam, the Netherlands</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="62" to="76" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Topic Modeling of Chinese Language Using Character-Word Relations</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Neural Information Processing</title>
				<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="139" to="147" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
