<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Quantitative Analysis of Propagandistic Narratives in Large Text Corpora Using Machine Learning Methods</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Illia</forename><surname>Dagil</surname></persName>
							<email>illia.i.dagil@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Taras Shevchenko National University of Kyiv</orgName>
								<address>
									<addrLine>Akademika Hlushkova Av. 4d</addrLine>
									<postCode>03680</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Iryna</forename><surname>Vergunova</surname></persName>
							<email>vergunova@hotmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Taras Shevchenko National University of Kyiv</orgName>
								<address>
									<addrLine>Akademika Hlushkova Av. 4d</addrLine>
									<postCode>03680</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yaroslav</forename><surname>Tereshchenko</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Taras Shevchenko National University of Kyiv</orgName>
								<address>
									<addrLine>Akademika Hlushkova Av. 4d</addrLine>
									<postCode>03680</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Quantitative Analysis of Propagandistic Narratives in Large Text Corpora Using Machine Learning Methods</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">8D32553C998A22D1E5DC215339708512</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Topic Modeling</term>
					<term>News Analysis</term>
					<term>Natural Language Processing</term>
					<term>Embedding Model</term>
					<term>Large Language Model</term>
					<term>Clustering</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents a novel algorithm for topic modeling, specifically designed to identify and analyze propaganda narratives in large-scale news corpora. The algorithm combines advanced natural language processing techniques, embedding models, and clustering algorithms to assist analysts, communication experts, and government agencies in efficiently processing and identifying propaganda content. In a series of experiments, five different embedding models were compared along with four clustering algorithms, each tested with various hyperparameters. A significant challenge addressed was determining the appropriate granularity of clusters, balancing between detailed analysis and broader trends. Additionally, narrative extraction was investigated in depth using large language models (LLMs), providing accurate and structured identification of complex narratives. This approach allows not only the identification of propaganda but also the development of counter-narratives, with the potential to be adapted for broader applications such as communication network analysis.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Propaganda and disinformation are among the most significant challenges facing the modern information environment. In a time when people have access to an overwhelming amount of information, the manipulation of facts and the spread of false narratives can have far-reaching effects. These include shaping public opinion, influencing election outcomes, impacting international relations, and even justifying conflicts.</p><p>Disinformation is often used as a geopolitical tool, turning the media into a battleground. Although propaganda is not a new phenomenon, modern technologies and social media platforms have enabled it to spread faster and more widely than ever before <ref type="bibr" target="#b0">[1]</ref><ref type="bibr" target="#b1">[2]</ref><ref type="bibr" target="#b2">[3]</ref><ref type="bibr" target="#b3">[4]</ref>.</p><p>Analyzing propaganda narratives and disinformation campaigns is essential to defending democratic societies and ensuring information security. Upholding objectivity, information reliability, and source transparency is not only an academic endeavor but also a matter of national security <ref type="bibr" target="#b4">[5]</ref><ref type="bibr" target="#b5">[6]</ref><ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref><ref type="bibr" target="#b9">[10]</ref>.</p><p>In the current era of information warfare, effectively combating propaganda and disinformation is critical. To achieve this, a comprehensive analysis is needed. One of the key elements of this analysis is identifying propaganda narratives and assessing their prevalence. 
Natural language processing (NLP) and traditional machine learning techniques <ref type="bibr" target="#b9">[10]</ref><ref type="bibr" target="#b10">[11]</ref><ref type="bibr" target="#b11">[12]</ref><ref type="bibr">[13]</ref> can be applied to handle large volumes of text efficiently. The study of propaganda has also become a highly relevant and timely topic within the academic community, drawing significant attention due to its critical implications. This approach not only saves time and resources but also provides a more objective and unbiased analysis compared to manual review.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Problem Statement</head><p>The primary goal of this research is to develop an efficient method for identifying and measuring the prevalence of propaganda narratives within large news corpora. The objective is to accurately detect and quantify narratives, presenting the results as a ranked list based on the frequency of occurrence within the dataset. Furthermore, the potential audience reach for each narrative must be estimated to assess the broader impact of these narratives. To enhance the understanding of how these narratives evolve and spread, an infographic will be created to visually represent their dissemination patterns over time. This visualization will help highlight the influence of key narratives across different regions, channels, and time frames, offering insights into their propagation and reception. The ultimate aim of the research is to provide a tool that can support more informed decision-making by analysts, policymakers, and communication experts, enabling them to counteract disinformation and propaganda more effectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Algorithm</head><p>In this section, we introduce a comprehensive algorithm designed for the analysis and identification of propaganda narratives within large corpora of news texts. The algorithm leverages natural language processing and machine learning techniques to automate the detection of narratives and evaluate their prevalence across different datasets. By combining large language models, embedding models, and clustering algorithms, the method provides a systematic approach to dissecting complex narratives, offering insights into how propaganda themes evolve and spread. The following steps outline the key stages of the algorithm and the models used to achieve this.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data Collection</head><p>Assume we have access to a corpus of all existing news texts, and the messages we need to analyze are a subset of this corpus. Selecting the appropriate subset is a crucial step in the algorithm. This selection can be made based on various criteria or a combination of them, such as:</p><p>• The publication time frame of the news, • The source of the news (specific social networks, resources, channels, etc.),</p><p>• The presence of certain keywords.</p></div>
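The subset-selection step above can be sketched as a simple filter over corpus items. The field names (`published`, `source`, `text`) and the sample records are illustrative, not the paper's actual schema:

```python
from datetime import date

def select_subset(corpus, start, end, sources=None, keywords=None):
    """Return news items matching all given criteria (time frame, source, keywords)."""
    subset = []
    for item in corpus:
        if not (start <= item["published"] <= end):
            continue  # outside the publication time frame
        if sources is not None and item["source"] not in sources:
            continue  # not from a selected channel/resource
        if keywords is not None and not any(
            kw.lower() in item["text"].lower() for kw in keywords
        ):
            continue  # none of the keywords present
        subset.append(item)
    return subset

# Toy corpus standing in for the full news collection.
corpus = [
    {"published": date(2022, 3, 1), "source": "channel_a",
     "text": "US Navy exercises near Taiwan"},
    {"published": date(2021, 1, 5), "source": "channel_b",
     "text": "Meteorologists predict rain in region N"},
]
subset = select_subset(corpus, date(2022, 2, 1), date(2024, 4, 30),
                       keywords=["taiwan"])
```

Criteria can be combined freely; passing `None` for a criterion disables it.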
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Identifying Propaganda Narratives</head><p>Each news item may either contain no propaganda narratives (e.g., "Meteorologists predict rain and strong winds in region N") or include multiple narratives (e.g., "US Navy exercises near Taiwan infringe on the territorial integrity of the PRC"). The extraction of these narratives is done using a large language model (LLM). This choice is based on several objective reasons: the most advanced LLMs are capable of following instructions, reformulating, and translating texts into English while maintaining the original meaning. Additionally, LLMs have larger context windows, allowing them to process longer texts more effectively than other neural networks, including transformer-based architectures. They can also provide structured responses (e.g., JSON format), which allows easy parsing. To mitigate the risk of hallucinations, techniques such as prompt engineering and evidence-based model outputs can be applied. The example above was generated using a GPT model (see Figure 1). The narratives field in the response is always a list, allowing for the extraction of any number of narratives (from zero to multiple). A separate field provides the exact quote from which the narrative was derived. If this matches the original news text, the analysis can be considered validated to some extent. This structured format also helps to minimize hallucinations.</p></div>
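Handling such a structured LLM response can be sketched as follows. The field names (`narratives`, `quote`) are illustrative stand-ins for the paper's actual JSON schema, and the LLM call itself is mocked by a hard-coded response:

```python
import json

# Mocked structured LLM output; a real pipeline would receive this from the model.
raw_response = json.dumps({
    "narratives": [
        {"narrative": "US actions threaten the territorial integrity of the PRC",
         "quote": "US Navy exercises near Taiwan"}
    ]
})

news_text = "US Navy exercises near Taiwan alarmed officials in Beijing."
parsed = json.loads(raw_response)

# The narratives field is always a list: zero items means no propaganda found.
# Keeping only narratives whose supporting quote occurs verbatim in the source
# text gives the evidence-based validation described above.
validated = [
    n for n in parsed["narratives"]
    if n["quote"] in news_text
]
```

A quote that does not appear in the original text would be discarded as a likely hallucination.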
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Creating Vector Representations of Narrative Analysis</head><p>Suppose we have two narrative analyses. To compare their similarity, we need to define a similarity metric. While we could directly compare the words used in the analyses, this approach may reduce the quality of comparison because it would ignore the sequence and context of the words. Two contrasting examples illustrate the limitations of this approach:</p><p>• A set of identical narratives expressed with different wording and phrasing.</p><p>• A set of narratives that use the same words but describe opposing viewpoints (e.g., "Russia is conducting terrorist acts in Ukraine" vs. "Ukraine is conducting terrorist acts in Russia"). Modern embedding models can solve these issues by representing texts as vectors in latent space, preserving the semantic meaning of the text. As a result, we can create vector representations for each narrative analysis while maintaining a link to the original news item. These embeddings can then be compared using popular distance metrics.</p></div>
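The comparison of narrative embeddings can be illustrated with cosine similarity. The three-dimensional vectors below are toy stand-ins for real embedding-model output; real embeddings have hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors: v1 and v2 stand for the same narrative phrased differently;
# v3 stands for an opposing viewpoint using similar words.
v1 = np.array([0.9, 0.1, 0.0])
v2 = np.array([0.85, 0.15, 0.05])
v3 = np.array([-0.8, 0.2, 0.1])

sim_same = cosine_similarity(v1, v2)      # high: same semantic meaning
sim_opposed = cosine_similarity(v1, v3)   # low/negative: opposing meaning
```

A good embedding model places reworded versions of one narrative close together while separating opposing claims that share vocabulary.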
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Clustering the Vector Representations</head><p>Our goal is to identify the most popular groups of narratives. Since we now have a measure of distance between objects and an understanding of the data structure, we can apply clustering methods to group similar narratives. Larger clusters will represent more popular narratives.</p></div>
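A minimal clustering sketch on synthetic vectors, showing how larger clusters correspond to more popular narrative groups (K-Means is one of the algorithms compared in Section 4.2; the data here is synthetic, not real embeddings):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two dense groups stand in for popular narratives, one small group for a
# fringe narrative. Real input would be high-dimensional embedding vectors.
popular_a = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2))
popular_b = rng.normal(loc=[-5.0, 5.0], scale=0.3, size=(80, 2))
fringe = rng.normal(loc=[0.0, -5.0], scale=0.3, size=(10, 2))
X = np.vstack([popular_a, popular_b, fringe])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Cluster sizes sorted descending: the head of this list identifies the
# most popular narrative groups.
sizes = sorted(np.bincount(labels), reverse=True)
```

Density-based algorithms such as DBSCAN and HDBSCAN additionally label sparse points as noise instead of forcing them into a cluster.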
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Summarization</head><p>In the previous step, we obtained clusters, which may contain thousands of news items. Presenting results in this form would not be practical, so we need to identify the overarching narrative within each cluster. One way to do this is by randomly selecting N news items (where N is much smaller than the cluster size) and summarizing them using an LLM. This result can be considered the "headline" for the cluster. The headline can then be used in further results presentation and the next step, which is cluster validation.</p></div>
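The sampling-and-summarization step can be sketched as below. The LLM call itself is omitted; the function only builds the summarization prompt from a random sample, and the prompt wording is illustrative:

```python
import random

def cluster_headline_prompt(cluster_texts, n=5, seed=42):
    """Sample N items (N much smaller than the cluster size) and build the
    prompt an LLM would summarize into a cluster 'headline'."""
    sample = random.Random(seed).sample(cluster_texts, min(n, len(cluster_texts)))
    return ("Summarize the single common narrative of these news items:\n"
            + "\n".join(f"- {t}" for t in sample))

# A cluster may contain thousands of items; we only ever send N of them.
cluster = [f"news item {i} about sanctions" for i in range(1000)]
prompt = cluster_headline_prompt(cluster, n=5)
```

Fixing the random seed makes the headline reproducible for a given cluster.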
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6.">Validation of Results</head><p>We begin the validation with the largest cluster. Using the headline, we can re-annotate the cluster to assess how well each narrative aligns with the main idea identified through the summarization of the randomly selected narratives. The LLM's response at this stage will classify each narrative into one of three alignment categories.</p><p>Based on these classifications, we can assess the quality of the clustering. We then set a threshold for the acceptable proportion of narratives that do not align with the cluster. If this proportion is low, these narratives can be marked as noise. If the proportion is too high, we must return to step 4 and rerun the clustering with different input hyperparameters or even a different algorithm.</p></div>
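The threshold decision above reduces to a small function. The label names (`"aligned"`, `"not_aligned"`) and the 10% threshold are illustrative assumptions, not the paper's exact values:

```python
def validate_cluster(alignment_labels, threshold=0.1):
    """Decide from per-narrative LLM alignment labels whether a cluster is
    acceptable. Returns 'accept' (misaligned items become noise) or
    'recluster' (return to the clustering step with new hyperparameters)."""
    misaligned = sum(1 for label in alignment_labels if label == "not_aligned")
    proportion = misaligned / len(alignment_labels)
    return "accept" if proportion <= threshold else "recluster"

# 5% of narratives disagree with the headline: within tolerance.
decision = validate_cluster(["aligned"] * 95 + ["not_aligned"] * 5)
```

Starting validation from the largest cluster ensures the most impactful narrative groups are verified first.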
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.7.">Presentation of Results</head><p>The ultimate goal of this algorithm is to generate an analytical report that provides insights into the popularity of different narrative groups. Assuming we have access to all necessary metadata (publication dates, source names, language, audience size, etc.), we can use data visualization techniques to explore statistical indicators of narrative popularity, identify periods of narrative spikes, and generate word clouds. This gives the reader a deeper understanding of the information campaign and offers insights for further research into the causal links between the publication of the news sample and the overall propaganda narrative.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Research results</head><p>In this section, we will discuss the research results, covering everything from the data and model descriptions to the experimental outcomes, evaluation metrics, and identified challenges.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Dataset description</head><p>The proposed algorithm has been developed, tested, and refined using three different datasets collected for research purposes:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>• A dataset covering the period from February 2022 (2,500 analyzed news articles, from 02.2022 to 08.2023), • a second dataset (analyzed news articles from 02.2022 to 12.2023), • the information space of the Baltic states during Russia's full-scale invasion of Ukraine (354,700 analyzed news articles from 152 channels, from 02.2022 to 04.2024).</p><p>For this research project, a subset of the dataset from the analysis of the Baltic information space during Russia's full-scale invasion of Ukraine was chosen as the demonstration dataset. This subset specifically focuses on Russian-language propaganda channels targeting Lithuania, Latvia, and Estonia. It contains 29,322 news articles published by 14 selected Telegram channels during the period from 02.2022 to 04.2024.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Machine Learning models</head><p>The proposed algorithm employs three types of machine learning models: a large language model, an embedding model, and a clustering algorithm. GPT-4-Turbo was chosen for news analysis, and GPT-3.5-Turbo was used for result validation. Several models were compared for the text vectorization task, including:</p><p>• OpenAI text-embedding-3-small,</p><p>• OpenAI text-embedding-ada-002,</p><p>• HuggingFace Alibaba-NLP/gte-Qwen1.5-7B-instruct,</p><p>• HuggingFace WhereIsAI/UAE-Large-V1,</p><p>• HuggingFace intfloat/multilingual-e5-base.</p><p>For clustering the embedding vectors, the following algorithms were applied:</p><p>• K-Means++ with the elbow method to determine K,</p><p>• Hierarchical Clustering,</p><p>• DBSCAN,</p><p>• HDBSCAN.</p></div>
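The elbow method mentioned for K-Means can be sketched as follows: fit K-Means for a range of candidate K values and look for the K beyond which inertia (within-cluster sum of squares) stops dropping sharply. The data below is synthetic with three planted groups:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Three well-separated synthetic blobs standing in for narrative embeddings.
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(60, 2))
               for c in ([0.0, 0.0], [6.0, 0.0], [0.0, 6.0])])

# Inertia for each candidate number of clusters K.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 7)}

# The "elbow" is where the marginal drop in inertia flattens out; with three
# planted groups, the drop from K=2 to K=3 is large and later drops are tiny.
drops = {k: inertias[k - 1] - inertias[k] for k in range(2, 7)}
```

In practice the elbow is picked by inspecting a plot of `inertias`; here the flattening after K=3 is visible directly in `drops`.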
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">The problem of determining cluster granularity</head><p>The issue of cluster granularity arises from the need to balance between the number of clusters and the level of detail they represent. On one hand, creating a large number of small clusters can capture the unique features of individual texts. On the other hand, merging texts into larger clusters based on common characteristics risks losing important details. In the context of analyzing propaganda narratives, this dilemma becomes especially significant. Too much detail can obscure the broader picture, as propaganda narratives often have complex structures and employ a variety of tactics to achieve their goals. At the same time, excessive generalization may overlook subtle but crucial differences between narratives, which can be critical for understanding the mechanisms of propaganda.</p><p>Addressing the problem of cluster granularity requires expert intervention. An expert with deep knowledge of the subject matter can identify which text characteristics are essential for clustering and which can be disregarded. This expertise allows for the creation of clusters that best align with the research objectives.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Implementation results</head><p>Implementation results are presented in Tables 1-5 and in Figures <ref type="figure" target="#fig_2">2-3</ref>.</p></div>
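The clustering quality metrics reported in the tables (Silhouette Coefficient, Davies-Bouldin Index, Calinski-Harabasz Index) can all be computed with scikit-learn. A sketch on synthetic data, not the paper's embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

rng = np.random.default_rng(2)
# Two well-separated synthetic groups in 4-D, standing in for embeddings.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 4))
               for c in (np.zeros(4), np.full(4, 4.0))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

metrics = {
    "silhouette": silhouette_score(X, labels),          # higher is better, in [-1, 1]
    "davies_bouldin": davies_bouldin_score(X, labels),  # lower is better
    "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
}
```

All three are internal metrics: they score a clustering from the data alone, with no ground-truth labels, which is why they are usable on unannotated news corpora.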
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>This research presents a new method for quantitatively assessing the popularity of propaganda narratives, which enables the systematic and automated analysis of the information space. The applied natural language processing (NLP) and machine learning techniques significantly enhance the efficiency of analyzing large volumes of text data. Furthermore, the objectification of the analysis process is critically important. Human involvement can introduce subjective interpretations, bias, and errors. An algorithmic approach ensures consistency and reproducibility of results, which is essential for any analytical work. Despite its considerable potential, using AI and machine learning methods for propaganda analysis comes with challenges. First, the availability and quality of data are crucial for the effectiveness of machine learning models. Incomplete or biased data can significantly affect the accuracy of the analysis. Another challenge is that algorithms may struggle to interpret irony, sarcasm, and cultural references, which are often used in propaganda texts. However, with the advancement of modern models, this issue is becoming less of a concern.</p><p>During the experiments, a dataset of news articles was collected and annotated, and several hypotheses and models were tested to determine the best approach for analysis. 
The results of the study include:</p><p>• A list of identified narratives from the dataset along with metrics of their popularity, • Comparative tables of clustering metrics for the results of embedding models, • Infographics illustrating the relationship between the annotated categories and the semantics of news within clusters, • An example of an infographic for narrative representation.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Example narrative analysis in JSON format.</figDesc><graphic coords="3,156.75,62.35,287.14,213.25" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Extracted narratives distribution.</figDesc><graphic coords="7,76.58,62.35,447.48,329.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Visual representation of embeddings compressed to 3D space using t-SNE method.</figDesc><graphic coords="7,76.58,428.40,447.48,277.40" type="bitmap" /></figure>
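A dimensionality reduction like the one behind Figure 3 can be sketched with scikit-learn's t-SNE. The 768-dimensional random input here is a stand-in for real narrative embeddings:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(3)
# Stand-in for narrative embeddings (e.g., 768-dimensional model output).
X = rng.normal(size=(200, 768))

# Compress to 3-D for visual inspection of cluster structure.
X_3d = TSNE(n_components=3, perplexity=30, init="random",
            random_state=0).fit_transform(X)
```

The 3-D points can then be rendered with any plotting library; t-SNE preserves local neighborhoods, so narratives from one cluster appear as a compact blob.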
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 2 Clustering metrics and results for OpenAI text-embedding-ada-002 model</head><label>2</label><figDesc></figDesc><table><row><cell>Metric / Algorithm</cell><cell>Number of clusters</cell><cell>Cluster distribution histogram</cell><cell>Silhouette Coefficient</cell><cell>Davies-Bouldin Index</cell><cell>Calinski-Harabasz Index</cell></row><row><cell>Hierarchical clustering</cell><cell>15</cell><cell></cell><cell>0.010</cell><cell>5.515</cell><cell>180.944</cell></row><row><cell>K-Means</cell><cell>15</cell><cell></cell><cell>0.036</cell><cell>3.870</cell><cell>490.498</cell></row><row><cell>DBSCAN</cell><cell>3</cell><cell></cell><cell>-0.087</cell><cell>3.612</cell><cell>215.127</cell></row><row><cell>HDBSCAN</cell><cell>3</cell><cell></cell><cell>-0.093</cell><cell>3.486</cell><cell>178.963</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3 Clustering metrics and results for HuggingFace Alibaba-NLP/gte-large-en-v1.5 model</head><label>3</label><figDesc></figDesc><table><row><cell>Metric / Algorithm</cell><cell>Number of clusters</cell><cell>Cluster distribution histogram</cell><cell>Silhouette Coefficient</cell><cell>Davies-Bouldin Index</cell><cell>Calinski-Harabasz Index</cell></row><row><cell>Hierarchical clustering</cell><cell>25</cell><cell></cell><cell>0.014</cell><cell>4.258</cell><cell>192.817</cell></row><row><cell>K-Means</cell><cell>10</cell><cell></cell><cell>0.040</cell><cell>3.328</cell><cell>723.084</cell></row><row><cell>DBSCAN</cell><cell>4</cell><cell></cell><cell>-0.011</cell><cell>4.493</cell><cell>342.468</cell></row><row><cell>HDBSCAN</cell><cell>3</cell><cell></cell><cell>-0.022</cell><cell>5.965</cell><cell>320.535</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4 Clustering metrics and results for HuggingFace Alibaba-NLP/gte-large-en-v1.5 model</head><label>4</label><figDesc></figDesc><table><row><cell>Metric / Algorithm</cell><cell>Number of clusters</cell><cell>Cluster distribution histogram</cell><cell>Silhouette Coefficient</cell><cell>Davies-Bouldin Index</cell><cell>Calinski-Harabasz Index</cell></row><row><cell>Hierarchical clustering</cell><cell>10</cell><cell></cell><cell>0.037</cell><cell>4.515</cell><cell>380.300</cell></row><row><cell>K-Means</cell><cell>10</cell><cell></cell><cell>0.049</cell><cell>3.340</cell><cell>788.349</cell></row><row><cell>DBSCAN</cell><cell>2</cell><cell></cell><cell>0.125</cell><cell>4.654</cell><cell>399.316</cell></row><row><cell>HDBSCAN</cell><cell>3</cell><cell></cell><cell>-0.015</cell><cell>5.992</cell><cell>327.911</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5 Clustering metrics and results for HuggingFace intfloat/multilingual-e5-base model</head><label>5</label><figDesc></figDesc><table><row><cell>Metric / Algorithm</cell><cell>Number of clusters</cell><cell>Cluster distribution histogram</cell><cell>Silhouette Coefficient</cell><cell>Davies-Bouldin Index</cell><cell>Calinski-Harabasz Index</cell></row><row><cell>Hierarchical clustering</cell><cell>10</cell><cell></cell><cell>0.010</cell><cell>5.797</cell><cell>225.086</cell></row><row><cell>K-Means</cell><cell>10</cell><cell></cell><cell>0.026</cell><cell>4.114</cell><cell>512.323</cell></row><row><cell>DBSCAN</cell><cell>2</cell><cell></cell><cell>0.117</cell><cell>5.662</cell><cell>161.783</cell></row><row><cell>HDBSCAN</cell><cell>3</cell><cell></cell><cell>-0.037</cell><cell>6.462</cell><cell>249.931</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>We would like to extend our sincere gratitude to Mantis Analytics for providing the valuable data and sharing their expertise in propaganda analysis, which were instrumental in the success of this research.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Declaration on Generative AI</head><p>The authors have not employed any Generative AI tools.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Assessing the Popularity of Propagandist Narratives Using AI Methods</title>
		<author>
			<persName><forename type="first">I</forename><surname>Dagil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tereshchenko</surname></persName>
		</author>
		<ptr target="https://probability.knu.ua/shv2024/ShV_2024.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the XXII International scientific and practical conference &quot;Shevchenko Spring - 2024&quot;</title>
				<meeting>Proceedings of the XXII International scientific and practical conference &quot;Shevchenko Spring - 2024&quot;<address><addrLine>Kyiv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024-04-11">April 11, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Narrative Theory for Computational Narrative Understanding</title>
		<author>
			<persName><forename type="first">A</forename><surname>Piper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>So</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bamman</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.emnlp-main.26</idno>
		<ptr target="https://doi.org/10.18653/v1/2021.emnlp-main.26" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Conference on Empirical Methods in Natural Language Processing</title>
				<editor>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Moens</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><forename type="middle">W.-T</forename><surname>Yih</surname></persName>
		</editor>
		<meeting>the Conference on Empirical Methods in Natural Language Processing<address><addrLine>Punta Cana, Dominican Republic</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021. 2021</date>
			<biblScope unit="page" from="298" to="311" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics. Online and</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A survey on narrative extraction from textual data</title>
		<author>
			<persName><forename type="first">Brenda</forename><surname>Santana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ricardo</forename><surname>Campos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Evelin</forename><surname>Amorim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alípio</forename><surname>Jorge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Purificação</forename><surname>Silvano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sérgio</forename><surname>Nunes</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10462-022-10338-7</idno>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence Review</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="page">8435</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Computational Analysis of Dehumanization of Ukrainians on Russian Social Media. Proceedings of LaTeCH-CLfL 2024</title>
		<author>
			<persName><forename type="first">Kateryna</forename><surname>Burovova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mariana</forename><surname>Romanyshyn</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2024.latechclfl-1.4.pdf" />
		<imprint>
			<date type="published" when="2024-03-22">March 22, 2024</date>
			<biblScope unit="page" from="28" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">MTEB: Massive Text Embedding Benchmark</title>
		<author>
			<persName><forename type="first">Niklas</forename><surname>Muennighoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nouamane</forename><surname>Tazi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Loïc</forename><surname>Magne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nils</forename><surname>Reimers</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.eacl-main.148</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Improving Text Embeddings with Large Language Models</title>
		<author>
			<persName><forename type="first">Liang</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nan</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaolong</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Linjun</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rangan</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Furu</forename><surname>Wei</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2024.acl-long.642</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<meeting>the 62nd Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">Jacob</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ming-Wei</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenton</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristina</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks</title>
		<author>
			<persName><forename type="first">Nils</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Iryna</forename><surname>Gurevych</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/D19-1410.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-11">November 3–7, 2019</date>
			<biblScope unit="page" from="3982" to="3992" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Density-based clustering based on hierarchical density estimates</title>
		<author>
			<persName><forename type="first">Ricardo</forename><forename type="middle">J G B</forename><surname>Campello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Davoud</forename><surname>Moulavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joerg</forename><surname>Sander</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-37456-2_14</idno>
	</analytic>
	<monogr>
		<title level="m">Advances in Knowledge Discovery and Data Mining (PAKDD 2013)</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><forename type="middle">S</forename><surname>Tseng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Cao</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Motoda</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Xu</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Introduction to HPC with MPI for data science</title>
		<author>
			<persName><forename type="first">Frank</forename><surname>Nielsen</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-21903-5</idno>
		<ptr target="https://doi.org/10.1007/978-3-319-21903-5" />
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>Springer International Publishing</publisher>
			<pubPlace>Switzerland</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A density-based algorithm for discovering clusters in large spatial databases with noise</title>
		<author>
			<persName><forename type="first">Martin</forename><surname>Ester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hans-Peter</forename><surname>Kriegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jörg</forename><surname>Sander</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaowei</forename><surname>Xu</surname></persName>
		</author>
		<ptr target="https://file.biolab.si/papers/1996-DBSCAN-KDD.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96)</title>
				<meeting>the Second International Conference on Knowledge Discovery and Data Mining (KDD-96)</meeting>
		<imprint>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="226" to="231" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Hierarchical density estimates for data clustering, visualization, and outlier detection</title>
		<author>
			<persName><forename type="first">Ricardo</forename><forename type="middle">J G B</forename><surname>Campello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Davoud</forename><surname>Moulavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arthur</forename><surname>Zimek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jörg</forename><surname>Sander</surname></persName>
		</author>
		<idno type="DOI">10.1145/2733381</idno>
		<ptr target="https://doi.org/10.1145/2733381" />
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Knowledge Discovery from Data</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="51" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
