<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Unsupervised approach for misinformation detection in Russia-Ukraine war news</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nina Khairova</string-name>
          <email>khairova.nina@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Galassi</string-name>
          <email>a.galassi@unibo.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Lo Scudo</string-name>
          <email>fabrizio.loscudo@unical.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bogdan Ivasiuk</string-name>
          <email>bogdan.ivasiuk@studio.unibo.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University “Kharkiv Polytechnic Institute”</institution>
          ,
          <addr-line>2, Kyrpychova str., 61002, Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Umeå University</institution>
          ,
          <addr-line>90187</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Bologna</institution>
          ,
          <addr-line>Viale Risorgimento 2, Bologna, 40136</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Calabria</institution>
          ,
          <addr-line>via Bucci, Rende, 87036</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Russian-Ukrainian war has attracted considerable global attention; however, fake news often obstructs the formation of public opinion and disseminates false information. To address this issue, we have curated the RUWA dataset, comprising over 16,500 news articles covering the pivotal events of the Russian invasion of Ukraine. These articles were sourced from established outlets in the USA, EU, Asia, Ukraine, and Russia, spanning the period from February to September 2022. The paper explores the use of semantic similarity to compare different aspects of articles from various web sources that cover the same events of the war. This unsupervised machine learning approach becomes crucial when obtaining annotated datasets is practically impossible due to the lack of reliable fact-checking during the ongoing war. The research goal is to uncover the potential of employing semantic similarity measures as a viable approach for detecting misinformation in news articles.</p>
      </abstract>
      <kwd-group>
        <kwd>Misinformation issues</kwd>
        <kwd>fake news detection</kwd>
        <kwd>Russian-Ukraine war</kwd>
        <kwd>dataset</kwd>
        <kwd>semantic similarity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Misinformation, fake news, and disinformation have been present throughout human history.
However, it was only after intense discussions around fake news during events such as the 2016
U.S. presidential election, the U.K. Brexit referendum, and the global spread of the coronavirus
that the issue gained heightened attention. One indirect piece of evidence is the increasing
number of scientific articles addressing this problem, especially in the last three years.
Currently, according to the Scopus database, more than 22,500 scientific papers are related to
the concept of 'misinformation'.</p>
      <p>There are obvious technical and cognitive foundations that can explain why misinformation
has become a significant issue in contemporary digital society. The development of information
technology has increased the reliance of many individuals on online sources for their
information and news. A recent Eurostat study revealed that 72% of internet users in the
European Union now obtain their news online. Similar trends can be observed among adult
Internet consumers in the USA. The trend of widespread and rapid dissemination of
misinformation (fake news) leads to the influence of information or misinformation campaigns
on large groups of people simultaneously.</p>
      <p>
        The psychological and cognitive foundations influencing society's susceptibility to
misinformation stem from the natural difficulties humans face in distinguishing between real
and fake news. Two major factors make people naturally vulnerable to fake news. The first,
naive realism [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] suggests that people tend to believe that their perception of
reality is the only true one, while others who disagree with this are considered ignorant,
irrational, or biased. Furthermore, according to the theory known as Confirmation bias [2], it is
challenging to correct a misperception once it has formed. Psychological studies indicate that
attempting to correct false information, such as fake news, by presenting true, factual
information is not only unhelpful in reducing misperceptions but can sometimes even
exacerbate them, particularly among ideological groups [3].
      </p>
      <p>All these technical and cognitive factors contribute to the potential for large-scale
misinformation campaigns conducted by large corporations or even by certain governments to
influence public opinion. The most sensitive areas affected by these campaigns often include
social division, public health, and economic impact [4].</p>
      <p>However, the most significant threats may arise from the political consequences of
misinformation campaigns. These campaigns can aim not only to alter election outcomes but
also to influence the course of wars. One notable and early example of the political
consequences of misinformation was highlighted by a spokesman for the German government
in January 2017. He stated that they confronted a wide array of Russian propaganda tools used
to conduct disinformation campaigns aimed at destabilizing the German government. He
remarked, 'We are dealing with a phenomenon of a dimension that has never been seen before'.</p>
      <p>Furthermore, an indirect consequence of misinformation, in general, is that it is possible to
disrupt the authenticity balance within the news ecosystem, thereby altering people's
perceptions and responses to real news. This fosters doubt and confusion, making it more
challenging for individuals to differentiate between truth and falsehood. This year, the European
Council has recognized misinformation, especially that carried out by Russia, as “a
long-term challenge for European democracies and societies”.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>Numerous definitions of misinformation exist; however, the crux of the matter can be succinctly
encapsulated as follows: Misinformation is intentionally and verifiably false information
published or posted to mislead readers [4, 5]. This definition comprises three pivotal concepts.
Authenticity revolves around the verification of information as either real or false. An
illustrative instance of misinformation may manifest in the form of unfounded claims or rumors
disseminated through social media platforms regarding medicines or health remedies for
treating or preventing the coronavirus. However, the veracity of such information can be
substantiated or debunked through reliance on credible and reputable sources, such as the
World Health Organization (WHO).</p>
      <p>The second indicator of misinformation involves intentionality, signifying the deliberate
dissemination of inaccurate information to achieve specific goals. The final critical aspect
incorporated into the definition of misinformation pertains to the dissemination—spreading
false or misleading information through various mediums, such as articles, social media posts,
websites, or any platform where information is publicly shared, is a mandatory requirement for
misinformation.</p>
      <sec id="sec-2-1">
        <title>2.1. Approaches for Misinformation Detection</title>
        <p>Currently, the majority of studies focused on detecting misinformation utilize Machine
Learning or Deep Learning approaches [6]. These studies typically involve four main steps:
data source selection, data collection, data cleaning, and the application of classification or
clustering techniques. In the case of Machine Learning approaches, an additional step for feature
extraction is included. Figure 1 illustrates the general schema for misinformation detection
approaches.</p>
        <p>Most research focuses on specific types of data sources, often concentrating either on
misinformation detection in social media posts [7] or on fake news in articles on news websites
[8]. Additionally, a group of studies considers utilizing machine learning approaches for
misinformation detection by processing existing datasets of fake news.</p>
        <sec id="sec-2-1-1">
          <title>Data</title>
          <p>Source:
• Social media
• News articles
• Fact checking
websites
• Existing
datasets</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Data</title>
        </sec>
        <sec id="sec-2-1-3">
          <title>Collection:</title>
          <p>• Extracting
data (text,
images or video)
using API/Tools</p>
        </sec>
        <sec id="sec-2-1-4">
          <title>Data</title>
        </sec>
        <sec id="sec-2-1-5">
          <title>Preprocessing:</title>
          <p>• Cleaning</p>
        </sec>
        <sec id="sec-2-1-6">
          <title>Machine</title>
        </sec>
        <sec id="sec-2-1-7">
          <title>Learning</title>
        </sec>
        <sec id="sec-2-1-8">
          <title>Supervised</title>
          <p>• Classification</p>
        </sec>
        <sec id="sec-2-1-9">
          <title>Deep</title>
        </sec>
        <sec id="sec-2-1-10">
          <title>Learning:</title>
          <p>• Classification</p>
        </sec>
        <sec id="sec-2-1-11">
          <title>Features</title>
        </sec>
        <sec id="sec-2-1-12">
          <title>Extraction</title>
        </sec>
        <sec id="sec-2-1-13">
          <title>Unsupervised</title>
          <p>• Clustering</p>
          <p>The selection of a specific data source type has an impact on the features that can be utilized
by machine learning models. For instance, features relevant to the propagation properties of
information can be extracted specifically from the social media context. This feature group
includes users' profiles and various aspects of user demographics, such as registration age,
number of followers, number of tweets that the user has authored, and the average number of
followers, etc. [9]. Network-based features, which are extracted to represent relationships
among relevant users and posts, also pertain to the group of propagation properties.</p>
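          <p>As a purely illustrative sketch (not part of the cited studies), the snippet below shows how such user-profile features could be assembled from a hypothetical record returned by a social media API; the field names are assumptions.</p>
          <preformat>
# Illustrative sketch: assembling propagation-related features from a
# hypothetical user record (field names are assumptions, not a real API).
from datetime import datetime, timezone

def user_profile_features(user: dict) -> dict:
    """Turn a raw user record into numeric features for a classifier."""
    registered = datetime.fromisoformat(user["registered_at"])
    age_days = (datetime.now(timezone.utc) - registered).days
    followers = user.get("followers_count", 0)
    following = user.get("following_count", 0)
    tweets = user.get("tweets_count", 0)
    return {
        "account_age_days": age_days,
        "followers_count": followers,
        "tweets_per_day": tweets / max(age_days, 1),
        "follower_following_ratio": followers / max(following, 1),
    }

example = {
    "registered_at": "2020-05-01T00:00:00+00:00",
    "followers_count": 120,
    "following_count": 300,
    "tweets_count": 4500,
}
print(user_profile_features(example))
          </preformat>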
          <p>
            Meanwhile, obtaining the propagation feature type from the news articles on the website is
nearly impossible. For misinformation detection in these data sources, style-based or
knowledge-based features are typically extracted [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. Style-based methods aim to identify fake news by
analyzing the manipulative elements present in the writing style of news content. The
extraction of style-based features relies on the assumption that information created to
intentionally deceive the public must sound 'more persuasive' compared to text without such
intentions. These specific characteristics serve to reinforce deceptive statements or claims in
news content, encompassing both psychological and linguistic aspects. Generally, for
misinformation detection, ML models apply the same linguistic-based features as for general
NLP tasks, such as text classification and clustering, or, for example, specific applications for
author identification [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. For instance, such features include average per-sentence characteristics, subjective
verbs (e.g. "feel", "believe"), report verbs (e.g. "announce"), positive/negative words,
anxiety/angry/sadness words, and so on [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]. Some rhetorical techniques, such as repetitions,
appeal to authority, exaggeration, or minimization, can be considered as psychological features
[
            <xref ref-type="bibr" rid="ref13">13</xref>
            ].
          </p>
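          <p>The following minimal sketch illustrates the kind of style-based feature extraction described above; the subjective and report verb lists are short illustrative placeholders rather than the lexicons used in [12].</p>
          <preformat>
# Minimal sketch of style-based feature extraction; the word lists are
# illustrative placeholders, not the lexicons used in the cited studies.
import re

SUBJECTIVE_VERBS = {"feel", "believe", "think", "suspect"}
REPORT_VERBS = {"announce", "claim", "report", "state"}

def style_features(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return {
        "avg_tokens_per_sentence": len(tokens) / max(len(sentences), 1),
        "subjective_verb_ratio": sum(t in SUBJECTIVE_VERBS for t in tokens) / max(len(tokens), 1),
        "report_verb_ratio": sum(t in REPORT_VERBS for t in tokens) / max(len(tokens), 1),
        "exclamation_count": text.count("!"),
    }

print(style_features("Officials announce a ceasefire. I believe it will not hold!"))
          </preformat>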
          <p>
            The knowledge-based features for misinformation detection rely on factual knowledge.
Traditionally, for automated knowledge extraction, knowledge is defined as a set of (Subject,
Predicate, Object) triples extracted from the given information, which adequately represent
that information [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. In the context of
knowledge-based fake news detection, a commonly employed approach is fact-checking.
          </p>
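          <p>A simplified sketch of extracting (Subject, Predicate, Object) triples with spaCy's dependency parse is shown below; it assumes the small English model is installed and is far less elaborate than the knowledge extraction described in [14].</p>
          <preformat>
# Simplified sketch of (Subject, Predicate, Object) triple extraction with
# spaCy's dependency parse; real knowledge extraction pipelines are richer.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def extract_triples(text: str):
    triples = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in token.children if c.dep_ in ("dobj", "attr", "pobj")]
                for s in subjects:
                    for o in objects:
                        triples.append((s.text, token.lemma_, o.text))
    return triples

print(extract_triples("Russia shelled the station. Officials confirmed the casualties."))
          </preformat>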
          <p>
            Fact-checking involves evaluating the authenticity of news by comparing statements
extracted from the content to be verified with established factual knowledge. Utilizing
expert-based manual fact-checking, often employed in the creation of fake news datasets, yields highly
accurate results. However, this approach is expensive and becomes less efficient as the volume
of news content to be checked increases. It is relatively uncommon to utilize a crowd-sourced
approach for manual fact-checking. Crowd-sourced fact-checking involves a large number of
ordinary individuals who contribute as fact-checkers. For instance, the publicly available
large-scale fake news dataset, CREDBANK, was created in this way [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. However, currently, this
approach, similar to automatic fact-checking, faces challenges such as redundancy, invalidity,
conflicts, and incompleteness, leading to relatively low accuracy and credibility [
            <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
            ].
          </p>
          <p>
            Many contemporary strategies for detecting misinformation center around extracting the
mentioned features and integrating them into supervised classification models. These models
are often based on Naïve Bayes, decision trees, logistic regression, k-nearest neighbors (KNN),
and support vector machines (SVM). The final selection of the classifier is typically based on
comparing the performance of all utilized models [
            <xref ref-type="bibr" rid="ref16 ref17 ref18">16, 17, 18</xref>
            ].
          </p>
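          <p>A minimal scikit-learn sketch of this comparison workflow is given below; the texts and labels are synthetic placeholders, whereas the cited studies use curated ground-truth datasets.</p>
          <preformat>
# Sketch of comparing the classifiers mentioned above on a toy, synthetic
# labelled corpus; a real study would use a curated ground-truth dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

texts = ["shocking secret cure", "official report released", "you will not believe this",
         "government publishes statistics", "miracle remedy exposed", "agency confirms data"] * 10
labels = [1, 0, 1, 0, 1, 0] * 10  # 1 = fake, 0 = real (synthetic toy labels)

X = TfidfVectorizer().fit_transform(texts)
models = {
    "NaiveBayes": MultinomialNB(),
    "DecisionTree": DecisionTreeClassifier(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": LinearSVC(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, labels, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
          </preformat>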
          <p>All these supervised approaches require a pre-annotated fake news ground truth dataset or
truth/false-annotated dataset to train a model. However, obtaining a reliable fake news dataset
is a very time-consuming process, as it often requires skilled expert annotators to conduct a
meticulous examination of claims, along with assessing additional evidence, and reports from
authoritative sources. This highlights the primary reason why, despite supervised classification
methods potentially yielding more accurate models with a well-curated ground truth dataset
for training, unsupervised models can be more practical due to the ease of obtaining unlabeled
datasets.</p>
          <p>
            Moreover, the exploration of embedding techniques, such as word embedding and deep
neural networks, has attracted considerable attention in the extraction of textual features,
showing potential for yielding positive outcomes in misinformation detection. Within the realm
of Deep Learning (DL) models, news content is frequently subjected to word-level embedding
as an initial step [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]. Subsequently, a proficiently trained neural network processes this
embedding [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ].
          </p>
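          <p>The embed-then-classify pattern can be sketched in a few lines of PyTorch, as below; this toy model (word-level embeddings averaged and passed to a linear classifier) only illustrates the general idea, not the architectures used in [19, 20].</p>
          <preformat>
# Minimal PyTorch sketch of the embed-then-classify pattern described above:
# word-level embeddings are averaged and fed to a linear classifier.
import torch
import torch.nn as nn

class AverageEmbeddingClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        vectors = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
        document_vector = vectors.mean(dim=1)      # (batch, embed_dim)
        return self.classifier(document_vector)    # (batch, num_classes)

model = AverageEmbeddingClassifier()
fake_batch = torch.randint(0, 10000, (4, 50))      # 4 documents, 50 tokens each
print(model(fake_batch).shape)                     # torch.Size([4, 2])
          </preformat>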
          <p>While DL models offer various advantages, they, similar to supervised Machine Learning
(ML) models, often perform better with large labeled datasets for training. However, acquiring and
reliably annotating such datasets can be challenging and is not always feasible in
misinformation detection. Moreover, DL models, especially when trained on limited or biased
datasets, may be prone to overfitting. This means that a model may perform well on the
training data but struggle to generalize to new data on slightly different topics, which is a
considerable issue for fake news and misinformation covering broad themes [7].</p>
          <p>
            In recent years, the proliferation of visual content has become a significant tool for
propagating fake news. Visual features extracted from images are crucial indicators in
discerning fake news. At the same time, the rise of images and videos generated by neural
networks, commonly known as 'deep fake videos', adds a new layer of complexity.
Distinguishing between real and fake visual content becomes increasingly challenging. For
these studies, new techniques for following the trace of revision and generation in a video are
required [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ].
          </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Existing dataset</title>
        <p>As mentioned in Section 2.1, statistical approaches to misinformation detection are generally
constrained by the significant limitation of lacking labeled benchmark datasets.</p>
        <p>
          Existing labeled datasets primarily focus on political news and are annotated through
manual efforts [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] or by leveraging fact-checking websites like PolitiFact or GossipCop [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
For instance, the Buzzfeed dataset [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] consists of 1,627 articles verified through manual
fact-checking by professional journalists at BuzzFeed. These articles were sourced from nine
prominent political publishers, three each from the mainstream, hyperpartisan left-wing, and
hyperpartisan right-wing categories. In total, the corpus includes 299 instances of fake news,
with 97% originating from hyperpartisan publishers.
        </p>
        <p>In certain instances, authentic news sources were selected from a designated group of
reliable outlets, whereas fake news sources were drawn from known fake news lists, such as
"Business Insider’s Zimdars Fake News list" [25]. Another annotation approach for the fake
news dataset involved the AMT dataset [26], which comprises 480 articles annotated as either
fake or true. In this dataset, fake news articles were intentionally crafted by journalists, while
genuine news pieces were sourced from various domains.</p>
        <p>
          The datasets focusing on fake news related to conflicts or wars exhibit a distinct nature.
Take, for instance, the FA-KES dataset [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], which encompasses 804 news articles related to the
Syrian war gathered from sources like Reuters, Etilaf, and others. To determine the veracity of
the information, the creators employed a crowdsourcing platform, soliciting individuals for
details on the number of casualties, and when, and where the events occurred. The obtained
data was then compared with information from the Syrian Violation Documentation Center
(VDC), which meticulously records all deaths during specific events.
        </p>
        <p>
          Additionally, The CheckThat! initiative [5] [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], over its six iterations, has produced several
datasets to address specific subtasks of the fact-checking problem, such as recognizing if a
sentence should be checked [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] or if it contains a subjective perspective [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ].
        </p>
        <p>A summary of the most renowned annotated fake news datasets and their annotation
methodologies is provided in Table 1.</p>
        <p>Table 1: The summarized annotated fake news datasets include Buzzfeed, LIAR, FA-KES, AMT, Kaggle, FakeNewsNet, and FakeCovid.</p>
        <p>
          Understandably, we were not able to find false/true-annotated datasets related to the
Russian-Ukraine war, especially during the ongoing conflict. When considering the broader
issue of fact-checking, additional challenges arise, including biases in sources and the 'fog of
war' effect [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. However, several researchers tackled the issue of dataset collection from social
networks, mostly from Twitter, in the specific context of propaganda and fake news detection
related to the Russian invasion of Ukraine.
        </p>
        <p>
          For instance, the authors of [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] created a dataset containing 349,455 messages from Twitter
with pro-Russian hashtags and a pro-Russian stance. These messages were posted by 132,131
different users, of which 250,853 messages (71.78%) were retweets. Additionally, the dataset
includes 9,818,566 messages posted by 2,079,198 users, categorized as pro-Ukraine. The majority
of these messages (80.93%) were written in English and posted in the period between February
2022 and July 2022. The creation of this dataset enabled the authors to develop an approach for
detecting bots on Twitter and suggested the presence of a large-scale Russian propaganda
campaign on social media, especially at the beginning of the Russian invasion of Ukraine.
        </p>
        <p>
          The paper [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] provides a Twitter dataset of the 2022 Russo-Ukrainian war. The dataset
contains over 1.6 million tweets shared during the first week of the crisis. Over the past year, a
few studies with similar approaches and findings have been published.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In our study, we directed our attention to the scrutiny of disinformation campaigns related to
the ongoing Russia-Ukraine war. The articles were disseminated through established media
outlets as integral components of information warfare and propaganda endeavors.</p>
      <sec id="sec-3-1">
        <title>3.1. Data Collection</title>
        <p>We curated the RUWA (Russian-Ukraine WAr) dataset [31], which compiles news articles
covering key events related to the Russian-Ukraine war.</p>
        <p>To ensure a balanced representation of public opinion and journalistic perspectives, we
sourced texts from reputable global outlets spanning various regions. These include BBC,
Euronews, and The Guardian (European region); NBC News, CNN, and Bloomberg (USA region);
Ukrinform and Censor.net (Ukraine); and Russia Today, News-front.info (Russia), as well as
Al Jazeera and Reuters.</p>
        <p>To mitigate the risk of generating a topic detection model instead of a misinformation
detection model, we proactively identified nine widely acknowledged events in the global press,
such as 'The Beginning of the War', 'Bucha Massacre', and so on. Subsequently, articles about
these specific events were obtained from the sites above.</p>
        <p>The selection of articles for each event adhered to predefined criteria, including the
publication time interval and keyword lists. The time interval typically spanned from the date
of the specific event and extended three to four weeks thereafter. This approach aligns with the
common pattern in media, where dedicated coverage of a particular event tends to last no more
than two to three weeks.</p>
        <p>The keyword list comprised approximately 100 keywords. Due to distinct narratives, terms,
and concepts used in Ukraine, Russia, and the Western press to describe the same war events,
we identified keywords for each news website separately. Subsequently, we aggregated all these
keywords to effectively highlight articles across the sites. Primarily, keywords encompassed
geographical names (e.g., “Bucha” or “Olenivka”), specific buildings (e.g., “Kramatorsk train
station” or “Mariupol theatre”), organizational entities (e.g., “Red Cross”, “Nuclear Power Plant”),
personal names of politicians (e.g., “Zelenskyi”), and analogous proper nouns and phrases.
Certain keywords possessed the capacity to unequivocally identify specific events; for example,
the presence of the keyword "Moscow ship" within a defined temporal window accurately
attributed an article to the event "Sinking of the Moskva."</p>
        <p>However, a substantial subset of keywords pertained to general themes associated with the
Russian-Ukraine war (e.g., “war crimes”, “evacuation”, “a special operation”). These terms did
not serve as selective criteria for categorizing articles into distinct topics or events. In cases
where these general keywords coexisted with specific event-related keywords within an article,
we classified the article under the corresponding event. Conversely, articles lacking the
conjunction of specific event-related keywords, despite being published during the stipulated
timeframe and containing general keywords, could not be classified under our predefined
topics or events. Thus, a considerable number of articles, exceeding 14,000, belonged to the
overarching theme of the "Russian-Ukraine war." However, these were systematically excluded
from further consideration. Table 2 displays the distribution of articles from various websites
in the RUWA dataset based on events and their descriptions, which correspond to the particular
headlines of leading news agencies related to each event.</p>
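        <p>The assignment rule described above can be sketched as follows; the event names, dates, and keyword lists are illustrative placeholders, not the full RUWA configuration.</p>
        <preformat>
# Illustrative sketch of the event-assignment rule described above; event names,
# dates, and keyword lists are placeholders, not the full RUWA configuration.
from datetime import date

EVENTS = {
    "Sinking of the Moskva": {
        "start": date(2022, 4, 14),
        "window_weeks": 4,
        "keywords": {"moskva", "moscow ship", "black sea flagship"},
    },
    "Kramatorsk railway station": {
        "start": date(2022, 4, 8),
        "window_weeks": 4,
        "keywords": {"kramatorsk train station", "kramatorsk railway"},
    },
}

def assign_event(text: str, published: date):
    text = text.lower()
    for name, cfg in EVENTS.items():
        days_after = (published - cfg["start"]).days
        in_window = days_after in range(0, cfg["window_weeks"] * 7 + 1)
        if in_window and any(kw in text for kw in cfg["keywords"]):
            return name
    return None  # only general war-related keywords: left unclassified

print(assign_event("The Moskva sank in the Black Sea.", date(2022, 4, 15)))
        </preformat>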
        <p>Table 2: For each event, the dataset records a defining headline from a leading news agency
(“Russia says Azovstal siege is over, in full control of Mariupol”; “NATO officials say Russian attack
of Ukraine has begun”; “Killing of civilians in Bucha and Kyiv condemned as ‘terrible war crime’”;
“Evacuations from Zaporizhzhia renew concerns for nuclear power plant safety”; “‘Absolute evil’:
inside the Russian prison camp where dozens of Ukrainians burned to death”; “Ukraine missile attack:
Dozens killed at Kramatorsk railway station”; “Russia is losing the battle for the Black Sea”;
“Russian missile strike kills 16 in shopping mall, Ukraine says”; “Russia bombs theater where
hundreds sought shelter and ‘children’ was written on grounds”), the source of the event definition
(Al Jazeera, CBS News, The Guardian, CNN, The Guardian, Al Jazeera, Economist, Reuters, CNN),
and the number of articles (1,816; 6,490; 1,429; 3,373; 578; 1,466; 175; 436; 761; 16,526 in total).</p>
        <p>At present, the RUWA dataset comprises over 16,500 news articles documenting events of the
Russian-Ukraine war from February 2022 to September 2022. Figure 2 illustrates the percentage
distribution of articles across selected news websites.</p>
        <p>Clearly, Ukrainian websites published the highest number of articles related to the war
events that we distinguished. The Ukrainian website Ukrinform produced 6,750 articles, while
Censor.net produced approximately 5,100 articles. In contrast, the Russian websites, Russia
Today and News-front.info, together produced about 1,100 articles.</p>
        <p>Notably, the Reuters agency devoted more attention to the events of the war than any other
EU or USA news website, publishing around 2,000 articles covering the nine well-known events
of the Russian invasion of Ukraine.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Analysis</title>
        <p>As previously mentioned, acquiring information with one hundred percent certainty about
events during an ongoing war is practically unattainable. Any narrative or description of an
event inherently carries potential bias and reflects the author's perspective. Consequently, the
creation of a true/false annotated dataset covering the Russia-Ukraine war poses significant
challenges due to the inherent subjectivity and variability in how events are reported and
interpreted.</p>
        <p>Our approach involved constructing the events-aligned RUWA dataset, followed by the
application of unsupervised machine learning methods to address semantic similarity tasks.
Fig.3 provides an overview of the architecture of our approach.</p>
        <p>Figure 3: Architecture of the approach. News sources (Al Jazeera, BBC, Censor.Net, News-Front, NBC News, Reuters, Russia Today, Ukrinform) and events (Azovstal, Beginning, Bucha, Nuclear, Prisoners, Railway, Supermarket, Theatre, Sinking of the Moskva) populate the RUWA dataset media repository; the analysis then relies on pre-trained vectors for full texts, article headings comparison, and the utilization of extra knowledge, drawing on linguistic, semantic, and knowledge-based features.</p>
        <p>In our study, where we focused on addressing misinformation detection through semantic
similarity, we based our approach on several hypotheses. Primarily, we propose that news
shared by media outlets in the two nations actively involved in the conflict is likely to display
considerable differences. Information variations may be significant, even leading to conflicting
accounts of events, such as the acknowledgment or denial of incidents like residential area
bombings or civilian casualties. Consequently, we can expect that the semantic similarity
coefficient between texts from Russian and Ukrainian outlets should be minimal.</p>
        <p>Furthermore, we hypothesize that the semantic similarity coefficients among articles
covering a specific event from various outlets, excluding one or two websites, are generally
high. However, when comparing the semantic similarity of these one or two specific websites
with all others, we can observe a significant divergence. This discrepancy suggests that these
specific websites are likely to be untrustworthy.</p>
        <p>Our approach involves three types of experiments for detecting semantic similarity: (1) comparing
the full texts of the articles, (2) analyzing article headings, and (3) comparing semantically
meaningful sentences within the articles. For the first experiment, we aggregated all articles
from the same source that covered one event as a single textual document and then pairwise
evaluated the semantic similarity of all outlets’ articles. In the second experiment, we calculated
the similarity between two sources by analyzing sets of article titles that covered the same
event. We achieved this by comparing each title from one source with the corresponding titles
of comparable articles from the other source, all related to the particular event.</p>
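        <p>A minimal sketch of the pairwise, outlet-level comparison is shown below; TF-IDF vectors and toy snippets stand in for the pre-trained embeddings and aggregated articles used in our experiments.</p>
        <preformat>
# Sketch of pairwise outlet-level similarity for one event; TF-IDF vectors and
# toy snippets stand in for the pre-trained embeddings and aggregated articles.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

outlet_texts = {  # one aggregated document per outlet for a single event
    "Outlet A": "The ship sank after an explosion, officials said.",
    "Outlet B": "Officials reported the ship sank following an explosion.",
    "Outlet C": "The vessel was damaged by a storm and remains afloat.",
}

matrix = TfidfVectorizer().fit_transform(outlet_texts.values())
names = list(outlet_texts)
sims = cosine_similarity(matrix)

for i, j in combinations(range(len(names)), 2):
    print(f"{names[i]} vs {names[j]}: {sims[i, j]:.2f}")
        </preformat>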
        <p>In the third step, we assessed the semantic similarity of semantically significant sentences from
various sources. These sentences contained keywords associated with the event under
consideration or verbs representing the actions linked to specific events. Compiling these lists
for each event, we relied on the existing list of words associated with the Russian-Ukrainian
war from [32] and added verbs extracted from the articles covering each specific event. For
example, for the "Moskva sinking" event, the list of related verbs comprises over 120 verbs,
while the "Mariupol Theatre" event includes about 60 verbs. This approach allowed us to focus
on texts that exclusively provided information about a particular event, excluding phrases with
similar meanings typically found in news articles from various web news sites like "witnesses
report" or 'it doesn't appear evident' and so on.</p>
        <p>For linguistic preprocessing, we employed stemming and stop-word removal. Additionally,
we eliminated numerous specific symbols commonly found in web-wrapped texts.</p>
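        <p>A minimal preprocessing sketch with NLTK (assuming the stop-word list has been downloaded) could look as follows.</p>
        <preformat>
# Sketch of the preprocessing step: lowercasing, symbol removal, stop-word
# removal, and stemming; assumes the NLTK stopword list is available.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)
STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text: str) -> list:
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # strip links
    text = re.sub(r"[^a-zA-Z\s]", " ", text)        # strip symbols and digits
    tokens = text.lower().split()
    return [stemmer.stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("Officials announced evacuations; see https://example.org for details."))
        </preformat>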
        <p>To generate pre-trained vectors, we employed two types of language models (LM), based on
SpaCy and fastText. The 'en_core_web_lg' language model provided by SpaCy consists of
300-dimensional vectors and encompasses a vocabulary of 685,000 words. This model is trained on
diverse datasets, including content from Wikipedia, OSCAR (Open Super-large Crawled
Aggregated coRpus) totaling 1342 GB, and News-crawl data comprising 16.9 GB.</p>
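        <p>Document similarity with the SpaCy vectors can be computed as in the sketch below, assuming the 'en_core_web_lg' model has been installed.</p>
        <preformat>
# Sketch of full-text similarity with spaCy's large English vectors; assumes
# the model has been installed (python -m spacy download en_core_web_lg).
import spacy

nlp = spacy.load("en_core_web_lg")

doc_a = nlp("The flagship sank in the Black Sea after an explosion.")
doc_b = nlp("The warship went down in the Black Sea following a blast.")

print(f"Similarity: {doc_a.similarity(doc_b):.3f}")
        </preformat>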
        <p>While the model was trained on various datasets, including web news, which should have
resulted in minimal Out-of-Vocabulary (OOV) words in our articles, we observed that even after
preprocessing, our dataset might still contain words lacking proper lexicon entries or improperly
tokenized words. To address this, we applied the fastText subword-based
pre-trained vectors from Facebook AI [33] in the subsequent step. In contrast to other LMs, fastText
LM excels in predicting subwords or character n-grams. This model is particularly adept at
handling the challenges posed by scraped news outlets' texts, which may include disruptions
from pictures, links, quotes, and other insertions by default. Consequently, the texts may
potentially contain misspelled words, numbers, partial words, and single characters. For our
study, we utilized the FastText "wiki-news-300d-1M-subword" LM, encompassing 2 million
word vectors trained with subword information from the Common Crawl corpus, comprising
600 billion tokens.</p>
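        <p>A sketch of sentence-level similarity with the pre-trained fastText vectors, loaded through gensim, is shown below; the local file path is an assumption, and averaging word vectors is one simple way to obtain sentence vectors.</p>
        <preformat>
# Sketch of sentence-level similarity with pre-trained fastText vectors loaded
# through gensim; the local file path is an assumption. Loading a .bin model
# with gensim.models.fasttext.load_facebook_vectors would additionally keep
# subword handling for out-of-vocabulary words.
import numpy as np
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("wiki-news-300d-1M-subword.vec")

def sentence_vector(sentence):
    tokens = [t for t in sentence.lower().split() if t in vectors]
    if not tokens:
        return np.zeros(vectors.vector_size)
    return np.mean([vectors[t] for t in tokens], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

a = sentence_vector("the flagship sank in the black sea")
b = sentence_vector("the warship went down in the black sea")
print(f"Cosine similarity: {cosine(a, b):.3f}")
        </preformat>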
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and findings</title>
      <p>We conducted a comprehensive evaluation of semantic similarity among pairs of outlets for
nine events by analyzing full texts, article headings, and selected sentences from the articles.
To avoid building a topic model instead of a misinformation detection model, each of the nine
events was individually examined. Our experiments revealed that training vectors on the
FastText language model produced better distinguishability results compared to using the
'en_core_web_lg' model provided by SpaCy. As an example, in Table 3, we present the semantic
similarity scores for the full texts of the articles related to the Azovstal topic, utilizing the
en_subwords_wiki_lg LM.</p>
      <p>We observed that evaluating the semantic similarity of headlines encounters challenges,
particularly when dealing with distributional semantic similarity scores. Even headlines from
articles covering the same event and belonging to the same outlet yield relatively low similarity
values. Several factors contribute to this outcome. Primarily, the efficacy of comparing article
titles is significantly influenced by the number of articles published by each outlet for a specific
event. The RUWA dataset, however, is not well-balanced across events. In certain cases, a
website may have produced only a few articles related to a particular event, impacting the
reliability of the semantic similarity assessment. Furthermore, each headline frequently not only
neutrally conveys or describes an event but also mirrors the subjective perspectives and
sentiments of certain authors.</p>
      <p>As mentioned above, our third experimental direction focuses on processing the targeted,
relevant, and topic-specific portions of texts, steering clear of broad or generalized content in
articles. To achieve this goal, we specifically chose sentences containing keywords and verbs
related to the considered event. This approach allowed us to generate more specific and directly
relevant texts that are closely tied to the subject of the event. Table 4 demonstrates an example
of the calculation of semantic similarity scores for texts obtained by concatenating all sentences
containing keywords related to the Mariupol Theatre topic from every outlet. Table 5 illustrates
an example of the semantic similarity for texts obtained by concatenating all sentences
containing particular verbs related to the Sinking of the warship Moskva topic.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The conducted experiments collectively validate our hypotheses. Specifically, an analysis of
news articles from outlets representing the two countries engaged in the war conflict, including
Censor.net, Ukrinform, News-front, and RT, reveals significant disparities in most events. These
differences are systematically reflected in the semantic similarity coefficients, underscoring the
distinctiveness in the reporting styles and perspectives adopted by these outlets in the context
of the ongoing conflict. We may infer with a certain degree of confidence that the semantic
similarity coefficient's value correlates with the likelihood of conveying a certain degree of
misinformation.</p>
      <p>However, the experiments revealed a significant impact of both the number of articles
covering an event and the size of the text used to formulate vectors on the semantic similarity
score. Specifically, outcomes for events like Kremenchuk Supermarket and Moskva Sinking,
which are covered by only a small amount of news (Tab. 2), often deviate from the general
trends observed.</p>
      <p>Additionally, we observed that the semantic similarity coefficients consistently fell within
the narrow range of 91% to 99%. This tight clustering suggests a high degree of similarity among
the articles, implying that they not only revolve around the same topic but also share a similar
stylistic approach. All these articles presumably adopt a similar journalistic style in presenting news events.</p>
      <p>The second group of experiments did not yield significant results. The comparison of articles'
headings revealed a notable dependence of semantic similarity coefficients on the number of
articles associated with a particular event. Furthermore, titles often incorporate authors' biased
opinions and feelings, aligning with the genre-specific nature of the outlet's titles.</p>
      <p>The incorporation of additional knowledge related to an event resulted in the optimal
handling of web news. Almost all nine events in the final experimental group, which involved
additional knowledge regarding actions specified by concrete verbs, provided a clear
confirmation of our initial hypothesis. This indicates that the semantic similarity coefficient is
notably lower between established outlets from countries engaged in the war on opposing sides.</p>
      <p>In our assessment, this finding not only underscores the distinctiveness and divergence in
the reporting styles and perspectives of news outlets representing countries with conflicting
interests in the ongoing war but also suggests the potential dissemination of misinformation by
one country regarding a specific event.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In our study, we introduced an innovative dataset focused on the Russian-Ukrainian war. This
RUWA dataset comprises over 16,500 web news articles from established world outlets, covering
nine significant events of the Russian invasion of Ukraine that occurred from February to
September 2022. In order to avoid topic modeling and focus on misinformation detection
modeling, as well as to improve semantic coherence among articles from various news sources,
we aligned the dataset articles based on events. The dataset offers a comprehensive view of
diverse journalistic narratives surrounding the Russian-Ukrainian war, providing valuable
support for future research.</p>
      <p>Furthermore, our research contributes to illustrating how unsupervised machine learning
approaches, such as semantic similarity scores, can offer insights into potential misinformation
within news coverage of widely reported events across various outlets. We critically examined
the pros and cons of multiple methods for assessing the semantic similarity of news articles
discussing the same event across diverse reputable news outlets. Additionally, we showed that
while relying solely on semantic similarity analysis may not be enough for effective
misinformation detection, it offers valuable insights that can be synergistically combined with
other techniques to enhance overall accuracy in detection.</p>
      <p>Even though this exploration deepens our comprehension of the intricacies
associated with pinpointing misinformation in the context of the Russia-Ukraine war, it has
some limitations, namely:</p>
      <p>The dataset is centered around nine specific events concerning the Russian-Ukrainian
war occurring between February and September 2022. It comprises articles from a
limited set of well-known news outlets. However, it is essential to note that this
selection, while encompassing major events and reputable news sources, may not
capture every relevant occurrence during the specified timeframe. Moreover, the chosen
outlets could introduce a bias, potentially overlooking alternative perspectives and
regional nuances.</p>
      <p>While semantic similarity is explored as a means of detecting misinformation, it has
inherent limitations. It may not capture nuanced contextual differences, and the
approach might be less effective in identifying subtle misinformation strategies.
Outcomes related to semantic similarity for misinformation detection may be specific
to the characteristics of the chosen events and news sources. Generalizing the approach
to different conflicts or regions requires caution.</p>
      <p>Additionally, the absence of labeled data for training models limits the ability to assess
the performance of the proposed approach against a ground truth.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work was supported by the EU H2020 ICT48 project "Humane AI Net" under contract
#952026 and partially supported by the European Commission Next Generation EU programme,
PNRR-M4C2-Investimento 1.3, PE00000013 "FAIR" Spoke 8.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Ross</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <article-title>Naive realism in everyday life: Implications for social conflict and misunderstanding</article-title>
          ,
          <source>Values and knowledge</source>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>135</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Nickerson</surname>
          </string-name>
          ,
          <article-title>Confirmation bias: A ubiquitous phenomenon in many guises</article-title>
          ,
          <source>Review of general psychology</source>
          , vol.
          <volume>2</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>175</fpage>
          --
          <lpage>220</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Nyhan</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Reifler</surname>
          </string-name>
          ,
          <article-title>When corrections fail: The persistence of political misperceptions</article-title>
          ,
          <source>Political Behavior</source>
          , vol.
          <volume>32</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>303</fpage>
          --
          <lpage>330</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Huan</surname>
          </string-name>
          ,
          <article-title>Fake news detection on social media: A data mining perspective</article-title>
          ,
          <source>ACM SIGKDD Explorations Newsletter</source>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Galassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Antici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Köhler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Leistra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          and
          <string-name>
            <surname>Türkmen</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2023 CheckThat! Lab: Task 2 on Subjectivity in News Articles</article-title>
          ,
          <source>24th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF-WN</source>
          , pp.
          <fpage>236</fpage>
          --
          <lpage>249</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Rastogi</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <article-title>A review on fake news detection 3T's: typology, time of detection, taxonomies</article-title>
          ,
          <source>International Journal of Information Security</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>1</issue>
          , p.
          <fpage>177</fpage>
          -
          <lpage>212</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Deep learning for misinformation detection on online social networks: a survey and new perspectives</article-title>
          ,
          <source>Social Network Analysis and Mining</source>
          , vol.
          <volume>10</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>J. C. S.</given-names>
            <surname>Reis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Correia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Murai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Veloso</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Benevenuto</surname>
          </string-name>
          ,
          <article-title>Supervised learning for fake news detection</article-title>
          ,
          <source>IEEE Intelligent Systems</source>
          , vol.
          <volume>34</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>76</fpage>
          -
          <lpage>81</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Jarrahi</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Safari</surname>
          </string-name>
          ,
          <article-title>Evaluating the effectiveness of publishers' features in fake news detection on social media</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          , vol.
          <volume>82</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>2913</fpage>
          -
          <lpage>2939</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Zafarani</surname>
          </string-name>
          ,
          <article-title>A survey of fake news: Fundamental theories, detection methods, and opportunities</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          , vol.
          <volume>53</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Choudhary</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Anuja</surname>
          </string-name>
          ,
          <article-title>Linguistic feature based learning model for fake news detection and classification</article-title>
          ,
          <source>Expert Systems with Applications</source>
          , vol.
          <volume>169</volume>
          , p.
          <fpage>114171</fpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Reinartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>A stylometric inquiry into hyperpartisan and fake news</article-title>
          ,
          <source>arXiv preprint arXiv:1702.05638</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.-H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McKeown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <article-title>Faking fake news for real fake news detection: Propaganda-loaded training data generation</article-title>
          ,
          <source>arXiv preprint arXiv:2203.05386</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rosasco</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Poggio</surname>
          </string-name>
          ,
          <article-title>Holographic embeddings of knowledge graphs</article-title>
          ,
          <source>Proceedings of the AAAI conference on artificial intelligence</source>
          , vol.
          <volume>30</volume>
          , no.
          <issue>1</issue>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mitra</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Gilbert</surname>
          </string-name>
          ,
          <article-title>Credbank: A large-scale social media corpus with associated credibility annotations</article-title>
          ,
          <source>Proceedings of the international AAAI conference on web and social media</source>
          , vol.
          <volume>9</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>258</fpage>
          -
          <lpage>267</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Khanam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Alwasel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sirafi</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mamoon</surname>
          </string-name>
          ,
          <article-title>Fake news detection using machine learning approaches</article-title>
          ,
          <source>IOP conference series: materials science and engineering</source>
          , vol.
          <volume>1099</volume>
          , no.
          <issue>1</issue>
          , p.
          <fpage>012040</fpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <article-title>A review of fake news detection using machine learning techniques</article-title>
          ,
          <source>2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC)</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>The future of misinformation detection: new perspectives and trends</article-title>
          ,
          <source>arXiv preprint arXiv:1909.03654</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>E.</given-names>
            <surname>Aïmeur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amri</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Brassard</surname>
          </string-name>
          ,
          <article-title>Fake news, disinformation and misinformation in social media: a review</article-title>
          ,
          <source>Social Network Analysis and Mining</source>
          , vol.
          <volume>13</volume>
          , no.
          <issue>1</issue>
          , p.
          <fpage>30</fpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V.-I.</given-names>
            <surname>Ilie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-O.</given-names>
            <surname>Truică</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.-S.</given-names>
            <surname>Apostol</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Paschke</surname>
          </string-name>
          ,
          <article-title>Context-Aware Misinformation Detection: A Benchmark of Deep Learning Architectures Using Word Embeddings</article-title>
          ,
          <source>IEEE Access</source>
          , vol.
          <volume>9</volume>
          , pp.
          <fpage>162122</fpage>
          -
          <lpage>162146</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fei</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>A survey on deepfake video detection</article-title>
          ,
          <source>IET Biometrics</source>
          , vol.
          <volume>10</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>607</fpage>
          -
          <lpage>624</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>"liar, liar pants on fire": A new benchmark dataset for fake news detection</article-title>
          ,
          <source>arXiv preprint arXiv:1705.00648</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>K.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mahudeswaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media</article-title>
          ,
          <source>Big Data</source>
          , vol.
          <volume>8</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>171</fpage>
          -
          <lpage>188</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Choudhary</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <article-title>Linguistic feature based learning model for fake news detection and classification</article-title>
          ,
          <source>Expert Systems with Applications</source>
          , vol.
          <volume>169</volume>
          , p.
          <fpage>114171</fpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Janicka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pszona</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Wawer</surname>
          </string-name>
          ,
          <article-title>Cross-domain failures of fake news detection</article-title>
          ,
          <source>Computación y Sistemas</source>
          , vol.
          <volume>23</volume>
          , no.
          <issue>3</issue>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Reinartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>A stylometric inquiry into hyperpartisan and fake news</article-title>
          ,
          <source>arXiv preprint arXiv:1702.05638</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>F. K. A.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Feel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Elbassuoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jaber</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Farah</surname>
          </string-name>
          ,
          <article-title>Fa-kes: A fake news dataset around the syrian war</article-title>
          ,
          <source>Proceedings of the international AAAI conference on web and social media</source>
          , vol.
          <volume>13</volume>
          , pp.
          <fpage>573</fpage>
          -
          <lpage>582</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Nath</given-names>
            <surname>Nandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Cheema</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Azizov</surname>
          </string-name>
          ,
          <article-title>The clef-2023 checkthat! lab: Checkworthiness, subjectivity, political bias, factuality, and authority</article-title>
          ,
          <source>European Conference on Information Retrieval</source>
          , pp.
          <fpage>506</fpage>
          -
          <lpage>517</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Cheema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hakimov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF2023 CheckThat! lab task 1 on check-worthiness in multimodal and multigenre content</article-title>
          ,
          <source>Working Notes of CLEF</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D. S.</given-names>
            <surname>Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Azizov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Cheema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2023 CheckThat! Lab on Checkworthiness, Subjectivity, Political Bias, Factuality, and Authority of News Articles and Their Source</article-title>
          ,
          <source>International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , pp.
          <fpage>251</fpage>
          -
          <lpage>275</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>