<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dataset Landscape for Automated Propaganda Detection: A Data-Centric Insight</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Usai</string-name>
          <email>marco.usai6@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Antonio Mura</string-name>
          <email>davideantonio.mura@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Loddo</string-name>
          <email>andrea.loddo@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuela Sanguinetti</string-name>
          <email>manuela.sanguinetti@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Zedda</string-name>
          <email>luca.zedda@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cecilia Di Ruberto</string-name>
          <email>cecilia.dir@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Atzori</string-name>
          <email>atzori@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <kwd-group>
          <kwd>Propaganda Detection</kwd>
          <kwd>Span Identification</kwd>
          <kwd>Dataset Benchmarking</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, University of Cagliari</institution>
          ,
          <addr-line>Via Ospedale 72, Cagliari, 09124</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>9</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>The increasing spread of propaganda in digital media has intensified research efforts toward the development of automated detection systems. Central to this task is the availability and quality of annotated datasets, which directly impact model performance, generalizability, and real-world applicability. In this paper, we present a data-centric insight into the current landscape of datasets used for automated propaganda detection. We analyze a representative set of publicly available corpora with respect to key factors such as annotation schemes, label granularity, domain coverage, linguistic diversity, and class balance. This work aims to guide researchers toward more robust, inclusive, and scalable approaches to propaganda detection by emphasizing the foundational role of data quality and structure.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Propaganda is a deliberate and systematic form of communication that aims to influence the opinions,
beliefs, and behaviours of a target audience. This influence is often exerted through the selection,
omission, or distortion of information. Senders typically employ sophisticated persuasive techniques,
including emotional appeals, oversimplification, stereotypes, and slogans. The outcomes of this
communication are primarily oriented towards the interests of the sender rather than those of the
recipient.</p>
      <p>Historically, propaganda has played a central role in a variety of contexts, from religion and politics
to modern mass media and public relations systems. In the contemporary digital age, the capacity to
discern propaganda is of paramount importance. Hyperconnectivity and the deluge of information
have led to an escalation in the risk of manipulation through misinformation, fake news, and targeted
messaging designed to polarise public opinion.</p>
      <p>In such contexts, the ability to discern propaganda is crucial for promoting critical reflection,
preserving the integrity of public discourse, and ensuring the proper functioning of democratic processes.
This paper presents a mini-review and analysis of 10 research studies on propaganda detection,
published between 2021 and 2025. For each work, an examination is conducted of the employed dataset,
including its annotation scheme and characteristics, the methodological approach adopted to address
the problem, and the results achieved. Due to space limitations, we primarily selected works that
address the problem using a specific annotation scheme, namely span identification.</p>
      <p>This choice is motivated by the fact that span detection enables a more fine-grained and interpretable
analysis of propaganda techniques. Unlike binary classification, which merely labels entire texts or
documents as propagandistic or not, span detection allows researchers to pinpoint the exact textual
segments where propaganda occurs, along with the specific technique being used. This granularity is
essential for understanding how propaganda is constructed and communicated within a text, facilitating
more informative downstream applications such as fact-checking, media literacy education, and
automated content moderation.</p>
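      <p>The span-based scheme just described reduces to a simple data structure: each annotation is a pair of character offsets plus a technique label. The class, sentence, and offsets below are an illustrative sketch of this convention, not the actual release format of any of the corpora.</p>

```python
from dataclasses import dataclass

@dataclass
class PropagandaSpan:
    """One span-level annotation: character offsets plus a technique label."""
    start: int       # inclusive character offset into the document text
    end: int         # exclusive character offset
    technique: str   # e.g. "Loaded_Language"; the label set varies per corpus

def extract_spans(text: str, spans: list[PropagandaSpan]) -> list[tuple[str, str]]:
    """Return (fragment, technique) pairs for each annotated span."""
    return [(text[s.start:s.end], s.technique) for s in spans]

# Hypothetical sentence and annotation, for illustration only.
text = "They want to destroy our way of life!"
spans = [PropagandaSpan(start=13, end=20, technique="Loaded_Language")]
print(extract_spans(text, spans))  # → [('destroy', 'Loaded_Language')]
```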
      <p>
        Recent advancements in AI have led to the development of data-centric AI (DCAI), in which data,
rather than models, is recognized as the primary driver of robust and adaptable systems. As outlined
by Malerba and Pasquadibisceglie [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], central to DCAI is the understanding that datasets are dynamic,
evolving resources that require ongoing curation, enrichment, and validation. This principle is
particularly critical for propaganda detection because the tactics, linguistic nuances, and dissemination
strategies of propaganda actors are fluid and adapt quickly to sociopolitical contexts. Therefore, datasets
for automated propaganda detection must be conceptualized as living resources that are continuously
refined to maintain relevance and effectiveness in a rapidly changing landscape.
      </p>
      <p>Our review of ten prominent propaganda detection corpora reveals a consistent pattern. Although
annotation schemes have become more detailed and multilingual coverage has improved, most datasets
remain largely static after release, with little evidence of continued maintenance or updates. The
discrepancy between dataset stasis and task dynamism underscores the necessity of aligning future
dataset development with the DCAI paradigm.</p>
      <p>The structure of the paper is thus organized as follows: Section 2 outlines the criteria adopted for
selecting the reviewed works and describes the research methodology, Section 3 provides an overview of
the selected studies, while Section 4 offers a synthesis of key findings and discusses broader implications
and conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Search Protocol</title>
      <p>The primary criterion for selecting the literature reviewed in this survey was the relevance and
contribution of existing datasets for automatic propaganda detection, with particular attention to those
supporting span-level annotations. Rather than focusing solely on methodological innovations, the
survey emphasizes corpora that have played a significant role in shaping recent research directions and
enabling fine-grained analysis of propaganda in online content.</p>
      <p>We thus conducted a literature search primarily focusing on Google Scholar and using a specific
set of keywords, i.e., “computational propaganda detection”, “social media manipulation detection”,
“automated propaganda detection”, and “misinformation detection”. These keywords were selected
to encompass a wide range of approaches that address the detection of persuasive, manipulative, or
misleading content on social media and other digital platforms. Priority was given to peer-reviewed
papers that introduce documented and publicly available datasets, either accessible via direct download
or obtainable upon request from the dataset curators. The selection focused in particular on resources
with the following main characteristics:
• Span-level annotations, allowing for fine-grained identification of propagandistic content within
texts;
• Rich annotation schemes capturing related phenomena, such as persuasion techniques,
manipulative strategies, or framing categories.</p>
      <p>These criteria aimed to ensure the inclusion of high-quality datasets that support both detailed
analysis and replicability in propaganda detection research.</p>
      <p>Due to the very focused search scope, we found 10 datasets, all released between 2019 and early 2025.
Despite their recent release, many of them have been widely used as benchmark datasets to develop or
test computational approaches, thus highlighting the wide interest in this topic and the related tasks
that these resources aim to support. To illustrate this, Table 1 summarizes the results of this search, along
with the publication year and number of citations as available on Google Scholar.</p>
      <p>While the next section briefly outlines the datasets found in our search, Section 4 aims to broadly
discuss the main findings in terms of specific key factors, particularly oriented to a more data-centric
perspective.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Datasets Overview</title>
      <p>This section summarizes the retrieved information about the identified datasets, also providing Table 2
as a reference table.</p>
      <p>
        Propaganda Techniques Corpus (PTC) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] PTC is a corpus designed for fine-grained propaganda
detection in news articles. It contains 451 English news articles annotated at the span level, where each
propagandistic fragment is marked and labeled with one of 18 propaganda techniques. The annotations
include precise character offsets for each span. The articles were sourced from both mainstream and
suspicious outlets, and the annotations were conducted by trained experts.
      </p>
      <p>
        PTC-SemEval 2020 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] The dataset was proposed for the SemEval-2020 Task 11 on Propaganda
Detection, and it precisely builds upon the theoretical framework devised initially for the seminal PTC
dataset. It thus focuses on detecting propaganda techniques in English news articles, and it was designed
to address two subtasks: span identification (detecting specific text spans containing propaganda) and
technique classification (assigning one of 14 propaganda technique labels to each span). It contains over
500 articles, annotated by experts. The annotations are fine-grained, marking exact character offsets,
which allows for a detailed analysis of propaganda within the text.1
WANLP [
        <xref ref-type="bibr" rid="ref4">4</xref>
] The dataset was developed for the WANLP 2022 Shared Task 3 on Propaganda
Detection in Arabic Social Media Text. It consists of Arabic news articles annotated
for propaganda techniques at the span level. Inspired by the PTC dataset, this is the first of its kind in
Arabic, aiming to support fine-grained propaganda detection in a low-resource language. 2
SemEval 2023 Task 3 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] Similarly to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], this dataset is a benchmark for detecting persuasion
techniques in news articles. The main novelty introduced in this version is that it includes news
articles in six languages, i.e., English, French, German, Italian, Polish, and Russian. The resource
comprises data collected from 2020 to mid-2022. It covers a wide range of topics, from
COVID-19 to the Russo-Ukrainian war. The dataset draws from both mainstream and alternative media, with
many alternative sources flagged by fact-checkers as potential disinformation spreaders. Articles
were gathered using news aggregators (e.g., Google News, EMM) and credibility-rating platforms (e.g.,
NewsGuard, MediaBiasFactCheck).3
1https://huggingface.co/datasets/SemEvalWorkshop/sem_eval_2020_task_11
2https://sites.google.com/view/wanlp2022/shared-tasks?authuser=0
3https://propaganda.math.unipd.it/semeval2023task3/
BanMANI [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] This is a novel dataset of 800 Bangla social-media posts paired with 500 reference news
articles, annotated for binary manipulation labels and manipulation spans; 530 instances are manipulated
and 270 non-manipulated. The authors propose a semi-automatic annotation pipeline tailored for
low-resource NLP settings, ensuring balanced coverage of manipulated versus benign posts.4
Salman et al. [7] This dataset consists of 1,030 English–Roman Urdu code-switched social media
snippets, each annotated at the fragment level with one or more of 20 propaganda techniques. This
dataset was developed to support a novel task focused on the fine-grained detection of propaganda in
multilingual and informal online discourse.5
ArAIEval [8] The ArAIEval Shared Task dataset covers both unimodal (Task 1) and multimodal (Task
2) Arabic content. We focus here on Task 1 data in particular, as it is designed precisely for span-level
propaganda detection. The resource comprises 9,000 Arabic text snippets—1,500 tweets and 7,500
news paragraphs—annotated with 23 persuasion techniques, marking precise character-level spans.
Annotations were performed by three annotators with expert consolidation across iterative stages.6
ArMPro [9] The ArMPro dataset consists of 20,487 annotated spans across train (15,437), development
(1,699), and test (3,351) sets. Each span is labeled with one or more of 23 propaganda techniques.7
ZenPropaganda [10] ZenPropaganda is a Russian-language dataset focused on COVID-19-related
online media, containing 125 texts and nearly 2,400 annotated propaganda fragments, labeled with 36
diferent propaganda techniques. The annotations follow a fine-grained schema inspired by the one
used in SemEval 2020, assigning each span a specific propaganda technique. 8
ManiTweet [11] The paper focuses on detecting tweets that intentionally misrepresent information
from associated news articles. ManiTweet comprises 3,600 tweet-article pairs annotated with binary
labels, manipulation spans, and types of manipulation.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>Propaganda detection remains a complex and evolving task, with challenges arising across datasets,
models, and evaluation frameworks. From the analysis of these texts, several key factors emerge that
are consistently observed across different studies.</p>
      <p>First of all, a consistent issue across nearly all corpora is class imbalance [12, 10, 9]. Techniques such
as Loaded Language and Name Calling are overrepresented, while rarer strategies (e.g., Straw Man,
Whataboutism) often suffer from near-zero F1-scores [13]. This imbalance impacts model reliability,
particularly for zero-shot and few-shot settings with LLMs [14].</p>
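      <p>A common mitigation for such skew is inverse-frequency class weighting in the training loss, so that rare techniques are not drowned out by frequent ones. The sketch below uses made-up counts chosen only to mimic the reported skew; the label names are from the reviewed taxonomies, but the numbers are illustrative.</p>

```python
from collections import Counter

def class_weights(labels: list[str]) -> dict[str, float]:
    """Inverse-frequency weights: rare techniques receive larger weights,
    so a weighted loss does not effectively ignore them."""
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {lab: total / (n_classes * c) for lab, c in counts.items()}

# Illustrative skew: a frequent technique dominates two rarer ones.
labels = ["Loaded_Language"] * 80 + ["Name_Calling"] * 15 + ["Whataboutism"] * 5
weights = class_weights(labels)
print({k: round(v, 2) for k, v in weights.items()})
# → {'Loaded_Language': 0.42, 'Name_Calling': 2.22, 'Whataboutism': 6.67}
```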
      <p>Another critical factor that emerges is the variability in annotation schemes across datasets.
Although span-level annotation, which aims to identify specific persuasion techniques, is the most
widely adopted strategy, the degree of granularity and the definition of labels vary considerably.
Many datasets build upon the fine-grained labeling introduced by SemEval 2020 Task 11, yet notable
divergences remain. For example, ZenPropaganda defines 36 distinct techniques organized into broader
meta-classes [10], whereas ArPro adopts a set of 23 techniques and introduces a dedicated no technique
label for neutral spans [9]. Annotation quality also presents challenges: for example, the ManiTweet
dataset [11] requires precise span-level alignment between tweets and their referenced articles, a process
inherently affected by subjectivity and interpretative variation.
4https://github.com/kamruzzaman15/banmani
5https://github.com/mbzuai-nlp/propaganda-codeswitched-text
6https://gitlab.com/araieval/araieval_arabicnlp24/-/tree/main/task1/data
7https://github.com/MaramHasanain/ArMPro
8https://github.com/aschern/ru_zen_prop</p>
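      <p>One pragmatic way to compare corpora with divergent label sets is to project fine-grained techniques onto shared meta-classes before evaluation. The mapping below is a hypothetical sketch of this idea; the groupings are illustrative, not an official crosswalk between the reviewed taxonomies.</p>

```python
# Hypothetical coarse meta-classes; groupings are illustrative only.
META = {
    "Loaded_Language": "Emotional_Appeal",
    "Appeal_to_Fear": "Emotional_Appeal",
    "Name_Calling": "Ad_Hominem",
    "Whataboutism": "Distraction",
    "Straw_Man": "Distraction",
}

def to_meta(label: str) -> str:
    """Map a fine-grained technique label to a coarser meta-class;
    labels outside the mapping fall back to 'Other' rather than failing."""
    return META.get(label, "Other")

print([to_meta(l) for l in ["Loaded_Language", "Straw_Man", "No_Technique"]])
# → ['Emotional_Appeal', 'Distraction', 'Other']
```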
      <p>Linguistic diversity is another critical factor. While many datasets remain English-centric, eforts
like WANLP [15], ZenPropaganda [10], and ArPro [9] expand the scope to Russian, Arabic, Polish,
Bangla, and code-switched content. These studies highlight the limitations of LLMs such as GPT-3.5
and GPT-4 in zero-shot or cross-lingual settings. In particular, BanMANI revealed performance gaps
caused by linguistic and cultural mismatches, while ArPro and ArAIEval [14] confirmed that fine-tuned,
language-specific models still outperform LLMs on span-based and multi-label tasks.</p>
      <p>Another important aspect to highlight is the issue of domain coverage. News media remains the
most frequently targeted domain, especially in benchmark datasets such as SemEval 2020 [16, 12],
SemEval 2023 [17], and ArPro [9]. These resources typically consist of formal texts with span-level
annotations of persuasive techniques. However, they vary significantly in geographical and linguistic
scope: ArPro focuses on Arabic news, while SemEval 2023 includes six languages, although practical
experiments are limited to English and Russian due to tool availability. ZenPropaganda [10], in contrast,
narrows the scope to Russian COVID-related media, offering a domain-specific perspective but with
limited generalizability.</p>
      <p>
In contrast, social media presents a markedly different and more challenging environment,
characterized by short, informal, and noisy content. Datasets like BanMANI [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and the code-switched
corpus by Salman et al. [7] focus on low-resource and linguistically diverse contexts such as Bangla
and English–Roman Urdu, respectively. Similarly, ManiTweet [11] addresses the manipulation of news
content within tweets, combining brevity with factual misrepresentation. These examples illustrate the
challenges of transferring models trained on formal, monolingual text to informal, multilingual, and
often culturally nuanced settings.
      </p>
      <p>When framed within the DCAI paradigm, it is clear that none of the identified datasets provide
evidence of life-cycle maintenance or regular updates beyond their initial publication. The static nature
of these corpora poses a significant challenge because, as propaganda techniques, languages, and media
landscapes evolve, models trained on outdated or unrefreshed data are at risk of poor generalization,
reduced robustness, and susceptibility to emerging manipulation strategies. The SemEval series and
PTC are community benchmarks that have driven progress; nevertheless, their underlying data has not
been periodically re-annotated or augmented to reflect current events or techniques beyond the original
release window.</p>
      <p>To complement this brief overview, we also examined a selection of research works—identified using
the same keywords mentioned in Section 2—that employed these benchmarks to develop and evaluate
their own models. The goal is to illustrate the types of models and approaches these datasets have
supported, while also highlighting the performance achieved and the main limitations reported in the
literature. Table 3 provides a final synthesis of the analyzed works and the corresponding employed
datasets for the span identification task, while Table 2 thus summarizes all the information about the
datasets identified, models applied, methods employed, limitations encountered, and results achieved in
each cited study.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>This mini-review has provided a comprehensive overview of recent research eforts in propaganda
detection, with a particular focus on span-level annotation schemes. The analysis of 10 key studies
reveals several recurring challenges and patterns. Chief among these is the pervasive issue of class
imbalance, which consistently hinders the detection of less frequent persuasion techniques and impacts
model robustness, particularly in zero-shot and cross-lingual scenarios. Additionally, the lack of
standardization in annotation granularity and label definitions complicates cross-dataset comparisons
and model generalization.</p>
      <p>Strategies for evolving propaganda detection datasets should emphasize regular updates and
continuous improvement. This can be achieved by periodically re-annotating data to capture emerging linguistic
trends and evolving propaganda tactics; validating and refining annotations for quality and consistency;
and encouraging community-driven contributions to enrich the dataset. Adopting modular dataset
designs and leveraging automated monitoring for distributional changes enables flexible adaptation to
new challenges. These approaches are essential to ensuring that datasets remain relevant and robust by
aligning them with the dynamic nature of language and online propaganda through data-centric AI
principles.</p>
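      <p>The automated monitoring of distributional changes mentioned above can be sketched with a standard divergence measure: compare the label distribution at release time against a later snapshot and flag the dataset for re-annotation when the divergence exceeds a threshold. The function below computes the base-2 Jensen-Shannon divergence; the distributions and threshold are hypothetical.</p>

```python
import math

def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2) between two label distributions:
    0 means identical, 1 means maximally different."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict[str, float]) -> float:
        # KL(a || m); terms with a(k) = 0 contribute nothing.
        return sum(a.get(k, 0.0) * math.log2(a.get(k, 0.0) / m[k])
                   for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Hypothetical label distributions: release time vs. a later snapshot.
release = {"Loaded_Language": 0.6, "Name_Calling": 0.3, "Whataboutism": 0.1}
current = {"Loaded_Language": 0.4, "Name_Calling": 0.3, "Doubt": 0.3}
drift = js_divergence(release, current)
print(round(drift, 3))  # a drift score in [0, 1] (≈ 0.21 here)
```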
      <p>In the future, it would be worthwhile to explore several directions. Firstly, it is imperative to develop
richer, multilingual, and cross-domain datasets with consistent annotation guidelines to advance model
generalization. Greater emphasis should be placed on annotator agreement and validation procedures
to ensure reliability and reduce label noise. Secondly, enhancing the adaptability of LLMs through
lightweight fine-tuning or hybrid architectures could help narrow the performance disparity with
task-specific models, particularly in settings with limited resources. Finally, the incorporation of multimodal
data, such as text and images in memes, and context-aware features, such as discourse structures or
user metadata, has the potential to further enhance the accuracy of detection in real-world applications,
including content moderation, media literacy tools, and automated fact-checking systems.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported in part by project SERICS (PE00000014), under the MUR NRRP funded by the
EU - NextGenerationEU, and in part by the project DEMON “Detect and Evaluate Manipulation of ONline
information” funded by MIUR, Italy under the PRIN 2022 grant 2022BAXSPY (CUP F53D23004270006,
NextGenerationEU).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT to paraphrase and reword. After using
this tool/service, the authors reviewed and edited the content as needed and take full responsibility for
the publication’s content.</p>
      <p>S. Sharoff (Eds.), Proceedings of the Workshop on Computational Terminology in NLP and
Translation Studies (ConTeNTS) Incorporating the 16th Workshop on Building and Using Comparable
Corpora (BUCC), INCOMA Ltd., Shoumen, Bulgaria, Varna, Bulgaria, 2023, pp. 51–58. URL:
https://aclanthology.org/2023.contents-1.7/.
[7] M. U. Salman, A. Hanif, S. Shehata, P. Nakov, Detecting propaganda techniques in code-switched
social media text, in: H. Bouamor, J. Pino, K. Bali (Eds.), Proceedings of the 2023 Conference on
Empirical Methods in Natural Language Processing, Association for Computational Linguistics,
Singapore, 2023, pp. 16794–16812. URL: https://aclanthology.org/2023.emnlp-main.1044/. doi:10.
18653/v1/2023.emnlp-main.1044.
[8] M. Hasanain, M. A. Hasan, F. Ahmed, R. Suwaileh, M. R. Biswas, W. Zaghouani, F. Alam, ArAIEval
Shared Task: Propagandistic techniques detection in unimodal and multimodal arabic content, in:
Proceedings of the Second Arabic Natural Language Processing Conference (ArabicNLP 2024),
Association for Computational Linguistics, Bangkok, 2024.
[9] M. Hasanain, F. Ahmad, F. Alam, Can GPT-4 identify propaganda? annotation and detection
of propaganda spans in news articles, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti,
N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics,
Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024,
pp. 2724–2744. URL: https://aclanthology.org/2024.lrec-main.244/.
[10] A. Chernyavskiy, S. Shomova, I. Dushakova, I. Kiriya, D. Ilvovsky, ZenPropaganda: A
comprehensive study on identifying propaganda techniques in Russian coronavirus-related media,
in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the
2024 Joint International Conference on Computational Linguistics, Language Resources and
Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 17795–17807. URL:
https://aclanthology.org/2024.lrec-main.1548/.
[11] K.-H. Huang, H. P. Chan, K. McKeown, H. Ji, ManiTweet: A new benchmark for identifying
manipulation of news on social media, in: O. Rambow, L. Wanner, M. Apidianaki, H. Al-Khalifa, B. D.
Eugenio, S. Schockaert (Eds.), Proceedings of the 31st International Conference on Computational
Linguistics, Association for Computational Linguistics, Abu Dhabi, UAE, 2025, pp. 11161–11180.</p>
      <p>URL: https://aclanthology.org/2025.coling-main.739/.
[12] W. Li, S. Li, C. Liu, L. Lu, Z. Shi, S. Wen, Span identification and technique classification of
propaganda in news articles, Complex &amp; Intelligent Systems 8 (2022) 3603–3612.
[13] P. N. Ahmad, L. Yuanchao, K. Aurangzeb, M. S. Anwar, Q. M. u. Haq, Semantic web-based
propaganda text detection from social media using meta-learning, Service Oriented Computing
and Applications (2024) 1–15.
[14] M. Hasanain, M. A. Hasan, F. Ahmad, R. Suwaileh, M. R. Biswas, W. Zaghouani, F. Alam, ArAIEval
shared task: Propagandistic techniques detection in unimodal and multimodal Arabic content, in:
N. Habash, H. Bouamor, R. Eskander, N. Tomeh, I. Abu Farha, A. Abdelali, S. Touileb, I. Hamed,
Y. Onaizan, B. Alhafni, W. Antoun, S. Khalifa, H. Haddad, I. Zitouni, B. AlKhamissi, R. Almatham,
K. Mrini (Eds.), Proceedings of the Second Arabic Natural Language Processing Conference,
Association for Computational Linguistics, Bangkok, Thailand, 2024, pp. 456–466. URL: https:
//aclanthology.org/2024.arabicnlp-1.44/. doi:10.18653/v1/2024.arabicnlp-1.44.
[15] A. S. Hussein, A. B. S. Mohammad, M. Ibrahim, L. H. Afify, S. R. El-Beltagy, NGU CNLP
atWANLP 2022 shared task: Propaganda detection in Arabic, in: H. Bouamor, H. Al-Khalifa,
K. Darwish, O. Rambow, F. Bougares, A. Abdelali, N. Tomeh, S. Khalifa, W. Zaghouani (Eds.),
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), Association
for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 2022, pp. 545–550. URL:
https://aclanthology.org/2022.wanlp-1.66/. doi:10.18653/v1/2022.wanlp-1.66.
[16] J. Szwoch, M. Staszkow, R. Rzepka, K. Araki, Limitations of large language models in propaganda
detection task, Applied Sciences 14 (2024) 4330.
[17] A. Chernyavskiy, D. Ilvovsky, P. Nakov, Unleashing the power of discourse-enhanced transformers
for propaganda detection, in: Proceedings of the 18th Conference of the European Chapter of the
Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 1452–1462.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Malerba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pasquadibisceglie</surname>
          </string-name>
          ,
          <article-title>Data-centric AI</article-title>
          ,
          <source>Journal of Intelligent Information Systems</source>
          <volume>62</volume>
          (
          <year>2024</year>
          )
          <fpage>1493</fpage>
          -
          <lpage>1502</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <article-title>Fine-grained analysis of propaganda in news articles</article-title>
          , in: K. Inui,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          Association for Computational Linguistics
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>5636</fpage>
          -
          <lpage>5646</lpage>
          . URL: https://aclanthology.org/D19-1565/. doi:10.18653/v1/D19-1565.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , SemEval-2020 task 11:
          <article-title>Detection of propaganda techniques in news articles</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbelot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Palmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>May</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shutova</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Fourteenth Workshop on Semantic Evaluation</source>
          , International Committee for Computational Linguistics,
          <source>Barcelona (online)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1377</fpage>
          -
          <lpage>1414</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .semeval-
          <volume>1</volume>
          .186/. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2020</year>
          .semeval-
          <volume>1</volume>
          .
          <fpage>186</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>F.</given-names> <surname>Alam</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Mubarak</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Zaghouani</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Nakov</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Da San Martino</surname></string-name>,
          <article-title>Overview of the WANLP 2022 shared task on propaganda detection in Arabic</article-title>,
          in:
          <source>Proceedings of the Seventh Arabic Natural Language Processing Workshop</source>,
          <publisher-name>Association for Computational Linguistics</publisher-name>, Abu Dhabi, UAE,
          <year>2022</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>J.</given-names> <surname>Piskorski</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Stefanovitch</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Da San Martino</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Nakov</surname></string-name>,
          <article-title>SemEval-2023 task 3: Detecting the category, the framing, and the persuasion techniques in online news in a multi-lingual setup</article-title>,
          in:
          <string-name><given-names>A. K.</given-names> <surname>Ojha</surname></string-name>,
          <string-name><given-names>A. S.</given-names> <surname>Doğruöz</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Da San Martino</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Tayyar Madabushi</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Kumar</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Sartori</surname></string-name>
          (Eds.),
          <source>Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)</source>,
          <publisher-name>Association for Computational Linguistics</publisher-name>, Toronto, Canada,
          <year>2023</year>, pp.
          <fpage>2343</fpage>-<lpage>2361</lpage>.
          URL: https://aclanthology.org/2023.semeval-1.317/. doi:10.18653/v1/2023.semeval-1.317.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>M.</given-names> <surname>Kamruzzaman</surname></string-name>,
          <string-name><given-names>M. M. I.</given-names> <surname>Shovon</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Kim</surname></string-name>,
          <article-title>BanMANI: A dataset to identify manipulated social media news in Bangla</article-title>,
          in:
          <string-name><given-names>A. H.</given-names> <surname>Haddad</surname></string-name>,
          <string-name><given-names>A. R.</given-names> <surname>Terryn</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Mitkov</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Rapp</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Zweigenbaum</surname></string-name>,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>