<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEPLN</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Corpus Resulting From the DEFALAC Project on Fallacies Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fermín Cruz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José A. Troyano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fernando Enríquez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>F. Javier Ortega</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Languages and Systems. School of Computer Engineering. University of Seville.</institution>
          <addr-line>Av. Reina Mercedes s/n. 41012, Seville</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>41</volume>
      <fpage>23</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>The DEFALAC project focuses on the automatic detection and generation of argumentative fallacies in Spanish using language models based on deep learning. Its main objective is to improve public debate and address the lack of resources in Spanish. This article presents the corpora generated during the execution of the project: FallacyES, composed of prototypical fallacies translated and revised from the Logic corpus, along with spontaneous fallacies extracted from comments on meneame.net; FallacyES-Political, containing approximately 2,000 fallacies extracted from 19 Spanish electoral debates over 30 years, classified into 16 types; and DebatES, based on the same transcripts as FallacyES-Political, enriched with additional information for automated analysis and manual exploration.</p>
      </abstract>
      <kwd-group>
        <kwd>Detection of fallacies</kwd>
        <kwd>annotated corpus</kwd>
        <kwd>large language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Description and objectives of the DEFALAC project</title>
      <p>social contexts. Furthermore, the lack of studies in Spanish represents an opportunity to generate
resources and methodologies specific to this language. The main objectives of the project are:</p>
      <p>This work focuses on Objective 2, presenting the resources generated during the project’s execution,
which have been published in various repositories. In the following sections, we present the main
features of three resources: the FallacyES corpus, the FallacyES-Political corpus, and the DebatES corpus.</p>
      <p>The FallacyES corpus is the first annotated fallacy corpus published in Spanish, central to the DEFALAC
project. It contains two collections: prototypical fallacies, translated and reviewed, and spontaneous
fallacies from meneame.net comments. Both include non-fallacious examples, crucial for training
fallacy detectors. While the prototypical collection covers 12 types, the spontaneous examples focus on 8. Preliminary
experiments highlighted the inherent complexity of detecting spontaneous fallacies.</p>
      <p>The FallacyES-Political corpus comprises 1,965 fallacy instances extracted from 19 Spanish General
Election debates spanning 30 years. Transcribed and annotated by experts, each entry provides the
fallacy text, type, speaker, and crucial contextual information. This dataset classifies fallacies into 16
distinct types, including new categories like Appeal to Authority. Benchmark evaluations underscored
the complexity of the classification task, highlighting benefits of fine-tuning domain-specific models
for accuracy.</p>
      <p>The DebatES corpus, still under development, will facilitate access to and analysis of political
information. It is based on transcriptions of the same 19 Spanish electoral debates as the FallacyES-Political
corpus, enriched with various data for automatic and manual analyses. The corpus will be distributed in XML
and HTML, and its annotations include linguistic metrics (e.g., lexical diversity) and political-domain insights.
It will also integrate semantic and discourse information, such as thematic blocks and emotions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The FallacyES corpus</title>
      <p>
        The first annotated resource produced by the DEFALAC project was the FallacyES corpus<sup>1</sup> [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To
our knowledge, this was the first annotated fallacy corpus published in Spanish. It consists of two
collections of fallacies: (1) prototypical fallacies, translated and reviewed from the Logic corpus [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
supplemented with manually created non-fallacious examples; and (2) spontaneous fallacies, extracted
from online comments on meneame.net, also accompanied by non-fallacious examples taken from the
same discussions.
      </p>
      <p>The types of fallacies included in the prototypical collection are as follows:</p>
      <list list-type="bullet">
        <list-item><p>Hasty generalization: drawing a general conclusion based on one or few cases.</p></list-item>
        <list-item><p>Ad hominem: attacking the person or entity making the argument instead of reasoning about
the argument itself.</p></list-item>
        <list-item><p>Ad populum: basing the truth (or falsehood) of an argument on its popularity.</p></list-item>
        <list-item><p>False causality: establishing a causal relationship between two phenomena without providing
evidence.</p></list-item>
        <list-item><p>Circular reasoning: erroneous reasoning in which the premise and conclusion rely on each
other for validation.</p></list-item>
        <list-item><p>Appeal to emotions: attempting to manipulate the audience’s emotions to win the debate.</p></list-item>
        <list-item><p>Red herring: introducing a new, unrelated topic to distract from the original debate.</p></list-item>
        <list-item><p>Invalid deduction: presenting a seemingly logical argument with formal errors (false analogy,
denying the antecedent, affirming the consequent, etc.).</p></list-item>
        <list-item><p>Credibility: basing the truth (or falsehood) of an argument on the opinion of an authority
(argument from authority) or on traditionally accepted opinion (ad antiquitatem fallacy).</p></list-item>
        <list-item><p>False dilemma: presenting two options as the only possible choices, when in fact more exist.</p></list-item>
        <list-item><p>Straw man: reformulating the opponent’s argument in an exaggerated or caricatured way, then
attacking that distorted version.</p></list-item>
        <list-item><p>Intentional: any other type of fallacy not included in the above categories, in which the speaker’s
intent to win the debate without sound reasoning is evident.</p></list-item>
      </list>
      <p><fn id="fn1"><label>1</label><p>https://idus.us.es/items/170398a1-81b7-4e08-93cc-ccff0cabd506</p></fn></p>
      <p>
        For the spontaneous fallacy corpus, examples were collected from comments on the website
meneame.net, a news aggregator where users discuss current events. To identify fallacies, following
the approach used by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we searched for comments that mentioned types of fallacies (accusatory
comments) and retrieved the messages they referred to (candidate comments). These candidates were
then manually reviewed to identify fragments that truly constituted fallacies.
      </p>
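      <p>The retrieval step described above can be sketched as follows. This is a minimal illustration rather than the project's actual pipeline: the Comment record, its reply_to field, and the search terms in FALLACY_TERMS are assumptions introduced for the example, since neither the data layout nor the exact queries are described in this article.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    """Hypothetical comment record (not the real meneame.net schema)."""
    comment_id: int
    reply_to: Optional[int]  # id of the comment being answered, if any
    text: str


# Illustrative Spanish search terms per fallacy type (assumption).
FALLACY_TERMS = {
    "ad hominem": ["ad hominem"],
    "straw man": ["hombre de paja"],
    "false dilemma": ["falso dilema"],
}


def find_candidates(comments):
    """Return (fallacy_type, accusatory, candidate) triples for manual review."""
    by_id = {c.comment_id: c for c in comments}
    triples = []
    for comment in comments:
        # Only replies can accuse another comment of being fallacious.
        if comment.reply_to is None or comment.reply_to not in by_id:
            continue
        lowered = comment.text.lower()
        for fallacy_type, terms in FALLACY_TERMS.items():
            if any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms):
                triples.append((fallacy_type, comment, by_id[comment.reply_to]))
    return triples
```
]]></preformat>
      <p>The triples returned by this sketch are only candidates: as in the procedure described above, each one would still need manual review before being accepted as a fallacy.</p>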
      <p>Not all accusations were substantiated, and some comments lacked enough context to clearly identify
a fallacy. In total, more than 14,000 candidate comments were analyzed. The aim was to classify fallacies
into the same 12 categories as the prototypical collection, but a significant number of examples were
found in only 8 categories. The remaining 4 were excluded due to insufficient accusatory comments
(fewer than 30) or lack of context needed for accurate classification.</p>
      <p>In addition to the type and text, each instance in the resource also includes the headline and summary
of the news article from which the comment was extracted. This context could be included as input for
fallacy detection and classification models. Table 1 shows the total number of instances for each fallacy
type.</p>
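      <p>One simple way to feed this context to a model is to concatenate it with the comment text. The sketch below is only an assumption about how that could be done: the parameter names (headline, summary) and the separator string are hypothetical and do not reflect the field names of the published resource or the input format used in our experiments.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def build_model_input(text, headline="", summary="", sep=" [SEP] "):
    """Join the news context and the comment text into one input string.

    Empty or whitespace-only fields are skipped, so instances without
    context reduce to the comment text alone.
    """
    parts = [p.strip() for p in (headline, summary, text) if p and p.strip()]
    return sep.join(parts)
```
]]></preformat>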
      <p>A significant contribution of this corpus is the inclusion of non-fallacious examples with similar
themes to the fallacies in both corpus sections, which is very useful for training and evaluating fallacy
detectors.</p>
      <p>
        In addition to the resource release, preliminary experiments on fallacy detection and classification in
Spanish using the FallacyES corpus and the language models Flan-T5 and RoBERTa-BNE are detailed
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The results of these experiments highlighted the inherent complexity of the task, especially
when dealing with spontaneous fallacies. The performance of transformer-based models was also
compared with a logistic regression model using bag-of-words (TF-IDF), showing the importance of the
transformers’ ability to capture discourse structure in fallacy identification. Finally, experiments were
conducted to evaluate the capacity of models trained on prototypical fallacies to classify spontaneous
fallacies, revealing limited utility in this scenario.
      </p>
    </sec>
    <sec id="sec-fallacyes-political">
      <title>3. The FallacyES-Political corpus</title>
      <p>
        The FallacyES-Political dataset<sup>2</sup> [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] consists of examples of fallacies extracted from 19 debates between
candidates in Spain’s General Elections, broadcast on national radio or television. All debates with a
publicly available recording were processed, from the first televised debate in 1993 to those of the 2023
General Elections.
      </p>
      <p>
        The debates were transcribed using WhisperX [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and analyzed by three annotators with backgrounds
in journalism and philosophy. The resulting dataset comprises 1,965 instances extracted from the
speeches of 33 representatives belonging to 11 different political parties. For each instance, the dataset
includes the text, type of fallacy, the debate it was extracted from, and the speaker’s identity. Additionally,
contextual information surrounding the excerpt is provided, as it is essential in some cases to correctly
identify the type of fallacy. The fallacy types include 8 that were already present in the FallacyES
corpus (ad hominem, ad populum, appeal to emotions, red herring, false cause, false dilemma, hasty
generalization, and straw man), in addition to the following 8:
        <list list-type="bullet">
          <list-item><p>Appeal to authority: Refers to a supposed authority (a person, organization, or group) who
agrees with or supports a claim, without providing concrete evidence beyond the citation of that
authority.</p></list-item>
          <list-item><p>Appeal to fear: A subtype of emotional appeal, in this case to fear. Seeks to persuade the audience
that, unless they accept the claim or act in a certain way, negative consequences, disaster, or
catastrophe will ensue.</p></list-item>
          <list-item><p>Complex question: A question that contains an implicit assertion or unproven premise, such
that answering it implies accepting the hidden assumption.</p></list-item>
          <list-item><p>False analogy: Compares elements or situations that are not truly comparable (or lacks sufficient
reasoning to make them comparable), projecting characteristics from one element onto another
and drawing conclusions based on this.</p></list-item>
          <list-item><p>Flag-waving: A subtype of emotional appeal that invokes group identity, encouraging audiences
who identify with that group to accept the argument uncritically.</p></list-item>
          <list-item><p>Poisoning the well: An intensified form of ad hominem. It consists of a prolonged sequence of
negative accusations, or a single extremely harsh accusation, directed at an opponent or group in
order to discredit or ridicule them.</p></list-item>
          <list-item><p>Slippery slope: Suggests an improbable or exaggerated outcome as the consequence of a certain
action. It typically omits intermediate premises, using an initial assumption as the first step
toward an exaggerated conclusion.</p></list-item>
          <list-item><p>Slogan: A short and impactful phrase used to evoke emotions in the audience, often accompanied
by another fallacy known as argument by repetition.</p></list-item>
        </list>
      </p>
      <p>The corpus annotation process was divided into several phases. The first step involved creating
a catalog of potential fallacy types, including definitions and examples based on existing literature.
With this catalog, three annotators analyzed two full debates, identifying as many fallacies as possible.
Subsequently, the detected cases were discussed, and 16 fallacy types were selected for the dataset. An
initial version of the annotation guide was developed and refined through regular meetings.</p>
      <p>The remaining 17 debates were independently annotated by two annotators per debate, aiming
to maximize fallacy coverage. Cross-validation was then performed to assess annotation reliability,
resulting in an average agreement rate of 47.67%.</p>
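      <p>To make the notion of an average agreement rate concrete, the sketch below computes one possible measure. The exact formula behind the figure reported above is not given in this article, so the measure used here (the share of annotated items on which both annotators coincide, averaged over debates) and the representation of annotations as sets of (segment, label) pairs are assumptions made purely for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def debate_agreement(ann_a, ann_b):
    """Percentage of (segment_id, label) pairs marked by both annotators.

    Each argument is the set of pairs produced by one annotator for one
    debate. The denominator is the union, so a fallacy found by only one
    annotator counts as a disagreement (assumed definition).
    """
    union = ann_a | ann_b
    if not union:
        return 100.0  # nothing annotated by either: trivially in agreement
    return 100.0 * len(ann_a & ann_b) / len(union)


def average_agreement(debates):
    """Mean agreement over a list of (ann_a, ann_b) pairs, one per debate."""
    scores = [debate_agreement(a, b) for a, b in debates]
    return sum(scores) / len(scores)
```
]]></preformat>
      <p>Under this assumed definition, overlapping category definitions and multiple fallacies in one segment both lower the score, which is consistent with the causes of discrepancy discussed next.</p>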
      <p>Discrepancies were analyzed in meetings between annotators and project researchers, identifying
three main causes: (1) overlapping definitions of some fallacies, which led to guide adjustments; (2)
subjectivity in identifying certain fallacies, such as false cause; (3) the presence of multiple fallacies
within a single text segment, which led to the use of multilabel annotation. Finally, the project researchers
conducted a final curation, removing doubtful cases and ensuring the text segments were as concise as
possible.</p>
      <p>Table 2 shows the total number of instances per fallacy type.</p>
      <p>To assess the dataset’s utility, a benchmark evaluation of state-of-the-art language models was
conducted in a zero-shot classification setting. The results of the experiments underscored the complexity
of the fallacy classification task and the limitations of current LLMs in understanding context-dependent
reasoning. Additionally, the experiments demonstrated the benefits of fine-tuning a compact,
domain-specific model instead of relying solely on general-purpose LLMs, achieving significant improvements
in classification accuracy with a more sustainable approach.</p>
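      <p>As an illustration of what a zero-shot classification setting involves, the sketch below builds a prompt listing the 16 fallacy types and maps a free-text answer back to a label. The prompt wording, the function names, and the single-label parsing are assumptions; they are not the prompts or the evaluation code used in the benchmark.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# The 16 fallacy types of FallacyES-Political, as listed in this section.
FALLACY_TYPES = [
    "ad hominem", "ad populum", "appeal to emotions", "red herring",
    "false cause", "false dilemma", "hasty generalization", "straw man",
    "appeal to authority", "appeal to fear", "complex question",
    "false analogy", "flag-waving", "poisoning the well",
    "slippery slope", "slogan",
]


def build_zero_shot_prompt(text, context=""):
    """Build a hypothetical zero-shot prompt for one debate excerpt."""
    lines = [
        "Classify the fallacy in the following excerpt from a political debate.",
        "Answer with exactly one of these labels: " + ", ".join(FALLACY_TYPES) + ".",
    ]
    if context:
        lines.append("Context: " + context)
    lines.append("Excerpt: " + text)
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(answer):
    """Map a free-text model answer to one of the 16 labels, or None."""
    cleaned = answer.strip().lower()
    # Try longer labels first so a longer name is never shadowed by a shorter one.
    for label in sorted(FALLACY_TYPES, key=len, reverse=True):
        if label in cleaned:
            return label
    return None
```
]]></preformat>
      <p>This sketch returns a single label per excerpt; handling the multilabel instances in the corpus would require a different output format.</p>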
    </sec>
    <sec id="sec-3">
      <title>4. The DebatES corpus</title>
      <p>The analysis of political debates is a task that has attracted significant research interest, as these types of
texts exhibit many features that make them particularly suitable for statistical and automated analysis.
These include a clear argumentative structure, the repetition of key themes, the use of rhetorical and
persuasive techniques, and a wide variety of approaches and styles.</p>
      <p>
        The DebatES corpus is the final resource we plan to produce as part of the DEFALAC project. It is
still under development and therefore not yet publicly available. This corpus aims to serve as a tool
to facilitate access to and analysis of political information, particularly the messages that political
parties direct at potential voters in the weeks leading up to an election. The resource is based on
the textual transcriptions of the same collection of debates used in the FallacyES-Political corpus and
will include diverse types of information to enable automatic analyses, as well as tools for manual
exploration. Specifically, two versions will be distributed: one in XML format for automatic processing,
and another in HTML format (with charts, sections, hyperlinks, filters, etc.) to allow convenient
navigation throughout the resource. The corpus will include three types of annotations:
        <list list-type="order">
          <list-item><p>Descriptive information about the debates, following the style of [<xref ref-type="bibr" rid="ref6">6</xref>]</p></list-item>
          <list-item><p>Linguistic information obtained using natural language processing tools</p></list-item>
          <list-item><p>Automatically obtained political-domain annotations, manually reviewed</p></list-item>
        </list>
      </p>
      <p>
        Regarding linguistic information, the following metrics will be included, automatically computed
using the spaCy library [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], at both the intervention and participant level for each debate:
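        <list list-type="bullet">
          <list-item><p>Lexical diversity index</p></list-item>
          <list-item><p>Proportion of stopwords</p></list-item>
          <list-item><p>Average sentence length</p></list-item>
          <list-item><p>Average number of dependencies per verb</p></list-item>
          <list-item><p>Proportion of punctuation in the text</p></list-item>
          <list-item><p>Proportion of adjectives</p></list-item>
          <list-item><p>Proportion of adverbs</p></list-item>
        </list>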
      </p>
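      <p>The metrics listed above can be sketched as follows. To keep the example self-contained, it works on a small token record (Tok) instead of calling spaCy directly; the record fields mirror the spaCy token attributes pos_, is_stop, is_punct, and head, and converting a parsed document into these records is left out. The precise definitions are assumptions: lexical diversity is taken here as a type-token ratio, and dependencies per verb as the number of direct syntactic dependents of each verb.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Minimal token record (hypothetical); fields mirror spaCy attributes."""
    text: str
    pos: str        # coarse POS tag, as in spaCy's token.pos_
    is_stop: bool   # as in token.is_stop
    is_punct: bool  # as in token.is_punct
    head: int       # index of the syntactic head within the sentence


def linguistic_metrics(sentences):
    """Compute the seven metrics for a list of sentences (lists of Tok)."""
    tokens = [t for sent in sentences for t in sent]
    words = [t for t in tokens if not t.is_punct]
    if not words:
        return {}
    n_tokens, n_words = len(tokens), len(words)

    # Count the direct dependents of every verb, sentence by sentence.
    verb_dependents = []
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok.pos == "VERB":
                deps = sum(1 for j, other in enumerate(sent)
                           if other.head == i and j != i)
                verb_dependents.append(deps)

    return {
        # Type-token ratio over lowercased word forms (assumed definition).
        "lexical_diversity": len({t.text.lower() for t in words}) / n_words,
        "stopword_ratio": sum(t.is_stop for t in words) / n_words,
        "avg_sentence_length": n_words / len(sentences),
        "avg_dependents_per_verb": (
            sum(verb_dependents) / len(verb_dependents) if verb_dependents else 0.0
        ),
        "punctuation_ratio": sum(t.is_punct for t in tokens) / n_tokens,
        "adjective_ratio": sum(t.pos == "ADJ" for t in words) / n_words,
        "adverb_ratio": sum(t.pos == "ADV" for t in words) / n_words,
    }
```
]]></preformat>
      <p>Applying the same function to the tokens of a single intervention or to all interventions of one participant would give the two levels of aggregation mentioned above.</p>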
      <p>
        To enrich the resource with semantic and discourse-related information, the texts are being processed
using the gemini-2.0-flash-exp language model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], with different prompts to automatically obtain
the following annotations, which will then be manually validated:
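        <list list-type="bullet">
          <list-item><p>Thematic blocks</p></list-item>
          <list-item><p>Summaries of topics addressed at the intervention level</p></list-item>
          <list-item><p>Relevant mentions of entities</p></list-item>
          <list-item><p>Concrete proposals made by speakers</p></list-item>
          <list-item><p>Assertions made by speakers</p></list-item>
          <list-item><p>Emotions associated with the messages according to Ekman’s scale [<xref ref-type="bibr" rid="ref9">9</xref>]</p></list-item>
        </list>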
      </p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusions</title>
      <p>In this work, we have presented the annotated resources developed within the DEFALAC project. The
project focuses on the automatic detection and generation of argumentative fallacies using large-scale
language models based on deep learning, with a particular emphasis on the Spanish language. The main
objective of the project is to improve the quality of public debate through the development of tools that
can identify and generate fallacies, addressing the current lack of Spanish-language resources for this
task. Publishing such resources and making them available to the scientific community contributes
to open science and enables other researchers to use them for training new models or evaluating text
analysis systems.</p>
      <p>We have presented three resources with different characteristics and objectives. The FallacyES
corpus is composed of two collections: prototypical fallacies translated and reviewed from the Logic
corpus, with manually created non-fallacious counterparts; and spontaneous fallacies, extracted from
online comments on meneame.net, also with non-fallacious examples. The second corpus is
FallacyES-Political, which contains fallacy examples extracted from 19 Spanish electoral debates held over three
decades, classified into 16 types of fallacies. The third resource, still under construction, is DebatES,
based on the textual transcriptions of the same collection of debates used in the FallacyES-Political
corpus. It will be enriched with various types of information to enable qualitative, quantitative, and
automatic analyses.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This publication was produced by members of the ITALICA and TIC-134 research groups at the
University of Seville. It was funded through the R&amp;D project PID2021-123005, financed by the Ministry
of Science, Innovation and Universities of the Government of Spain (MCIN) AEI/10.13039/501100011033/
and “FEDER A way of making Europe”.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT for grammar and spelling
checks. The authors reviewed and edited the content as needed and take full responsibility for the
publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Troyano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Enríquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Ortega</surname>
          </string-name>
          ,
          <article-title>Detección y clasificación de falacias prototípicas y espontáneas en español</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          )
          <fpage>53</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lalwani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vaidhya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sachan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          , Logical fallacy detection, https://doi.org/10.48550/arXiv.2202.13758,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Sahai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Balalau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Horincar</surname>
          </string-name>
          ,
          <article-title>Breaking down the invisible wall of informal fallacies in online discussions</article-title>
          , in:
          <source>ACL-IJCNLP 2021 - Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Enríquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Ortega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Troyano</surname>
          </string-name>
          ,
          <article-title>FallacyES-Political: A multiclass dataset of fallacies in Spanish political debates</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>74</volume>
          (
          <year>2025</year>
          )
          <fpage>127</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huh</surname>
          </string-name>
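          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,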
          <article-title>WhisperX: Time-accurate speech transcription of long-form audio</article-title>
          ,
          <source>INTERSPEECH 2023</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Fiva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Nedregård</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Øien</surname>
          </string-name>
          ,
          <article-title>The Norwegian parliamentary debates dataset</article-title>
          ,
          <source>Scientific Data</source>
          <volume>12</volume>
          (
          <year>2025</year>
          )
          <elocation-id>4</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Honnibal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Montani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Van Landeghem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Boyd</surname>
          </string-name>
          ,
          <article-title>spaCy: Industrial-strength natural language processing in Python</article-title>
          (
          <year>2020</year>
          ). doi:10.5281/zenodo.1212303.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <collab>Google DeepMind</collab>
          , Gemini 2.0 flash,
          <year>2024</year>
          . URL: https://deepmind.google/technologies/gemini/flash/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ekman</surname>
          </string-name>
          ,
          <article-title>An argument for basic emotions</article-title>
          ,
          <source>Cognition and Emotion</source>
          <volume>6</volume>
          (
          <year>1992</year>
          )
          <fpage>169</fpage>
          -
          <lpage>200</lpage>
          . doi:10.1080/02699939208411068.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>