Automatic Contradiction Detection in Spanish

Robiert Sepúlveda-Torres
Department of Software and Computing Systems, University of Alicante, Apdo. de Correos 99, E-03080 Alicante, Spain

Abstract
This paper addresses the lack of automated contradiction detection systems for the Spanish language. The ES-Contradiction dataset was created; it contains examples with two pieces of information classified as Compatible, Contradiction, or Unrelated. To the author's knowledge, no contradiction dataset exists for Spanish, so the ES-Contradiction dataset fills an important research gap, given that Spanish is one of the most widely spoken languages. Moreover, the dataset includes a fine-grained annotation of the different types of contradiction it contains. A baseline system was designed to validate the effectiveness of the dataset. The BETO transformer model was used to build this baseline system, which obtained good results in detecting the three class labels Compatible, Contradiction, and Unrelated.

Keywords: contradiction dataset, contradiction detection, natural language processing, deep learning, fake news

Doctoral Symposium on Natural Language Processing from the PLN.net network 2021 (RED2018-102418-T), 19–20 October 2021, Baeza (Jaén), Spain
rsepulveda@dlsi.ua.es (R. Sepúlveda-Torres); ORCID: 0000-0002-2784-2748 (R. Sepúlveda-Torres)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

A huge amount of fake news is generated and distributed by digital media every day. Hence, manually evaluating its veracity within a reasonable time frame is practically impossible [1]. Fake news has existed for a long time [2], but the term “fake news” is relatively new; it was defined by The New York Times as a “made up story with the intention to deceive, often with monetary gain as a motive” [3]. Artificial Intelligence techniques have been used in recent years to tackle the fake news problem [4, 5]. A misleading news item introduces information that contradicts a true news item; therefore, detecting contradictions is a fundamental task for identifying fake news [6].

One of the open problems raised by previous research on contradiction detection [6] was to expand the search for resources (datasets, models, systems) that would allow the creation of contradiction detection systems in other languages or with cross-lingual approaches. This paper summarizes a key part of my PhD, whose aim is to design a generic architecture for fake news detection. For this purpose, an extensive search for contradiction detection resources was carried out, from which we concluded that most of the resources and systems for contradiction detection are developed for English [7, 8, 9, 10]. Moreover, despite Spanish being one of the most widely spoken languages in the world, there are no powerful resources in Spanish for the task of detecting contradictions.
To fill this research gap and address the lack of resources in the Spanish language, the main contributions of this research are the following:

• First, a new Spanish dataset is built with different types of compatible, contradictory, and unrelated information, for the purpose of creating a language model that is capable of automatically detecting contradictions between two pieces of information in this language.
• Second, in the new Spanish dataset each contradiction is annotated with a fine-grained label, differentiating between different types. Specifically, four of the types of contradiction defined in [11] are covered: antonymy, negation, date/number mismatch, and structural.
• Third, a set of experiments using the BETO [12] pretrained model has been carried out to build the language model and validate its effectiveness.

2. Related Work

The research presented in this paper is focused on contradiction detection using computational models. The most common approaches to contradiction detection in texts use linguistic features extracted from the texts to build a classifier trained on annotated examples, such as the works in [13, 11, 14]. Linguistic evidence such as polarity, numbers, dates and times, antonymy, structure, factivity, and modality features was used by the authors of [11] to detect contradictions. Simple text similarity metrics (cosine similarity, F1 score, and local alignment) were used as a baseline in [14], obtaining good results for contradiction classification. The work in [13] tackled contradictions by means of three types of linguistic information: negation, antonymy, and semantic and pragmatic information associated with discourse relations.

Currently, large annotated datasets for contradiction detection are mainly available in English [15], such as SNLI [7], MultiNLI (covering multiple genres) [8], or even the cross-lingual dataset XNLI [16]. These datasets have allowed the training of complex deep learning systems, which require very large corpora to obtain successful results. There are a few studies that address the detection of contradictions in languages other than English, such as:

1. Machine translation of SNLI from English into German. A model was built using the German version of SNLI, and its predictions are very similar to those of the same model trained on the original English SNLI [15].
2. A large-scale database of contradictory event pairs in the Japanese language has been created. This database is used to generate coherent statements for a dialogue system [17].
3. Baseline systems are introduced to detect contradictions in the Persian language. Automatic translation of SNLI and MNLI dataset samples into Persian was performed [18].

3. ES-Contradiction: A New Spanish Contradiction Dataset

Our dataset (ES-Contradiction) is focused on contradictions that are likely to appear in traditional news items written in the Spanish language. Unlike other datasets, in the dataset proposed in this work contradictions are annotated by distinguishing the type of contradiction according to its specific characteristics [19]. In order to create the ES-Contradiction dataset, news articles from a renowned Spanish source were automatically collected, including the headline and body text. According to the journalistic structure of a news item, the headline is the title of the news article, and it provides the main idea of the story.
A headline is expected to be as effective as possible, without losing accuracy or becoming misleading [20]. Therefore, finding contradictions between headlines and body texts is a crucial task in the fight against the spread of disinformation.

The ES-Contradiction dataset is annotated according to the following classes:

• Compatible: two pieces of text that address the same fact and contain compatible information within the same time frame.
• Contradiction: two texts that address the same fact but contain incompatible information, or two texts that address antonymous facts yet contain compatible information at the same time.
• Unrelated: two texts that address different facts.

3.1. Dataset Annotation Stages

The dataset was built in four stages, outlined and detailed below:

1. Extracting information from the data source: The headline, body text, and date of each news item are extracted from a reliable data source, in this case the news agency EFE (https://www.efe.com/efe/espana/1, accessed on 15 June 2021). The news items are extracted assuming that the headlines and body texts are compatible, although this relationship is verified in the third stage.
2. Modifying news headlines: The aim of this stage is to make the news headline contradictory to the body text by introducing simple alterations to the headline structure. These alterations change the semantic content of the sentence, making it contradictory to the original headline and body text. The changes to the headline, together with some examples, are the following:
   • NEGATION (Con_Neg): This alteration consists of negating the headline of the news item.
     a) Original headline: “El Gobierno pide permanecer en los domicilios porque la situación es grave” (“The government asks people to stay home as the situation is serious”)
     b) Modified headline: “El Gobierno no pide permanecer en los domicilios porque la situación no es grave” (“The government doesn’t ask people to stay home as the situation isn’t serious”)
   • ANTONYM (Con_Ant): This transformation consists of replacing the verb denoting the main event of the headline with an antonym.
     a) Original headline: “Las licencias VTC aumentan un 19% en lo que va de año” (“VTC licences increase by 19% this year”)
     b) Modified headline: “Las licencias VTC caen un 19% en lo que va de año” (“VTC licences fall by 19% this year”)
   • NUMERIC (Con_Num): This amendment consists of changing numbers, dates, or times appearing in the headline.
     a) Original headline: “Accionistas de Nissan aprueban la incorporación de cuatro nuevos consejeros” (“Nissan shareholders approve the appointment of four new board directors”)
     b) Modified headline: “Accionistas de Nissan aprueban la incorporación de diez nuevos consejeros” (“Nissan shareholders approve the appointment of ten new board directors”)
   • STRUCTURE (Con_Str): This modification consists of swapping the position of one word with another or substituting words in the sentence.
     a) Original headline: “Tokio avanza optimista por el diálogo entre EEUU y China” (“Tokyo is optimistic about the dialogue between the USA and China”)
     b) Modified headline: “Tokio avanza optimista por el diálogo entre EEUU y UE” (“Tokyo is optimistic about the dialogue between the USA and the EU”)
3. Classifying the relationship between the headline and the body text: The semantic relationship between the headline and the body text was annotated in two phases. The first phase consisted of classifying the information as Compatible (compatible information) or Contradiction (contradictory information). In the second phase, in the case of Contradiction, the type of contradiction was also annotated (Negation, Antonym, Numeric, Structure).
4. Randomly mixing headlines and body texts: The news items reserved in the first stage were used to generate unrelated examples (Unrelated). The headlines were separated from their corresponding body texts and randomly mixed with the body texts. In the mixing process, it was verified that no headline was paired with its own body text. This step was done automatically, without the intervention of the annotators.
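To make the last stage concrete, the following is a minimal sketch of the automatic mixing step, assuming the reserved news items are available as parallel lists of headlines and body texts; the function name and the rejection-sampling strategy are illustrative assumptions, not the actual tooling used to build ES-Contradiction.

```python
import random

def mix_unrelated(headlines, bodies, seed=0):
    """Randomly re-pair headlines and body texts so that no news item
    keeps its own headline, producing Unrelated examples."""
    assert len(headlines) == len(bodies) > 1
    rng = random.Random(seed)
    order = list(range(len(headlines)))
    while True:
        rng.shuffle(order)
        # Retry until no headline is matched with its original body text.
        if all(i != j for i, j in enumerate(order)):
            break
    return [(headlines[j], bodies[i]) for i, j in enumerate(order)]

# Example: three reserved news items yield three Unrelated headline-body pairs.
unrelated_pairs = mix_unrelated(["h1", "h2", "h3"], ["b1", "b2", "b3"])
```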
3.2. Dataset Description

The dataset consists of 7403 news items, of which 2431 are Compatible headline–body pairs, 2473 are Contradiction headline–body pairs, and 2499 are Unrelated headline–body pairs. This represents a dataset balanced over the three main classes. The annotated news items were partitioned into training and test sets; the split sizes for each class are presented in Table 1. The dataset is available at Zenodo (https://zenodo.org/badge/latestdoi/344923645).

Table 1. Dataset split sizes for each class.

Split         Compatible   Contradiction   Unrelated
Training      1703         1733            1755
Test          728          740             744
Total items   2431         2473            2499

As can be seen in Table 2, the dataset contains examples of each type of contradiction. However, it is important to clarify that there are few examples of the structure contradiction, given the complexity of finding sentences that allow this type of modification.

Table 2. Contradiction types in the dataset.

Split         Con-Neg   Con-Ant   Con-Num   Con-Str
Training      674       552       430       77
Test          287       236       184       33
Total items   961       788       614       110

3.3. Dataset Validation

Due to the particularities of the dataset annotation process, it was necessary to validate the second and third stages of the process. For the second stage, a super-annotator validation was conducted, while for the third stage, an inter-annotator agreement was computed. We randomly selected 4% of the Compatible and Contradiction pairs (n = 200) to carry out the dataset validations.

3.3.1. Super-Annotator Validation

For the second stage, it was not possible to compute an inter-annotator agreement because this stage consists of headline modifications and the possible variations are infinite. In this case, a manual review of the modified headlines was performed by the super-annotator to detect inconsistencies with the indications in the annotation guide. Only 2% of the analyzed examples presented inconsistencies with the annotation guide, corroborating the validity of this stage.

3.3.2. Inter-Annotator Agreement

In order to measure the quality of the third-stage annotation, an inter-annotator agreement between two annotators was computed. In cases where there was no agreement, a consensus process was carried out among the annotators. Using Cohen’s kappa [21], κ = 0.83 was obtained, which validates the third-stage labeling.

4. Experiments and Results

To test the validity of the newly created Spanish contradiction dataset in this task, a baseline system was created based on the BETO model (https://github.com/dccuchile/beto, accessed on 22 March 2021) described in [12], which was previously pretrained on a Spanish corpus. The system was implemented using the Simple Transformers (https://simpletransformers.ai/, accessed on 15 June 2021) and PyTorch (https://pytorch.org/, accessed on 15 June 2021) libraries. In our experiments, the hyperparameter values of the model were a maximum sequence length of 512, a batch size of 4, a learning rate of 2e-5, and training for 3 epochs.
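As an illustration, such a fine-tuning setup can be expressed with the Simple Transformers sentence-pair classification API. This is a minimal sketch rather than the code used in the experiments; the BETO checkpoint name, the column names, and the label encoding are assumptions.

```python
import pandas as pd
from simpletransformers.classification import ClassificationArgs, ClassificationModel

# Hypothetical training data: one row per headline-body pair.
# Assumed label encoding: 0 = Compatible, 1 = Contradiction, 2 = Unrelated.
train_df = pd.DataFrame(
    [["Titular de la noticia ...", "Cuerpo de la noticia ...", 0]],
    columns=["text_a", "text_b", "labels"],
)

model_args = ClassificationArgs(
    max_seq_length=512,      # hyperparameters reported above
    train_batch_size=4,
    learning_rate=2e-5,
    num_train_epochs=3,
)

model = ClassificationModel(
    "bert",
    "dccuchile/bert-base-spanish-wwm-cased",  # an assumed BETO checkpoint
    num_labels=3,
    args=model_args,
    use_cuda=False,  # set to True when a GPU is available
)

model.train_model(train_df)
predictions, raw_outputs = model.predict([["Titular modificado", "Cuerpo de la noticia ..."]])
```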
The main objective of the experimentation proposed in this research is to demonstrate that, from the ES-Contradiction dataset, a model is able to learn how to automatically detect contradictions and contradiction types with high accuracy.

4.1. Predicting All Classes

This experiment is performed on the entire dataset to predict Compatible, Contradiction, or Unrelated for each example. The system is capable of detecting the Unrelated class with a high level of precision and achieves notably good results for the Compatible and Contradiction classes. Table 3 presents the results.

Table 3. Results obtained from Experiment 1: predicting Compatible, Contradiction, or Unrelated information.

System             F1 Compatible (%)   F1 Contradiction (%)   F1 Unrelated (%)   Macro F1 (%)   Acc (%)
BETO-All classes   88.70               89.12                  99.59              92.47          92.49

The results obtained for the Unrelated class indicate that the system detects these examples with an excellent F1 score, corroborating the results reported in the literature on this type of semantic relation between texts [22]. The other two classes leave room for improvement, for instance by using external knowledge.

4.2. Detecting Contradiction vs. Compatible Information

In this experiment, the Unrelated class is removed from the ES-Contradiction dataset to measure the accuracy of the approach in distinguishing between compatible and contradictory information, assuming that the information is related. The results are shown in Table 4. The approach obtains similar results for both predicted classes, which is due to the quality of the training examples and the balanced number of examples from each class in this dataset.

Table 4. Results obtained from Experiment 2: detecting compatible vs. contradictory information when the texts are related.

System             F1 Compatible (%)   F1 Contradiction (%)   Macro F1 (%)   Acc (%)
BETO-Contra_Comp   88.63               88.75                  88.69          88.69

4.3. Detecting Specific Types of Contradictions

This experiment analyzes the detection capability of the approach for each contradiction type. Table 5 shows the results obtained exclusively for the detection of contradiction types.

Table 5. Results obtained from Experiment 3: detecting each specific type of contradiction treated.

System                        F1 Con-Neg (%)   F1 Con-Ant (%)   F1 Con-Num (%)   F1 Con-Str (%)   Macro F1 (%)   Acc (%)
BETO-Type of contradictions   97.90            93.20            92.39            68.75            88.06          93.78

The structural contradiction class (Con_Str) obtains the lowest F1 score. This contradiction type is considered one of the most complicated to detect compared with the other contradictions [11], which is in line with our results. In addition, due to the scarcity of training examples, the Con_Str class is the least represented class in this dataset, so the model learns more from the other, better-represented classes.
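For reference, the per-class F1, macro-averaged F1, and accuracy figures reported in Tables 3–5 can be computed from gold and predicted labels as in the following sketch; the label arrays are placeholders for illustration, not the actual test-set predictions.

```python
from sklearn.metrics import accuracy_score, f1_score

labels = ["Compatible", "Contradiction", "Unrelated"]
# Placeholder gold and predicted labels, for illustration only.
y_true = ["Compatible", "Contradiction", "Unrelated", "Contradiction"]
y_pred = ["Compatible", "Contradiction", "Unrelated", "Compatible"]

per_class_f1 = f1_score(y_true, y_pred, labels=labels, average=None)
macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro")
accuracy = accuracy_score(y_true, y_pred)

for name, score in zip(labels, per_class_f1):
    print(f"F1 {name}: {score:.2%}")
print(f"Macro F1: {macro_f1:.2%}  Accuracy: {accuracy:.2%}")
```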
5. Discussion

The experiments carried out have validated the effectiveness of a contradiction detection system trained on the ES-Contradiction dataset. However, the experiments also evidenced one of the deficiencies of the ES-Contradiction dataset, namely the lack of examples of the Con_Str class. It would also be interesting to include other types of contradictions that are not yet covered by this dataset.

For the purpose of improving the results of the Contradiction class, we can experiment with resources that detect antonyms and synonyms, in line with [23]. Furthermore, including syntactic and semantic information could improve the detection of other, more complex contradictions, such as structural ones, without the need for such large datasets. It is highly likely that contradictions such as the structure contradiction need external semantic knowledge to improve detection results, similar to the introduction of SRL [10] and the use of WordNet relations [9], both of which improve the results of deep learning models.

6. Conclusions

This work has built the ES-Contradiction dataset, a new Spanish-language dataset that contains Contradiction, Compatible, and Unrelated information. Unlike other datasets, in the ES-Contradiction dataset contradictions are annotated with a fine-grained annotation. We used the BETO model to create an automated contradiction detection system. The results obtained by our system show that the created Spanish contradiction dataset is a good option for generating a language model that is able to detect contradictions in the Spanish language. In order to create a powerful contradiction detection system in Spanish, it is necessary to extend our dataset with other types of contradictions and to add specific features.

Finally, the creation of this dataset will make it possible to validate the effectiveness of a contradiction detection architecture for the Spanish language that will be created in future work.

Acknowledgments

This research work has been partially funded by Generalitat Valenciana through the project “SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria y accesible” (PROMETEU/2018/089) and by the Spanish Government through the project RTI2018-094653-B-C22 “Modelang: Modeling the behavior of digital entities by Human Language Technologies”, as well as being partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER) and the LIVING-LANG project (RTI2018-094653-B-C21) from the Spanish Government.

References

[1] G. Tsipursky, F. Votta, K. M. Roose, Fighting Fake News and Post-Truth Politics with Behavioral Science: The Pro-Truth Pledge, Behavior and Social Issues 27 (2018) 47–70. URL: https://doi.org/10.5210/bsi.v27i0.9127. doi:10.5210/bsi.v27i0.9127.
[2] H. Allcott, M. Gentzkow, Social media and fake news in the 2016 election, Journal of Economic Perspectives 31 (2017) 211–236.
[3] S. Tavernise, As fake news spreads lies, more readers shrug at the truth, New York Times (2019). URL: http://nyti.ms/2lw56HN.
[4] A. Alonso-Reina, R. Sepúlveda-Torres, E. Saquete, M. Palomar, Team GPLSI. Approach for automated fact checking, in: Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 110–114. URL: https://www.aclweb.org/anthology/D19-6617. doi:10.18653/v1/D19-6617.
[5] R. Sepúlveda-Torres, M. Vicente, E. Saquete, E. Lloret, M. Palomar, Exploring summarization to enhance headline stance detection, in: E. Métais, F. Meziane, H. Horacek, E. Kapetanios (Eds.), Natural Language Processing and Information Systems, Springer International Publishing, Cham, 2021, pp. 243–254.
[6] R. Sepúlveda-Torres, Identification of fake news by contradiction detection in texts, CEUR Workshop Proceedings 2802 (2020) 38–43.
[7] S. R. Bowman, G. Angeli, C. Potts, C. D. Manning, A large annotated corpus for learning natural language inference, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Lisbon, Portugal, 2015, pp. 632–642. URL: https://www.aclweb.org/anthology/D15-1075.
[8] A. Williams, N. Nangia, S. Bowman, A broad-coverage challenge corpus for sentence understanding through inference, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 1112–1122. URL: https://www.aclweb.org/anthology/N18-1101.
[9] Q. Chen, X. Zhu, Z.-H. Ling, D. Inkpen, S. Wei, Natural language inference with external knowledge, arXiv preprint arXiv:1711.04289 (2017).
[10] Z. Zhang, Y. Wu, H. Zhao, Z. Li, S. Zhang, X. Zhou, X. Zhou, Semantics-aware BERT for language understanding, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 9628–9635.
[11] M.-C. de Marneffe, A. N. Rafferty, C. D. Manning, Finding contradictions in text, in: Proceedings of ACL-08: HLT, Association for Computational Linguistics, Columbus, Ohio, 2008, pp. 1039–1047. URL: https://www.aclweb.org/anthology/P08-1118.
[12] J. Cañete, G. Chaperon, R. Fuentes, J. Pérez, Spanish pre-trained BERT model and evaluation data, PML4DC at ICLR 2020 (2020).
[13] S. Harabagiu, A. Hickl, F. Lacatusu, Negation, contrast and contradiction in text processing, in: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1, AAAI’06, AAAI Press, 2006, pp. 755–762.
[14] P. Lendvai, U. D. Reichel, Contradiction Detection for Rumorous Claims, ArXiv (2016). URL: http://arxiv.org/abs/1611.02588.
[15] R. Sifa, M. Pielka, R. Ramamurthy, A. Ladi, L. Hillebrand, C. Bauckhage, Towards contradiction detection in German: a translation-driven approach, in: 2019 IEEE Symposium Series on Computational Intelligence (SSCI), 2019, pp. 2497–2505.
[16] A. Conneau, G. Lample, R. Rinott, A. Williams, S. R. Bowman, H. Schwenk, V. Stoyanov, XNLI: evaluating cross-lingual sentence representations, CoRR abs/1809.05053 (2018). URL: http://arxiv.org/abs/1809.05053.
[17] Y. Takabatake, H. Morita, D. Kawahara, S. Kurohashi, R. Higashinaka, Y. Matsuo, Classification and acquisition of contradictory event pairs using crowdsourcing, in: Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, Association for Computational Linguistics, Denver, Colorado, 2015, pp. 99–107. URL: https://www.aclweb.org/anthology/W15-0813.
[18] Z. Rahimi, M. ShamsFard, Contradiction Detection in Persian Text (2021). URL: https://arxiv.org/abs/2107.01987v1. arXiv:2107.01987.
[19] R. Sepúlveda-Torres, A. Bonet-Jover, E. Saquete, “Here are the rules: Ignore all rules”: Automatic contradiction detection in Spanish, Applied Sciences 11 (2021) 3060. URL: https://doi.org/10.3390/app11073060. doi:10.3390/app11073060.
[20] J. Kuiken, A. Schuth, M. Spitters, M. Marx, Effective headlines of newspaper articles in a digital environment, Digital Journalism 5 (2017) 1300–1314.
[21] J. Cohen, A Coefficient of Agreement for Nominal Scales, Educational and Psychological Measurement 20 (1960) 37.
[22] Q. Zhang, S. Liang, A. Lipani, Z. Ren, E. Yilmaz, From stances’ imbalance to their hierarchical representation and detection, in: The World Wide Web Conference, ACM, 2019, pp. 2323–2332.
[23] X. Kang, B. Li, H. Yao, Q. Liang, S. Li, J. Gong, X. Li, Incorporating synonym for lexical sememe prediction: An attention-based model, Applied Sciences 10 (2020) 5996.