Overview of MEX-A3T at IberLEF 2020: Fake News and Aggressiveness Analysis in Mexican Spanish

Mario Ezra Aragón (a), Horacio Jarquín-Vásquez (a), Manuel Montes-y-Gómez (a), Hugo Jair Escalante (a), Luis Villaseñor-Pineda (a,b), Helena Gómez-Adorno (c), Juan-Pablo Posadas-Durán (e) and Gemma Bel-Enguix (d)

(a) Laboratorio de Tecnologías del Lenguaje (INAOE), Mexico
(b) Centre de Recherche en Linguistique Française GRAMMATICA (EA 4521), Université d’Artois, France
(c) Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (UNAM), Mexico
(d) Instituto de Ingeniería (UNAM), Mexico
(e) Escuela Superior de Ingeniería Mecánica y Eléctrica, Unidad Zacatenco (IPN), Mexico

Abstract
This paper presents the overview of MEX-A3T 2020, the third edition of this lab under the IberLEF conference. The main purpose of MEX-A3T is to explore different methodologies and strategies related to the analysis of social media content in Mexican Spanish. This year's edition focuses on the identification of fake news and the detection of aggressive tweets. For this purpose, we provided news items from verified web sources and a corpus of tweets from Mexican users.

Keywords
Fake news detection, aggressiveness detection, MEX-A3T, IberLEF

1. Introduction

The goal of the third edition of MEX-A3T is to further research in NLP tasks and to keep advancing the computational treatment of Mexican Spanish. As a novelty, this year’s proposal introduces a new track on fake news detection and an improved corpus for the aggressive language detection track. MEX-A3T@IberLEF2020 has the following two tracks:

Aggressiveness Detection Track: Social networks represent a significant threat to users, who are exposed to many risks and potential attacks. One such threat is aggressive comments, which can produce long-term harm to victims; in some cases they can lead to suicide.
This track follows up on last year’s evaluation task; it focuses on the detection of aggressive tweets in Mexican Spanish. However, for this year, the criteria for identifying aggression have been revised and a new, enhanced data set has been created.

Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), September 2020, Málaga, Spain. CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.
Email: mearagon@inaoep.mx (M.E. Aragón); horacio.jarquin@inaoep.mx (H. Jarquín-Vásquez); mmontesg@inaoep.mx (M. Montes-y-Gómez); hugojair@inaoep.mx (H.J. Escalante); villasen@inaoep.mx (L. Villaseñor-Pineda); helena.gomez@iimas.unam.mx (H. Gómez-Adorno); jposadasd@ipn.mx (J. Posadas-Durán); gbele@iingen.unam.mx (G. Bel-Enguix)
ORCID: 0000-0002-8213-957X (M.E. Aragón); 0000-0000-0000-0000 (H. Jarquín-Vásquez); 0000-0002-7601-501X (M. Montes-y-Gómez); 0000-0003-4603-3513 (H.J. Escalante); 0000-0003-1294-9128 (L. Villaseñor-Pineda); 0000-0002-6966-9912 (H. Gómez-Adorno); 0000-0001-9496-1328 (J. Posadas-Durán); 0000-0002-1411-5736 (G. Bel-Enguix)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Fake News Detection Track: Fake news provides information that aims to manipulate people for different purposes: terrorism, political elections, advertisement, among others. In social networks, misinformation spreads in seconds among thousands of people, so it is necessary to develop tools that help control the amount of false information on the web. In particular, fake news detection systems aim to help users detect and filter out potentially deceptive news. The Fake News Detection Track consists of classifying a given set of news items written in Mexican Spanish as true or fake.

The remainder of this paper is organized as follows: Section 2 covers a brief description of the previous edition of MEX-A3T.
Section 3 presents the evaluation framework used at MEX-A3T 2020. Section 4 shows an overview of the participating approaches. Section 5 reports and analyzes the results obtained by the participating teams. Finally, Section 6 presents our conclusions from this evaluation exercise.

2. MEX-A3T 2019

MEX-A3T is a forum for the analysis of social media content in Mexican Spanish. Last year, we organized the second edition of the MEX-A3T shared task [1], focusing on the problems of author profiling and aggressiveness detection. The participants proposed a variety of methods, comprising content-based features (bag of words, word n-grams, term vectors, dictionary words, and so on), stylistic features (frequencies, punctuation, POS, Twitter-specific elements, slang words, and so forth), and approaches based on neural networks (CNN, LSTM, and others).

For author profiling, the novelty of the previous edition was the use of both text and images as information sources. Our purpose was to study the relevance and complementarity of multimodal data for profiling social media users. Unfortunately, participants could not find an effective way to take advantage of both types of information, and did not outperform the proposed baselines.

In the case of aggressiveness identification, the top-ranked team was UACh [2]. This team used two main kinds of features, character n-grams and word embeddings, and employed two different classifiers, an SVM and a multilayer perceptron. The main idea of their participation was to include features that give context to the text messages, and to explore whether people verbally attack differently depending on their traits and overall environment.

3. MEX-A3T 2020 Evaluation Framework

This section outlines the construction of the two corpora used, highlighting particular properties, challenges, and novelties. It also presents the evaluation measures used for both tasks.

Table 1: Aggressive and Non-Aggressive Tweets.
Aggressive:
• Y por que disculpa? El otro joto comenzó...
• Le hubieras dado dos, no mas por joto y metiche
• @USUARIO . Viejo ladrón HDP.
• Indios estupidos no saben pa que putas es la pasarela... una vez vi cruzar unas cabras por ahi.. no entiendo como ellas si entienden
Non-Aggressive:
• que tiene de especial la tonta texas
• Yo voy cualquiera que no seas tu @USUARIO
• No puedo creer que nos sigamos matando por tontas ideologías
• las tontas no van al cielo es mi religión

3.1. Aggressiveness Detection

Social networks represent a significant threat to users exposed to many risks and potential attacks. One such threat is aggressive comments, which can produce long-term harm to victims and, in some cases, can lead to suicide. This track focuses on the detection of aggressive comments on Twitter, a topic that has received little study in the Ibero-American community. Participants have to develop methods to determine whether a tweet is aggressive or not. The track is challenging because the tweets come from Mexican users with a variety of backgrounds and social expressions.

We built a corpus of tweets from Mexican accounts for the task of aggressiveness detection. First, we selected a set of terms that served as seeds for extracting the tweets. We used the words classified as vulgar and non-colloquial in the Diccionario de Mexicanismos de la Academia Mexicana de la Lengua, as well as words and hashtags identified by the Instituto Nacional de las Mujeres. Tweets were collected considering their geolocation: we took Mexico City as the center and extracted all tweets within a radius of 500 km.

We annotated the corpus using the scheme proposed in [3]. The annotation scheme provides specific criteria to distinguish aggressive, offensive, and vulgar tweets, based on the linguistic characteristics and intent of the message. Table 1 shows some examples labeled as aggressive and non-aggressive.
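The geographic filter described above (tweets within 500 km of Mexico City) can be illustrated with a minimal sketch. It is not the organizers' collection code: the center coordinates, the `lat`/`lon` field names, and the use of the haversine great-circle distance are assumptions made here for illustration only.

```python
import math

# Assumed coordinates for the center of Mexico City (illustrative values).
MEXICO_CITY = (19.4326, -99.1332)
RADIUS_KM = 500.0          # radius stated in the text
EARTH_RADIUS_KM = 6371.0   # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(tweet, center=MEXICO_CITY, radius_km=RADIUS_KM):
    """True if a tweet (hypothetical dict with 'lat' and 'lon') lies inside the radius."""
    return haversine_km(tweet["lat"], tweet["lon"], center[0], center[1]) <= radius_km


# Hypothetical geotagged tweets: Puebla is roughly 100 km away, Monterrey roughly 700 km.
tweets = [
    {"id": 1, "lat": 19.0414, "lon": -98.2063},   # Puebla
    {"id": 2, "lat": 25.6866, "lon": -100.3161},  # Monterrey
]
kept_ids = [t["id"] for t in tweets if within_radius(t)]
```

In this toy input only the first tweet would be kept; a real collection would apply the same predicate to whatever location metadata the platform exposes.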
As can be intuited, the task of labeling aggressiveness is challenging, especially because in most cases the message has to be interpreted in a given context.

The collected corpus consists of more than 10,000 tweets. For the evaluation exercise, we divided the corpus into two parts, one for training and the other for testing. Table 2 shows the distribution of this corpus. The non-aggressive class is the majority class in both partitions. For readers interested in more details, [4] describes the methodology followed for the construction of the Mexican Aggressiveness Corpus.

Table 2: Mexican aggressiveness corpus: distribution of the classes.
Class            Training Corpus   Test Corpus
Not Aggressive   5222              2238
Aggressive       2110              905
Σ                7332              3143

3.2. Fake News Detection

The Spanish Fake News Corpus is a collection of news compiled from several web sources: established newspaper websites, media company websites, special websites dedicated to validating fake news, and websites designated by different journalists as sites that regularly publish fake news. The news items were collected from January to July of 2018 and all of them were written in Mexican Spanish [5]. The assembled corpus has 971 news items. The corpus was manually labeled using two classes (true or fake), considering the following criteria:

• A news report is true if there is evidence that it has been published on reliable sites.
• A news report is fake if news from reliable sites, or from websites specialized in the detection of deceptive content, contradicts it, or if no other evidence about the news was found besides the given source.

The data collection includes true-fake news pairs for different events, so that the corpus is as balanced as possible. Additionally, in order to avoid topic bias, the corpus covers news from 9 different topics: Science, Sport, Economy, Education, Entertainment, Politics, Health, Security, and Society.
As can be seen in Table 3, the number of fake and true news items is balanced; approximately 70% are used as the training corpus (676 news items), and 30% as the test corpus (295 news items). For readers interested in more details, [5] describes the methodology followed for the construction of the Spanish Fake News corpus.

Table 3: Spanish Fake News Corpus: distribution of the classes.
                 Training corpus    Testing corpus
Category         True     Fake      True     Fake
Science          32       30        14       13
Sport            45       41        21       17
Economy          18       12        6        7
Education        6        9         4        3
Entertainment    48       55        22       23
Politics         121      105       54       43
Health           16       16        7        7
Security         11       18        6        7
Society          41       52        19       22
Σ                338      338       153      142

3.3. Performance Measure

For both tracks, the final score corresponds to the F1-measure for the target class, that is, fake news and aggressive messages respectively.

4. Overview of the Submitted Approaches

In this edition, eleven teams submitted one or more solutions; six teams participated in the fake news detection task, and nine participated in the aggressiveness identification task. This section presents a summary of their approaches regarding preprocessing steps, features, and classification algorithms. Table 4 indicates the general approach used by each team. Participants used three general approaches: transformers, deep neural networks, and traditional representations like BoW and n-grams feeding an SVM classifier. Below, we briefly describe each of the participating methods.

Table 4: General approach of each participating team.
Teams: Idiap-UAM, UMUTeam, DeepMath, ITCG-SD, UGalileo, Intensos, CIMAT, UACh, Ares, UPB
Approach                            Teams marked
Transformers                        X X X X
Traditional Deep Neural Networks    X X X
BoW, n-grams, Stylometrics          X X X X X

• Idiap and UAM Participation at MEX-A3T Evaluation Campaign [6]
  – Tasks: Fake News Detection; Aggressiveness Detection.
  – Team name: Idiap-UAM
  – Summary: The authors used a Supervised Autoencoder (SAE), that is, a neural network that learns a representation (encoding) of input data and then learns to reconstruct the original input. They used three different types of features as input representations: word n-grams, char n-grams, and BETO encodings. The best performance was obtained when the autoencoder was fed with the combination of the three input representations.
• Transformers and Data Augmentation for Aggressiveness Detection in Mexican Spanish [7]
  – Tasks: Fake News Detection; Aggressiveness Detection.
  – Team name: CIMAT
  – Summary: The authors proposed two different strategies for the aggressiveness detection task. The first strategy consisted of an ensemble of different BETO models (BERT models trained on Spanish) with majority and weighted voting schemes. The second strategy considered data augmentation, a technique to generate new instances from the original training data. They reported as their best strategy an ensemble of 20 models combined with adversarial data augmentation, where the model creates a new input for each misclassified sentence.
• ITCG’s participation at MEX-A3T 2020: Aggressive Identification and Fake News detection based on textual features for Mexican Spanish [8]
  – Tasks: Fake News Detection; Aggressiveness Detection.
  – Team name: Intensos
  – Summary: The authors presented a traditional text classification approach, using a combination of binary and tf-idf text representations. They reported that their best result was obtained with an SVM on this representation, without removing stop words.
• TecNM at MEX-A3T 2020: Fake News and Aggressiveness Analysis in Spanish Mexican [9]
  – Tasks: Fake News Detection; Aggressiveness Detection.
  – Team name: ITCG-SD
  – Summary: The authors presented a traditional machine learning approach, using a bag-of-words representation with TF and TF-IDF weights.
    Their best results were obtained when applying a neural network and an SVM classifier.
• UPB at MEX-A3T 2020: Detecting Aggressiveness in Mexican Spanish Social Media Content by Fine-tuning Transformer-Based Models [10]
  – Tasks: Aggressiveness Detection
  – Team name: UPB
  – Summary: The authors presented different approaches to fine-tune pre-trained Spanish, English, and multilingual transformer-based models. The best result they reported was obtained with BETO, a BERT model trained on Spanish, fine-tuned with the MEX-A3T aggressiveness training set and the HatEval Spanish dataset.
• UACh at MEX-A3T 2020: Detecting Aggressive Tweets by Incorporating Author and Message Context [11]
  – Tasks: Aggressiveness Detection
  – Team name: UACh
  – Summary: The authors explored the idea of using context information, such as message and author metadata. Their proposed approach has two stages. In the first stage, messages were classified considering only their content; this classification was done using BETO. Then, in the second stage, the predictions of the first stage were concatenated with the author and message metadata to form a new representation vector, which was fed to an XGBoost classifier.
• GRU with Author Profiling Information to Detect Aggressiveness [12]
  – Tasks: Aggressiveness Detection
  – Team name: DeepMath
  – Summary: The authors presented a bi-directional GRU model using words as inputs. The output of this model was combined with the predictions of gender and occupation of users obtained by a reference model, using a simple concatenation and a one-hot encoding. In the end, the model considered only the gender and Sciences-Student occupation categories; the rest of the categories were discarded by a chi-squared feature selection criterion.
• UMUTeam at MEX-A3T’2020: Towards Aggressiveness Identification in Mexican-Spanish tweets with linguistic features and word-embeddings [13]
  – Tasks: Aggressiveness Detection
  – Team name: UMUTeam
  – Summary: The authors evaluated the characterization of aggressive messages through a set of linguistic attributes and sentence embeddings. They used two types of classifiers, a support vector machine and two types of deep neural networks. Their best result was obtained by a Bi-LSTM network trained with FastText embeddings and combined with linguistic features.
• Detecting Aggressiveness in Mexican Spanish Tweets with LSTM + GRU and LSTM + CNN Architectures [14]
  – Tasks: Aggressiveness Detection
  – Team name: UGalileo
  – Summary: The authors proposed two different architectures based on deep learning models. The first architecture consisted of a Bi-GRU and a Bi-LSTM network whose outputs are concatenated and followed by a prediction layer. The second architecture used a Bi-LSTM and a CNN network, followed by a concatenation and a prediction layer. Both architectures achieved similar results on the test partition.
• Ares Team: No system description paper
  – Tasks: Fake News Detection
  – Team name: Ares
  – Summary: The authors proposed a TF-IDF representation combined with the capital letter ratio in the article, the total number of words in the body of the article, and the percentage of overlap between the words of the body and the headline of the article. Features were selected with an F-test, and classification used a linear model trained with SGD.

5. Experimental evaluation and analysis of results

This section summarizes the results obtained by the participants of MEX-A3T 2020, comparing and analyzing in detail the performance of their submitted solutions. For the final phase of the challenge, participants sent their predictions for the test partition; the performance on this data was used to rank them.
We used the F1 over the class of interest as the main evaluation measure. For computing the evaluation scores we relied on the EvALL platform [15]. EvALL is an online evaluation service targeting information retrieval and natural language processing tasks. It is a complete evaluation framework that receives as input the ground truth and the predicted outputs of systems and returns a complete performance evaluation. In the following subsections, we report the results obtained by participants as evaluated by EvALL, together with an analysis of their results.

As baseline methods, we implemented two popular approaches that have proven hard to beat in both tasks: i) a classification model trained on the bag of words (BoW) representation, and ii) a Bi-GRU neural network. We also compared the systems’ results against the result from INGEOTEC, the best performing system at the first MEX-A3T edition [16].

The BoW approach was applied to both classification tasks, using the whole vocabulary of the corpora after removing stopwords and special characters. The size of the representation of each text was 14,913 for fake news detection and 5,212 for aggressiveness identification; for classification we used an SVM classifier with a linear kernel and C = 1. In addition, we applied a Bi-GRU neural network to the task of aggressiveness identification. In this approach, texts were pre-processed by removing stopwords and special characters, and by converting all emojis to words (e.g., a smiling-face emoji becomes ‘cara sonriente’). Pre-trained Spanish FastText [17] embeddings were used as input features, and a fully-connected softmax layer handles the class probabilities.

5.1. Aggressiveness detection results

Table 5 presents the results obtained by the teams in the aggressiveness detection task. For this task, we sort the teams by their F1 results over the aggressive class. For extra analysis, we also report the accuracy, the macro F1, and the F1 in the non-aggressive class.
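As an illustration of the measures just listed, the following sketch computes the per-class F1, the macro F1 over the two classes, and the accuracy from gold and predicted labels. It is a minimal stand-in written for this overview, not the EvALL implementation; the function names and the toy labels are hypothetical.

```python
def f1_for_class(gold, pred, target):
    """F1 of one class: harmonic mean of its precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == target and p == target)
    fp = sum(1 for g, p in zip(gold, pred) if g != target and p == target)
    fn = sum(1 for g, p in zip(gold, pred) if g == target and p != target)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(gold, pred, target, other):
    """Scores reported in the result tables, for a binary task."""
    f1_target = f1_for_class(gold, pred, target)
    f1_other = f1_for_class(gold, pred, other)
    accuracy = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
    return {
        "f1_target": f1_target,                 # ranking measure
        "f1_other": f1_other,
        "f1_macro": (f1_target + f1_other) / 2,  # unweighted mean of both classes
        "accuracy": accuracy,
    }


# Toy example with hypothetical labels.
gold = ["agg", "agg", "non", "non", "non", "agg"]
pred = ["agg", "non", "non", "non", "agg", "agg"]
scores = evaluate(gold, pred, target="agg", other="non")
```

The macro F1 here is the plain average of the two class-wise F1 values, which matches how the macro column relates to the two class columns in the result tables (for example, (0.7998 + 0.9195) / 2 ≈ 0.8596 for CIMAT-1).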
The approach submitted by the CIMAT team obtained the best performance, outperforming all other teams and the proposed baselines.

To analyze the participants’ results in more detail, we focused on the complementarity and diversity of their predictions. To measure complementarity, we used the Maximum Possible Accuracy (MPA) metric, defined as the quotient of the correctly classified instances over the total number of test instances, where an instance counts as correctly classified if at least one of the participating teams classified it correctly. To measure diversity, we used the Coincident Failure Diversity (CFD) metric [18], which quantifies the error diversity among the participants' predictions. The minimum value of this measure is 0, reached when all teams simultaneously predict every pattern correctly or wrongly, while the maximum value is 1, reached when all misclassifications are unique.

Table 6 shows the results of applying the Maximum Possible Accuracy and the Coincident Failure Diversity metrics over all participating teams and over the different approaches in the aggressiveness identification task.
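The two measures can be sketched as follows. MPA follows directly from the definition above. For CFD, the sketch assumes the usual formulation from the diversity literature surveyed in [18]: with N systems and p_k the fraction of instances on which exactly k systems fail, CFD = 0 if p_0 = 1, and otherwise CFD = (1 / (1 - p_0)) · Σ_{k=1..N} ((N - k) / (N - 1)) · p_k. Whether the organizers used exactly this form is an assumption; function names and the toy data are hypothetical.

```python
def maximum_possible_accuracy(gold, predictions):
    """Fraction of instances that at least one system classifies correctly.

    predictions is a list with one prediction list per system.
    """
    solved = sum(
        1 for i, g in enumerate(gold)
        if any(system[i] == g for system in predictions)
    )
    return solved / len(gold)


def coincident_failure_diversity(gold, predictions):
    """CFD under the assumed formulation: 0 = identical errors, 1 = all errors unique."""
    n_systems = len(predictions)
    if n_systems < 2:
        raise ValueError("CFD needs at least two systems")
    # fail_counts[k] = number of instances on which exactly k systems fail
    fail_counts = [0] * (n_systems + 1)
    for i, g in enumerate(gold):
        k = sum(1 for system in predictions if system[i] != g)
        fail_counts[k] += 1
    if fail_counts[0] == len(gold):
        return 0.0  # no system ever fails
    p = [count / len(gold) for count in fail_counts]
    weighted = sum(
        (n_systems - k) / (n_systems - 1) * p[k]
        for k in range(1, n_systems + 1)
    )
    return weighted / (1 - p[0])


# Toy example: three hypothetical systems on six instances.
gold = [1, 1, 0, 0, 1, 0]
systems = [
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
]
mpa = maximum_possible_accuracy(gold, systems)
cfd = coincident_failure_diversity(gold, systems)
```

In the toy data, one instance is missed by every system, so MPA is 5/6; three instances are each missed by a single system, which pushes CFD towards 1, while the shared failure pulls it down.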
From these results, it is possible to observe that the MPA from all teams, and from the teams using the different types of approaches, is considerably greater than the best accuracy, obtained by the CIMAT team, suggesting that the participants' systems and approaches are complementary to each other. In terms of the different approaches, the Transformers approach obtained a greater MPA than the BoW and traditional DNN approaches, which is consistent with the results shown in Table 5, where the top teams obtained their best performance using transformers. The results obtained with the CFD metric show that there is a high error diversity in the predictions of the DNN approaches, which is consistent with the performance shown by this approach using the MPA metric. On the other hand, the approach that obtained the best results with the MPA metric showed less error diversity in the participants' predictions.

Table 5: Results for the aggressiveness identification task.
Team                   Aggressive   Non-aggressive   F-macro   Accuracy
CIMAT-1                0.7998       0.9195           0.8596    0.8851
CIMAT-2                0.7971       0.9205           0.8588    0.8858
UPB-2                  0.7969       0.9107           0.8538    0.8759
UACh-2                 0.7720       0.9042           0.8381    0.8651
Baseline (INGEOTEC)    0.7468       0.8933           0.8200    0.8498
Idiap-UAM-1            0.7255       0.8886           0.8071    0.8416
Baseline (Bi-GRU)      0.7124       0.8841           0.7983    0.8348
Idiap-UAM-2            0.7066       0.8953           0.8010    0.8451
UACh-1                 0.7062       0.8861           0.7961    0.8358
DeepMath-1             0.7001       0.8544           0.7773    0.8040
DeepMath-2             0.6957       0.8537           0.7747    0.8024
Baseline (BoW-SVM)     0.6760       0.8780           0.7770    0.8228
UMUTeam-2              0.6727       0.8706           0.7716    0.8145
Intensos-1             0.6619       0.8752           0.7686    0.8177
UMUTeam-3              0.6516       0.8771           0.7644    0.8183
UGalileo-2             0.6388       0.8208           0.7298    0.7604
UGalileo-1             0.6387       0.8430           0.7408    0.7811
ITCG-SD                0.6080       0.8820           0.7450    0.8186
UMUTeam-1              0.5892       0.8430           0.7161    0.7728
UPB-1                  0.3437       0.8463           0.5950    0.7509
Intensos-2             0.2515       0.7664           0.5090    0.6440
Table 6: Comparison of MPA and CFD results between the different general approaches.
Approach           Best Accuracy   MPA      CFD      Number of systems
All teams          0.8858          0.9965   0.3204   18
Transformers       0.8858          0.9885   0.3549   9
BoW and n-grams    0.8177          0.9287   0.4357   3
Traditional DNN    0.8040          0.9707   0.4731   6

Applying the MPA metric also made it possible to identify the errors common to all systems. In fact, there are only 10 tweets that no system could classify correctly. All of them are aggressive tweets that were classified as non-aggressive. Below we present some of these tweets, where we can identify offenses with no vulgar or profane words, the use of words outside the training vocabulary, as well as some ironic comments.

• No se vale estar “buena” por gorda y “flaca” por anoréxica, chingenle al gym y a la dieta.
• Viendo a #AnetteMichel con ese vestido en #MasteChefMx se me hace bien gorda.
• @USUARIO te quiero meter todo mi camote.
• Las fans de odisseo se ven bien bonitas en sus fotos de twitter y estan bien feas en persona.
• #InyeccionLetalParaLGBT y para la que me ilusiono, luego me echó la culpa de que le rompi el corazón, me dejó, y volvió con su ex... ardida!

5.2. Fake News Detection results

Table 7 shows a summary of the results obtained by each team in the Fake News detection track. As in the previous section, we report the F1 in both the fake and true classes, the macro F1, and the accuracy. We used the F1 over the fake class to rank participants. In this task, the approach submitted by the Idiap-UAM team outperformed all the other approaches and the baselines. All systems achieved balanced results in the fake and true classes; however, the F1 score of the true class is slightly better in almost all systems. All participating teams used a machine-learning-based approach relying on style-based features, i.e., no team used a knowledge base or Web search to verify the authenticity of the news.
Table 7: Results for the fake news detection task.
Team                   Fake     True     F-macro   Accuracy
Idiap-UAM-1            0.8444   0.8688   0.8566    0.8576
Idiap-UAM-2            0.8406   0.8599   0.8502    0.8508
Ares                   0.8188   0.8151   0.8169    0.8169
CIMAT-1                0.7943   0.8117   0.8030    0.8034
Baseline (BoW-RF)      0.7850   0.7879   0.7864    0.7864
Intensos-2             0.7703   0.7883   0.7793    0.7797
Intensos-1             0.7597   0.7376   0.7487    0.7492
Baseline (INGEOTEC)    0.7596   0.7723   0.7659    0.7661
ITCG-SD                0.7464   0.7771   0.7617    0.7627

The analysis of the complementarity and diversity of the predictions of the different approaches, using the MPA and CFD metrics, is shown in Table 8. The table groups the participants as follows: the Transformers approach considers only the CIMAT team; the BoW and n-gram approach considers the Ares, Intensos, and ITCG-SD teams; and the Hybrid approach includes the Idiap-UAM runs. The CFD for the Transformers row could not be calculated because there is only one participant. The MPA for the row of all teams has the highest value, which means that the teams’ approaches complement each other. The best systems (Idiap-UAM 1, 2) obtained an MPA about 9 percentage points lower than that of all teams. In contrast, the systems with the BoW and n-grams approach obtained an MPA value similar to that of all the teams, showing greater complementarity among the proposed approaches. The teams that implemented BoW and n-gram approaches showed the greatest diversity of errors in their predictions; this is consistent with their MPA performance and the heterogeneity of the approaches. The lowest CFD score corresponds to the Idiap-UAM runs, which means that their predictions are alike; this can be explained by the fact that both runs use the same core.

Table 8: Comparison of MPA and CFD results between the different general approaches for the Fake News track.
Approach                                        Best Accuracy   MPA      CFD      No. of systems
All teams                                       0.8576          0.9729   0.3531   7
Hybrid (Idiap-UAM 1,2)                          0.8576          0.8814   0.1615   2
Transformers (CIMAT)                            0.8034          0.8034   -        1
BoW, n-grams (Intensos 1,2 + ITCG + Ares)       0.8169          0.9458   0.3835   4

Table 9 shows the F1 score for the fake class in the different topics of the corpus. The Economy category is the most difficult for all the evaluated approaches. In contrast, three systems correctly classified all the instances in the Education topic, and two systems achieved perfect scores in the Security topic. The performance of the systems does not seem related to the number of news items each topic has. Politics, Entertainment, Sport and Science are the largest topics, while Education, Health and Security are the least represented groups. Yet the most difficult topic was Economy, which is not the smallest one, so having more examples does not by itself seem to help the systems learn better.

We identified the common prediction errors across all the systems and found only 8 news items, 7 of them in the fake class, that none of the approaches classified correctly. Table 10 shows these instances: 37.5% of the misclassified news items belong to the Economy category, while 25% belong to Politics. The Society, Science and Health groups each have one entry that was not correctly classified by any team (12.5% of the total each).

6. Conclusions

This paper described the design and results of the MEX-A3T shared task co-located with IberLEF 2020. MEX-A3T stands for Authorship and Aggressiveness Analysis in Mexican Spanish Tweets. Two tasks were proposed, one targeting fake news detection and the other focused on aggressiveness detection.
Table 9: F1 score on the fake class per topic in the fake news detection task.
Topics: Entertainment, Education, Economy, Security, Science, Politics, Society, Health, Sport
Team
Idiap-UAM-1    1.00   0.88   0.92   1.00   0.77   0.60   0.79   0.84   0.84
Idiap-UAM-2    1.00   0.82   0.83   1.00   0.77   0.60   0.81   0.85   0.86
Ares           1.00   0.88   0.83   0.86   0.86   0.60   0.74   0.82   0.83
CIMAT-1        0.86   0.81   0.74   0.86   0.86   0.60   0.79   0.83   0.76
BoW-RF         0.86   0.85   0.73   0.92   0.86   0.60   0.68   0.81   0.77
Intensos-2     0.67   0.91   0.79   0.93   0.71   0.44   0.65   0.77   0.78
Intensos-1     0.75   0.85   0.69   0.92   0.88   0.73   0.68   0.79   0.67
INGEOTEC       0.88   0.75   0.77   0.86   0.86   0.60   0.74   0.76   0.75
ITCG-SD        0.75   0.80   0.64   0.67   0.77   0.55   0.69   0.79   0.79

Table 10: News instances misclassified by all systems.
Label   Topic      Source               Title
True    Health     El país              Barba, una moda que daña tu salud
Fake    Society    Actualidad RT        “Las puertas del infierno”: Un extraño video universitario causa ’terror’ en la Red
Fake    Science    Rey Misterios        Asteroide contra la Tierra
Fake    Economy    Alerta digital       El Gobierno de Sánchez gastará *NUMBER* millones de euros en demoler
Fake    Economy    Lamula               Se debe pagar Impuesto a la Renta por el uso de satélites de comunicación
Fake    Economy    Voz del Sur          La CIA ya conoce la fecha del próxima caída económica que podría afectar a México
Fake    Politics   Criterio Universal   Exhiben pacto Duarte-Morena
Fake    Politics   Sin embargo          Forbes afirma que Angélica Rivera está ya en la lista de mexicanos millonarios en EU

Regarding aggressiveness detection, this has been the third edition of the task, and this year the results have outperformed those of past competitions. Clearly, the use of transformers has achieved the best results, which shows the appropriateness of this method for approaching this key topic in NLP. Although this has been the first edition of the fake news detection task, the results achieved are really promising. Unlike in the aggressiveness detection task, the best results here were reached by hybrid approaches, using both transformers and BoW/n-grams, or just n-grams.
Traditional Deep Neural Networks have not been used in this task.

Summing up, the results of these tasks of the IberLEF evaluation forum show how some key topics in NLP with Spanish as the source language have developed considerably in recent years. Both data compilation and the use of cutting-edge methods have placed Spanish among the languages with the most accurate applications in the area of natural language processing.

Acknowledgments

Our special thanks go to all of MEX-A3T’s participants. We would like to thank CONACyT for partially supporting this work under grants CB-2015-01-257383, FC-2016-2410, CB-A1-S-27780, the Thematic Networks program (Language Technologies Thematic Network), and UNAM under PAPIIT projects IA401219, TA100520. The first author thanks CONACyT-Mexico for doctoral scholarship 654803, and the second author for a master's scholarship.

References

[1] M. E. Aragón, M. Á. Álvarez-Carmona, M. Montes-y-Gómez, H. J. Escalante, L. Villaseñor-Pineda, D. Moctezuma, Overview of MEX-A3T at IberLEF 2019: Authorship and aggressiveness analysis in Mexican Spanish tweets, in: Notebook Papers of 1st SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Bilbao, Spain, September, 2019.
[2] M. Casavantes, R. López, L. C. González, UACh at MEX-A3T 2019: Preliminary results on detecting aggressive tweets by adding author information via an unsupervised strategy, in: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR WS Proceedings, 2019.
[3] M.-J. Díaz-Torres, P. A. Moran-Méndez, L. Villasenor-Pineda, M. Montes-y-Gomez, J. Aguilera, L. Meneses-Lerin, Automatic detection of offensive language in social media: Defining linguistic criteria to build a Mexican Spanish dataset, in: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, 2020.
[4] M. Á. Álvarez-Carmona, E. Guzmán-Falcón, M. Montes-y-Gómez, H. J. Escalante, L. Villaseñor-Pineda, V. Reyes-Meza, A. Rico-Sulayes, Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets, in: Notebook Papers of 3rd SEPLN Workshop on Evaluation of Human Language Technologies for Iberian Languages (IBEREVAL), Seville, Spain, September, 2018.
[5] J.-P. Posadas-Durán, H. Gómez-Adorno, G. Sidorov, J. J. M. Escobar, Detection of fake news in a new corpus for the Spanish language, Journal of Intelligent & Fuzzy Systems 36 (2019) 4869–4876.
[6] E. Villatoro-Tello, G. Ramírez-de-la-Rosa, S. Kumar, S. Parida, M. Petr, Idiap and UAM participation at MEX-A3T evaluation campaign, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020.
[7] M. Guzman-Silverio, A. Balderas-Paredes, A.-P. López-Monroy, Transformers and data augmentation for aggressiveness detection in Mexican Spanish, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020.
[8] D. Zaizar-Gutierrez, D. Fajardo-Delgado, M.-A. Álvarez Carmona, ITCG’s participation at MEX-A3T 2020: Aggressive identification and fake news detection based on textual features for Mexican Spanish, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020.
[9] S. Arce-Cardenas, D. Fajardo-Delgado, M.-A. Álvarez Carmona, TecNM at MEX-A3T 2020: Fake news and aggressiveness analysis in Spanish Mexican, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020.
[10] M.-A. Tanase, G.-E. Zaharia, D.-C. Cercel, M. Dascalu, UPB at MEX-A3T 2020: Detecting aggressiveness in Mexican Spanish social media content by fine-tuning transformer-based models, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020.
[11] M. Casavantes, R. López, L.-C. González, UACh at MEX-A3T 2020: Detecting aggressive tweets by incorporating author and message context, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020.
[12] M.-G. Garrido-Espinosa, A. Rosales-Pérez, A.-P. López-Monroy, GRU with author profiling information to detect aggressiveness, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020.
[13] J.-A. García-Díaz, R. Valencia-García, UMUTeam at MEX-A3T’2020: Towards aggressiveness identification in Mexican-Spanish tweets with linguistic features and word-embeddings, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020.
[14] V. Peñaloza, Detecting aggressiveness in Mexican Spanish tweets with LSTM + GRU and LSTM + CNN architectures, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020.
[15] E. Amigó, J. Carrillo-de-Albornoz, M. Almagro-Cádiz, J. Gonzalo, J. Rodríguez-Vidal, F. Verdejo, EvALL: Open access evaluation for information access systems, in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2017, pp. 1301–1304.
[16] M. Graff, S. Miranda-Jiménez, E. S. Tellez, D. Moctezuma, V. Salgado, J. Ortiz-Bejar, C. N. Sánchez, INGEOTEC at MEX-A3T: Author profiling and aggressiveness analysis in Twitter using μTC and EvoMSA, in: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), CEUR WS Proceedings, 2018.
[17] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning word vectors for 157 languages, in: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018.
[18] E. K. Tang, P. N. Suganthan, X. Yao, An analysis of diversity measures, Mach. Learn. 65 (2006) 247–271. URL: https://doi.org/10.1007/s10994-006-9449-2. doi:10.1007/s10994-006-9449-2.