=Paper=
{{Paper
|id=Vol-2664/mex-a3t_overview
|storemode=property
|title=Overview of MEX-A3T at IberLEF 2020: Fake News and Aggressiveness Analysis in Mexican Spanish
|pdfUrl=https://ceur-ws.org/Vol-2664/mex-a3t_overview.pdf
|volume=Vol-2664
|authors=Mario Ezra Aragón,Horacio Jarquín-Vásquez,Manuel Montes-y-Gómez,Hugo Jair Escalante,Luis Villaseñor-Pineda,Helena Gómez-Adorno,Juan-Pablo Posadas-Durán,Gemma Bel-Enguix
|dblpUrl=https://dblp.org/rec/conf/sepln/AragonJMEPGPB20
}}
==Overview of MEX-A3T at IberLEF 2020: Fake News and Aggressiveness Analysis in Mexican Spanish==
Overview of MEX-A3T at IberLEF 2020: Fake News
and Aggressiveness Analysis in Mexican Spanish
Mario Ezra Aragóna , Horacio Jarquín-Vásqueza , Manuel Montes-y-Gómeza , Hugo
Jair Escalantea , Luis Villaseñor-Pinedaa,b , Helena Gómez-Adornoc ,
Juan-Pablo Posadas-Duráne and Gemma Bel-Enguixd
a
Laboratorio de Tecnologías del Lenguaje (INAOE), Mexico
b
Centre de Recherche en Linguistique Française GRAMMATICA (EA 4521), Université d’Artois, France
c
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (UNAM), Mexico
d
Instituto de Ingeniería (UNAM), Mexico
e
Escuela Superior de Ingeniería Mecánica y Eléctrica, Unidad Zacatenco (IPN), Mexico
Abstract
This paper presents the overview of MEX-A3T 2020, the third edition of this lab under the IberLEF
conference. The main purpose of MEX-A3T is to explore different methodologies and strategies related
to the analysis of social media content in Mexican Spanish. This year edition focuses in the identification
of fake news and the detection of aggressive tweets. For this purpose, we provided different news from
verified web sources and a corpus of tweets from Mexican users.
Keywords
Fake news detection, aggressiveness detection, MEX-A3T, IberLEF
1. Introduction
The goal of the third edition of MEX-A3T is to further improve the research in NLP tasks as well
as to continue pushing the computational treatment of the Mexican Spanish. As a novelty, this
year’s proposal introduces a new track on fake news detection and an improved corpus for the
aggressive language detection track. The MEX-A3T@IberLEF2020 has the following two tracks:
Aggressiveness Detection Track: Social networks represent a significant threat to users
who are exposed to many risks and potential attacks. One of such threats is aggressive comments,
which can produce long-term harm to victims, in the more accurate cases they can lead to suicide.
This track follows up on last year’s evaluation task; it focuses on the detection of aggressive
Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)
email: mearagon@inaoep.mx (M.E. Aragón); horacio.jarquin@inaoep.mx (H. Jarquín-Vásquez);
mmontesg@inaoep.mx (M. Montes-y-Gómez); hugojair@inaoep.mx (H.J. Escalante); villasen@inaoep.mx (L.
Villaseñor-Pineda); helena.gomez@iimas.unam.mx (H. Gómez-Adorno); jposadasd@ipn.mx (J. Posadas-Durán);
gbele@iingen.unam.mx (G. Bel-Enguix)
orcid: 0000-0002-8213-957X (M.E. Aragón); 0000-0000-0000-0000 (H. Jarquín-Vásquez); 0000-0002-7601-501X (M.
Montes-y-Gómez); 0000-0003-4603-3513 (H.J. Escalante); 0000-0003-1294-9128 (L. Villaseñor-Pineda);
0000-0002-6966-9912 (H. Gómez-Adorno); 0000-0001-9496-1328 (J. Posadas-Durán); 0000-0002-1411-5736 (G.
Bel-Enguix)
© 2020 Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY
4.0). IberLEF 2020, September 2020, Málaga, Spain.
CEUR
Workshop
Proceedings
http://ceur-ws.org
ISSN 1613-0073
CEUR Workshop Proceedings (CEUR-WS.org)
tweets in Mexican Spanish. However, for this year, the criteria for identifying aggression have
been revised and a new enhanced data set has been created.
Fake News Detection Track: Fake news provide information that aims to manipulate people
for different purposes: terrorism, political elections, advertisement, among others. In social
networks, misinformation extends in seconds among thousands of people, so it is necessary to
develop tools that help control the amount of false information on the web. Particularly, fake
news detection systems aim to help users to detect and filter out potentially deceptive news.
The Fake News Detection Track consists in classifying a given set of news written in Mexican
Spanish between true and fake.
The remainder of this paper is as follows: Section 2 covers a brief description of the previous
edition of MEX-A3T. Section 3 presents the evaluation framework used at MEX-A3T 2020.
Section 4 shows an overview of the participating approaches. Section 5 reports and analyzes
the results obtained by the participating teams. Finally, Section 6 presents our conclusions from
this evaluation exercise.
2. MEX-A3T 2019
MEX-A3T is a forum for the analysis of social media content in Mexican Spanish. Last year,
we organized the second edition of the MEX-A3T shared task [1], focusing on the problems
of author profiling and aggressiveness detection. A variety of methods were proposed by the
participants, comprising content-based (bag of words, word n-grams, term vectors, dictionary
words, and so on), stylistic-based features (frequencies, punctuation, POS, Twitter-specific
elements, slang words, and so forth), and approaches based on neural networks (CNN, LSTM,
and others).
For author profiling, as a novelty of previous year’s edition, it was considered the use of text
and images as information sources. Our purpose was to study the relevance and complementarity
of multimodal data for profiling social media users. Sadly, participants could not find an effective
way to take advantage of both types of information, and did not outperform the baselines
proposed.
In the case of the aggressiveness identification, the top-ranked team was UACh [2]. This team
used two main kinds of features, character n-grams and word embeddings, and employed two
different classifiers, an SVM and a multilayer perceptron. The main idea of their participation
was the inclusion of features for giving context to the text messages, and to explore if people
verbally attack differently depending on their traits and overall environment.
3. MEX-A3T 2020 Evaluation Framework
This section outlines the construction of the two used corpora, highlighting particular properties,
challenges, and novelties. It also presents the evaluation measures used for both tasks.
223
Aggressive Non-Aggressive
Y por que disculpa? El otro joto comenzó... que tiene de especial la tonta texas
Le hubieras dado dos, no mas por joto y
metiche
Yo voy cualquiera que no seas tu @USUARIO No puedo creer que nos sig-
@USUARIO . Viejo ladrón HDP. amos matando por tontas ideologías
Indios estupidos no saben pa que putas las tontas no van al cielo es mi religión
es la pasarela... una vez vi cruzar unas
cabras por ahi.. no entiendo como ellas si
entienden
Table 1: Aggressive and Non-Aggressive Tweets.
3.1. Aggressiveness Detection
Social networks represent a significant threat to users exposed to many risks and potential
attacks. One such threat is aggressive comments, which can produce long-term harm to victims,
and in some cases, they can lead to suicide. This track focuses on the detection of aggressive
comments on Twitter, a topic with little study in the Ibero-American community. Participants
have to develop methods to determine whether a tweet is aggressive or not. The track is
challenging by the fact that tweets come from Mexican users with a variety of backgrounds and
social expressions.
We built a corpus of tweets for the task of aggressiveness detection from Mexican accounts.
First, we selected a set of terms that served as seeds for extracting the tweets. We used the
words classified as vulgar and non-colloquial in the Diccionario de Mexicanismos de la Academia
Mexicana de la Lengua, as well as words and hashtags identified by the Instituto Nacional de
las Mujeres. Tweets were collected considering their geolocation. We considered Mexico City
as the center and extracted all tweets that were within a radius of 500 km. We annotated the
corpus using the scheme proposed in [3]. The annotation provides a specific criteria to separate
a tweet from aggressive, offensive and vulgar, based on the linguistic characteristics and intent
of the message. Table 1 shows some examples labeled as aggressive and non-aggressive. As can
be intuited, the task of labeling aggressiveness is challenging, especially because in most cases
it is necessary to interpret the message in a given context.
The collected corpus consists of more than 10 thousand tweets. For the evaluation exercise,
we divided the corpus into two parts, one for training and the other for the test. Table 2
shows the distribution of this corpus. The non-aggressive class is the majority class in both
partitions. For readers interested in more details, [4] describes the methodology followed for
the construction of the Mexican Aggressiveness Corpus.
3.2. Fake News Detection
The Spanish Fake News Corpus is a collection of news compiled from several web sources:
established newspaper websites, media companies websites, special websites dedicated to
224
Class Training Corpus Test Corpus
Not Aggressive 5222 2238
Aggressive 2110 905
Σ 7332 3143
Table 2: Mexican aggressiveness corpus: distribution of the classes.
validating fake news, websites designated by different journalists as sites that regularly publish
fake news. The news were collected from January to July of 2018 and all of them were written
in Mexican Spanish [5]. The assembled corpus has 971 news.
The corpus was manually labeled using two classes (true or fake), and considering the
following criteria:
• A news report is true if there is evidence that it has been published in reliable sites.
• A news report is fake if there are news from reliable sites or specialized websites in the
detection of deceptive content that contradict it, or if no other evidence was found about
the news besides the given source.
The data collection includes true-fake news pairs of different events to have a corpus as
balanced as possible. Additionally, in order to avoid topic bias, the corpus covers news from 9
different topics: Science, Sport, Economy, Education, Entertainment, Politics, Health, Security,
and Society. As can be seen in Table 3, the number of fake and true news is balanced; approxi-
mately 70% are used as training corpus (676 news), and 30% as the test corpus (295 news). For
readers interested in more details, [5] describes the methodology followed for the construction
of the Spanish Fake News corpus.
Training corpus Testing corpus
Category
True Fake True Fake
Science 32 30 14 13
Sport 45 41 21 17
Economy 18 12 6 7
Education 6 9 4 3
Entertainment 48 55 22 23
Politics 121 105 54 43
Health 16 16 7 7
Security 11 18 6 7
Society 41 52 19 22
Σ 338 338 153 142
Table 3: Spanish Fake News Corpus: distribution of the classes.
225
Idiap-UAM
UMUTeam
DeepMath
ITCG-SD
UGalileo
Intensos
CIMAT
UACh
Ares
UPB
Approach
Transformers X X X X
Traditional Deep Neural Networks X X X
BoW, n-grams, Stylometrics X X X X X
Table 4: General approach of each participating team.
3.3. Performance Measure
For both tracks, the final score corresponds to the 𝐹1 -measure for the target class, that is, fake
news and aggressive messages respectively.
4. Overview of the Submitted Approaches
At this edition, eleven teams submitted one or more solutions; six teams participated in the
fake news detection task, and nine participated in the aggressiveness identification task. This
section presents a summary of their approaches regarding preprocessing steps, features, and
classification algorithms. In Table 4 we indicate the general approach used for each team. It
can be appreciated that participants used three general approaches: transformers, deep neural
networks, and traditional representations like BoW and n-grams feeding a SVM classifier. Fol-
lowing, we briefly describe each of the participating methods.
• Idiap and UAM Participation at MEX-A3T Evaluation Campaign [6]
– Tasks: Fake News Detection; Aggressiveness Detection.
– Team name: Idiap-UAM
– Summary: The authors used a Supervised Autoencoder (SAE), that is, a neural
network that learns a representation (encoding) of input data and then learns to
reconstruct the original input. They used three different types of features as inputs
representation: word n-grams, char n-grams, and BETO encodings. The best perfor-
mance was obtained when the autoencoder was fed with the combination of the
three input representations.
• Transformers and Data Augmentation for Aggressiveness Detection in Mexican Spanish [7]
– Tasks: Fake News Detection; Aggressiveness Detection.
– Team name: CIMAT
– Summary: The authors proposed two different strategies for the aggressiveness
detection task. The first strategy consisted of an ensemble of different BETO models
226
(BERT models trained in Spanish) with majority and weighted voting schemes.
The second strategy considered data augmentation, a technique to generate new
instances from the original training data. They reported as best strategy the use of
20 ensemble models and adversarial data augmentation, where the model creates a
new input for each misclassified sentence.
• ITCG’s participation at MEX-A3T 2020: Aggressive Identification and Fake News detection
based on textual features for Mexican Spanish [8]
– Tasks: Fake News Detection; Aggressiveness Detection.
– Team name: Intensos
– Summary: The authors presented a traditional text classification approach, using a
combination of binary and tf-idf text representations. They reported that their best
result was using a SVM with this representation, without removing stop words.
• TecNM at MEX-A3T 2020: Fake News and Aggressiveness Analysis in Spanish Mexican [9]
– Tasks: Fake News Detection; Aggressiveness Detection.
– Team name: ITCG-SD
– Summary: The authors presented a traditional machine learning approach, using
a bag-of-words representation with TF and TF-IDF weights. Their best results were
obtained when applying a neural network and a SVM classifier.
• UPB at MEX-A3T 2020: Detecting Aggressiveness in Mexican Spanish Social Media Content
by Fine-tuning Transformer-Based Models [10]
– Tasks: Aggressiveness Detection
– Team name: UPB
– Summary: The authors presented different approaches to fine-tune pre-trained
Spanish, English, and multilingual transformer-based models. The best result they
reported was using BETO, a BERT model trained in Spanish, but fine-tuned with
the MEX-A3T aggressiveness train set and the HatEval Spanish dataset.
• UACh at MEX-A3T 2020: Detecting Aggressive Tweets by Incorporating Author and Message
Context [11]
– Tasks: Aggressiveness Detection
– Team name: UACh
– Summary: The authors explored the idea of using context information, such as
message and author metadata. Their proposed approach has two stages. In the first
stage, messages were classified considering only their content; this classification was
done using BETO. Then, in the second stage, the predictions of the first stage were
concatenated with the author and message metadata to form a new representation
vector, which was employed by a XGBoost classifier.
227
• GRU with Author Profiling Information to Detect Aggressiveness [12]
– Tasks: Aggressiveness Detection
– Team name: DeepMath
– Summary: The authors presented a bi-directional GRU model using words as
inputs. The output of this model was combined with the predictions of gender and
occupation of users obtained by a reference model, using a simple concatenation
and considering a one-hot-encoding. At the end, the model considered only the
gender and Sciences-Student occupation categories; the rest of the categories were
discarded by a chi-squared feature selection criterion.
• UMUTeam at MEX-A3T’2020: Towards Aggressiveness Identification in Mexican-Spanish
tweets with linguistic features and word-embeddings [13]
– Tasks: Aggressiveness Detection
– Team name: UMUTeam
– Summary: The authors evaluated the characterization of aggressive messages
through a set of linguistic attributes and sentence-embeddings. They used two types
of classifiers, a support vector machine and two types of deep neural networks. Their
best result was obtained by a Bi-LSTM network trained with FastText embeddings
and combined with linguistic features.
• Detecting Aggressiveness in Mexican Spanish Tweets with LSTM + GRU and LSTM + CNN
Architectures [14]
– Tasks: Aggressiveness Detection
– Team name: UGalileo
– Summary: The authors proposed the use of two different architectures based on
deep learning models. The first architecture consisted of a Bi-GRU and a Bi-LSTM
networks, where the outputs are concatenated and then a prediction layer is added.
For the second architecture, the authors used a Bi-LSTM and CNN network, then a
concatenation and a prediction layer. Both architectures achieved similar results
over the test dataset partition.
• Ares Team: No system description paper
– Tasks: Fake News Detection
– Team name: Ares
– Summary: The authors proposed the use of a TF-IDF representation, combined
with the capital letter ratio in the article, total number of words in the body of
the article, and percentage of coincidence between the words of the body and the
headline of the article. The variable selection algorithm is an F-test, and a linear
algorithm with training through SGD classification
228
5. Experimental evaluation and analysis of results
This section summarizes the results obtained by the participants of MEX-A3T 2020, comparing
and analyzing in detail the performance of their submitted solutions. For the final phase of the
challenge, participants sent their predictions for the test partition, the performance on this data
was used to rank them. We used the F1 over the interest class as the main evaluation measure.
For computing the evaluation scores we relied on the EvALL platform [15]. EvALL is an
online evaluation service targeting information retrieval and natural language processing tasks.
It is a complete evaluation framework that receives as input the ground truth and the predictive
outputs of systems and returns a complete performance evaluation. In the following subsections,
we report the results obtained by participants as evaluated by EvALL and an analysis of their
results.
As baseline methods, we implemented two popular approaches that have shown to be hard
to beat in both tasks: i) a classification model trained on the bag of words (BoW) representation,
and ii) a Bi-GRU neural network. Also, we compared the systems’ results against the result
from INGEOTEC, the best performing system at the first MEX-A3T edition [16].
For both classification tasks the BoW approach was applied, in which we used all vocabulary
from the corpora, removing stopwords and special characters. The size of the representation
of each text was 14,913 for fake news detection, and 5,212 for aggressiveness identification;
for classification we used a SVM classifier with linear kernel and 𝐶 = 1. On the other hand,
we also applied a Bi-GRU neural network in the task of aggressiveness identification. In this
approach texts were pre-processed by removing stopwords, special characters, and converting
all emojis to words (e.g. , - ‘cara sonriente’). As input features pre-trained Spanish FastText[17]
embeddings were used, and a fully-connected softmax layer handle the class probabilities.
5.1. Aggressiveness detection results
Table 5 presents the results obtained by the teams in the aggressiveness detection task. For
this task, we sort the teams by their 𝐹1 results over the aggressive class. For extra analysis, we
also report the accuracy, the macro 𝐹1 and the 𝐹1 in the non-aggressive class. The approach
submitted by the CIMAT team obtained the best performance, outperforming all teams, and the
proposed baselines.
To analyze in more detail the participants’ results, we focused on the analysis of the com-
plementariness and diversity of their predictions. To measure the complementarity, we used
the Maximum Possible Accuracy (MPA) metric, which is defined as the quotient of the cor-
rectly classified instances over the total number of test instances. We considered an instance
as correctly classified if at least one of the participating teams classified it correctly. On the
other hand, to measure the diversity we used the Coincident Failure Diversity (CFD) metric
[18], which focuses on calculating the error diversity among the participants predictions. The
minimum value of this measure is 0 when all teams simultaneously predict a pattern correctly
or wrongly, while the maximum value is 1, when the misclassifications are all unique.
Table 6 shows the results of applying the Maximum Possible Accuracy, and the Coincident
Failure Diversity metrics over all participating teams and the different approaches in the
aggressiveness identification task. From these results, it is possible to observe that the MPA
229
Team Aggressive Non aggressive 𝐹𝑚𝑎𝑐𝑟𝑜 Accuracy
CIMAT-1 0.7998 0.9195 0.8596 0.8851
CIMAT-2 0.7971 0.9205 0.8588 0.8858
UPB-2 0.7969 0.9107 0.8538 0.8759
UACh-2 0.7720 0.9042 0.8381 0.8651
Baseline(INGEOTEC) 0.7468 0.8933 0.8200 0.8498
Idiap-UAM-1 0.7255 0.8886 0.8071 0.8416
Baseline (Bi-GRU ) 0.7124 0.8841 0.7983 0.8348
Idiap-UAM-2 0.7066 0.8953 0.8010 0.8451
UACh-1 0.7062 0.8861 0.7961 0.8358
DeepMath-1 0.7001 0.8544 0.7773 0.8040
DeepMath-2 0.6957 0.8537 0.7747 0.8024
Baseline (BoW-SVM) 0.6760 0.8780 0.7770 0.8228
UMUTeam-2 0.6727 0.8706 0.7716 0.8145
Intensos-1 0.6619 0.8752 0.7686 0.8177
UMUTeam-3 0.6516 0.8771 0.7644 0.8183
UGalileo-2 0.6388 0.8208 0.7298 0.7604
UGalileo-1 0.6387 0.8430 0.7408 0.7811
ITCG-SD 0.6080 0.8820 0.7450 0.8186
UMUTeam-1 0.5892 0.8430 0.7161 0.7728
UPB-1 0.3437 0.8463 0.5950 0.7509
Intensos-2 0.2515 0.7664 0.5090 0.6440
Table 5: Results for the aggressiveness identification task
from all teams and from the teams using the different types of approaches is considerably
greater than the best performance Accuracy obtained by the CIMAT team, suggesting that the
participants systems and approaches are complementary to each other. In terms of the different
approaches, the Transformes approach obtained the greater MPA over the BoW and traditional
DNN approaches, which is consistent with the results shown in Table 5, where the top-teams
obtained their best performances using transformers. The results obtained with the CFD metric,
show that there is a high error diversity in the predictions of the DNN approaches, which is
consistent with the performance shown by this approach using the MPA metric. On the other
hand, the approach that obtained the best performance results with the MPA metric, showed
less error diversity in the participants predictions.
Approach Best Accuracy MPA CFD Number of systems
All teams 0.8858 0.9965 0.3204 18
Transformers 0.8858 0.9885 0.3549 9
BoW and n-grams 0.8177 0.9287 0.4357 3
Traditional DNN 0.8040 0.9707 0.4731 6
Table 6: Comparison of MPA and CFD results between the different general approaches
230
As a result of applying the MPA metric, it was possible to identify those common errors
across all systems. In fact, there are only 10 tweets that no system could classify correctly. All
of them are aggressive tweets that were classified as non-aggressive. Below we present some of
these tweets, where we can identify offenses with no vulgar or profane words, the use of out of
the training vocabulary words, as well as some ironic comments.
• No se vale estar “buena” por gorda y “flaca” por anoréxica, chingenle al gym y a la dieta.
• Viendo a #AnetteMichel con ese vestido en #MasteChefMx se me hace bien gorda.
• @USUARIO te quiero meter todo mi camote.
• Las fans de odisseo se ven bien bonitas en sus fotos de twitter y estan bien feas en persona.
• #InyeccionLetalParaLGBT y para la que me ilusiono, luego me echó la culpa de que le
rompi el corazón, me dejó, y volvió con su ex... ardida!
5.2. Fake News Detection results
Table 7 shows a summary of the results obtained by each team in the Fake News detection
track. As in the previous section, we report the 𝐹1 in both fake and true classes, the macro
𝐹1 , and the accuracy. We used the 𝐹1 over the fake class to rank participants. In this task, the
approach submitted by the Idiap-UAM team outperformed all the other approaches and the
baselines. It can be observed that all systems achieved balanced results in both fake and true
classes, however, the 𝐹1 score of the true class is in general slightly better in almost all systems.
All participated teams used a machine-learning-based approach relying on style-based features,
i.e., neither team used a knowledge base or Web searching to verify the authenticity of the news.
Team Fake Truth 𝐹𝑚𝑎𝑐𝑟𝑜 Accuracy
Idiap-UAM-1 0.8444 0.8688 0.8566 0.8576
Idiap-UAM-2 0.8406 0.8599 0.8502 0.8508
Ares 0.8188 0.8151 0.8169 0.8169
CIMAT-1 0.7943 0.8117 0.8030 0.8034
Baseline (BoW-RF ) 0.7850 0.7879 0.7864 0.7864
Intensos-2 0.7703 0.7883 0.7793 0.7797
Intensos-1 0.7597 0.7376 0.7487 0.7492
Baseline (INGEOTEC) 0.7596 0.7723 0.7659 0.7661
ITCG-SD 0.7464 0.7771 0.7617 0.7627
Table 7: Results for the fake news detection task
The analysis of the complementariness and the diversity of the predictions of the different
approaches using the MPA and CFD metrics are shown in the Table 8. The table uses the
following hierarchy for the participants: Transformers approach considers only the CIMAT team,
BoW and n-gram approach considers Ares, Intensos and ITCG-SD teams, Hybrid approaches
includes Idiap-UAM teams. The CFD for the Transformers methodologies row could not be
231
calculated because there is only one participant. The MPA for the row of all teams has the
highest value, which means that the teams’ approaches complement each other.The best systems
(Idiap-UAM 1.2) obtained a lower MPA value, around 9%, compared to that of all teams. On
the contrary, the systems with BoW and n-grams approach obtained an MPA value similar to
that of all the teams, showing greater complementariness in the proposed approaches. The
teams that implemented BoW and n-gram approaches showed a greatest diversity of errors in
their predictions, this is consistent with their MPA performance and the heterogeneity of the
approaches. The lowest value for the CFD score corresponds to the Idiap-UAM runs, which
means that their predictions are alike, this can be explained because both runs use the same
core.
Approach Best Accuracy MPA CFD No. of systems
All teams 0.8576 0.9729 0.3531 7
Hybrid (Idiap-UAM 1,2) 0.8576 0.8814 0.1615 2
Transformers (CIMAT) 0.8034 0.8034 - 1
BoW, n-grams (Intensos 1,2 + ITCG + 0.8169 0.9458 0.3835 4
Ares)
Table 8: Comparison of MPA and CFD results between the different general approaches for
Fake News track
The Table 9 shows the results of the 𝐹1 score for the fake class in the different topics of the
corpus. It can be observed that the Economy category is the most difficult for all the evaluated
approaches. On the contrary, there were three systems that correctly classified all the instances
in the Education topic, and two systems that achieved perfect scores in the the Security topic.
The performance of the systems does not seem related to the number of news each topic has.
Politics, Entertainment, Sport and Science are the largest topics, while Education, Health and
Security are the less represented groups. However, it seems that the most difficult topic to
identify was economy, although it could seem that, having more examples, could help the
system to learn better.
We identified the common prediction errors across all the systems and find that there were
only 8 news, 7 in the fake class that none of the approaches classify correctly. Table 10 shows
the classified instances, it can be observed that the 37.5% of the missclassified news belong
to the Economy category, while the 25% are included in the group of politics. The groups of
society, science and health, show one entry that has not been correctly classified by any team
(12% of the total).
6. Conclusions
This paper described the design and results of the MEX-A3T shared task collocated with
IberLef 2020. MEX-A3T stands for Authorship and Aggressiveness Analysis in Mexican Spanish
Tweets. Two tasks were proposed, one targeting fake news detection and the other focused on
aggressiveness detection.
Regarding aggressiveness detection, this has been the third edition of the task, and this
232
Entertainment
Education
Economy
Security
Science
Politics
Society
Health
Sport
Team
Idiap-UAM-1 1.00 0.88 0.92 1.00 0.77 0.60 0.79 0.84 0.84
Idiap-UAM-2 1.00 0.82 0.83 1.00 0.77 0.60 0.81 0.85 0.86
Ares 1.00 0.88 0.83 0.86 0.86 0.60 0.74 0.82 0.83
CIMAT-1 0.86 0.81 0.74 0.86 0.86 0.60 0.79 0.83 0.76
BoW-RF 0.86 0.85 0.73 0.92 0.86 0.60 0.68 0.81 0.77
Intensos-2 0.67 0.91 0.79 0.93 0.71 0.44 0.65 0.77 0.78
Intensos-1 0.75 0.85 0.69 0.92 0.88 0.73 0.68 0.79 0.67
INGEOTEC 0.88 0.75 0.77 0.86 0.86 0.60 0.74 0.76 0.75
ITCG-SD 0.75 0.80 0.64 0.67 0.77 0.55 0.69 0.79 0.79
Table 9: Results for the fake news detection task
Label Topic Sources Title
True Health El país Barba, una moda que daña tu salud
Fake Society Actualidad RT “Las puertas del infierno”: Un extraño video univer-
sitario causa ’terror’ en la Red
Fake Science Rey Misterios Asteroide contra la Tierra
Fake Economy Alerta digital El Gobierno de Sánchez gastará *NUMBER* millones
de euros en demoler
Fake Economy Lamula Se debe pagar Impuesto a la Renta por el uso de
satélites de comunicación
Fake Economy Voz del Sur La CIA ya conoce la fecha del próxima caída
económica que podría afectar a México
Fake Politics Criterio Universal Exhiben pacto Duarte-Morena
Fake Politics Sin embargo Forbes afirma que Angélica Rivera está ya en la lista
de mexicanos millonarios en EU
Table 10: Fake Instances Missclassified by all Systems.
year the results have outperformed the past competitions. Clearly, the use of transformers has
achieved the best results, and shows the appropriateness of this method for approaching this
key topic in NLP.
Although this has been the first edition of the task in fake news detection, the results that
have been achieved are really promising. Contrary to the task of aggressiveness detection,
the best results here have been reached by hybrid approaches, using both transformers and
BoW-n-grams, or just n-grams. Traditional Deep Neural Networks have not been used in this
task.
Summing up, the achievements of these tasks of the IberLef evaluation forum showed how
some key topics in NLP using Spanish as a source language have experienced a great development
233
in recent years. Both data compilation and the use of cutting-edge methods, have placed Spanish
among the languages with the most accurate applications in the area of natural language
processing.
Acknowledgments
Our special thanks go to all of MEX-A3T’s participants. We would like to thank CONACyT for
partially supporting this work under grants CB-2015-01-257383, FC-2016-2410, CB-A1-S-27780,
the Thematic Networks program (Language Technologies Thematic Network), and UNAM
under PAPIIT projects IA401219, TA100520. The first author thanks for doctoral scholarship
CONACyT-Mexico 654803 and the second for master scholarship CONACyT-Mexico.
References
[1] M. E. Aragón, M. Á. Álvarez-Carmona, M. Montes-y Gómez, H. J. Escalante, L. Villaseñor-
Pineda, D. Moctezuma, Overview of mex-3at at iberlef 2019: Authorship and aggressiveness
analysis in mexican spanish tweets, in: Notebook Papers of 1st SEPLN Workshop on
Iberian Languages Evaluation Forum (IberLEF), Bilbao, Spain, September, 2019, p. .
[2] M. Casavantes, R. López, L. C. González, Uach at mex-a3t 2019: Preliminary results on
detecting aggressive tweets by adding author information via an unsupervised strategy, in:
In Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF
2019), CEUR WS Proceedings, 2019, p. .
[3] M.-J. Díaz-Torres, P. A. Moran-Méndez, L. Villasenor-Pineda, M. Montes-y Gomez, J. Aguil-
era, L. Meneses-Lerin, Automatic detection of offensive language in social media: Defining
linguistic criteria to build a mexican spanish dataset, in: Proceedings of the Second
Workshop on Trolling, Aggression and Cyberbullying, 2020, p. .
[4] M. Á. Álvarez-Carmona, E. Guzmán-Falcón, M. Montes-y Gómez, H. J. Escalante,
L. Villaseñor-Pineda, V. Reyes-Meza, A. Rico-Sulayes, Overview of mex-a3t at ibereval
2018: Authorship and aggressiveness analysis in mexican spanish tweets, in: Notebook
Papers of 3rd SEPLN Workshop on Evaluation of Human Language Technologies for
Iberian Languages (IBEREVAL), Seville, Spain, September, 2018, p. .
[5] J.-P. Posadas-Durán, H. Gómez-Adorno, G. Sidorov, J. J. M. Escobar, Detection of fake
news in a new corpus for the spanish language, Journal of Intelligent & Fuzzy Systems 36
(2019) 4869–4876.
[6] E. Villatoro-Tello, G. Ramírez-de-la Rosa, S. Kumar, S. Parida, M. Petr, Idiap and uam par-
ticipation at mex-a3t evaluation campaign, in: Notebook Papers of 2nd SEPLN Workshop
on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020, p. .
[7] M. Guzman-Silverio, A. Balderas-Paredes, A.-P. López-Monroy, Transformers and data
augmentation for aggressiveness detection in mexican spanish, in: Notebook Papers of
2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain,
September, 2020, p. .
[8] D. Zaizar-Gutierrez, D. Fajardo-Delgado, M.-A. Álvarez Carmona, Itcg’s participation at
mex-a3t 2020: Aggressive identification and fake news detection based on textual features
234
for mexican spanish, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages
Evaluation Forum (IberLEF), Malaga, Spain, September, 2020, p. .
[9] S. Arce-Cardenas, D. Fajardo-Delgado, M.-A. Álvarez Carmona, Tecnm at mex-a3t 2020:
Fake news and aggressiveness analysis in spanish mexican, in: Notebook Papers of
2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain,
September, 2020, p. .
[10] M.-A. Tanase, G.-E. Zaharia, D.-C. Cercel, M. Dascalu, Upb at mex-a3t 2020: Detecting
aggressiveness in mexican spanish social media content by fine-tuning transformer-based
models, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation
Forum (IberLEF), Malaga, Spain, September, 2020, p. .
[11] M. Casavantes, R. López, L.-C. González, Uach at mex-a3t 2020: Detecting aggressive
tweets by incorporating author and message context, in: Notebook Papers of 2nd SEPLN
Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September,
2020, p. .
[12] M.-G. Garrido-Espinosa, A. Rosales-Pérez, A.-P. López-Monroy, Gru with author profiling
information to detect aggressiveness, in: Notebook Papers of 2nd SEPLN Workshop on
Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain, September, 2020, p. .
[13] J.-A. García-Díaz, R. Valencia-García, Umuteam at mex-a3t’2020: Towards aggressiveness
identification in mexican-spanish tweets with linguistic features and word-embeddings,
in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum
(IberLEF), Malaga, Spain, September, 2020, p. .
[14] V. Peñaloza, Detecting aggressiveness in mexican spanish tweets with lstm + gru and lstm
+ cnn architectures, in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages
Evaluation Forum (IberLEF), Malaga, Spain, September, 2020, p. .
[15] E. Amigó, J. Carrillo-de Albornoz, M. Almagro-Cádiz, J. Gonzalo, J. Rodríguez-Vidal,
F. Verdejo, Evall: Open access evaluation for information access systems, in: Proceed-
ings of the 40th International ACM SIGIR Conference on Research and Development in
Information Retrieval, ACM, 2017, pp. 1301–1304.
[16] M. Graff, S. Miranda-Jiménez, E. S. Tellez, D. Moctezuma, V. Salgado, J. Ortiz-Bejar, C. N.
Sánchez, Ingeotec at mex-a3t: Author profiling and aggressiveness analysis in twitter
using 𝜇tc and evomsa, in: In Proceedings of the Third Workshop on Evaluation of Human
Language Technologies for Iberian Languages (IberEval 2018), CEUR WS Proceedings,
2018, p. .
[17] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning word vectors for 157
languages, in: Proceedings of the International Conference on Language Resources and
Evaluation (LREC 2018), 2018, p. .
[18] E. K. Tang, P. N. Suganthan, X. Yao, An analysis of diversity measures, Mach.
Learn. 65 (2006) 247–271. URL: https://doi.org/10.1007/s10994-006-9449-2. doi:1 0 . 1 0 0 7 /
s10994- 006- 9449- 2.
235