Overview of MEX-A3T at IberLEF 2019: Authorship and aggressiveness analysis in Mexican Spanish tweets

Mario Ezra Aragón1, Miguel Á. Álvarez-Carmona4,5, Manuel Montes-y-Gómez1, Hugo Jair Escalante1, Luis Villaseñor-Pineda1,2, and Daniela Moctezuma3

1 Laboratorio de Tecnologías del Lenguaje, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Mexico
2 Centre de Recherche en Linguistique Française GRAMMATICA (EA 4521), Université d'Artois, France
3 Centro de Investigación en Ciencias de Información Geoespacial A.C., Mexico
4 Consejo Nacional de Ciencia y Tecnología (CONACYT), Mexico
5 Unidad de Transferencia Tecnológica, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE-UT3), Mexico
{mearagon,mmontesg,hugojair,villasen}@inaoep.mx, malvarezc@cicese.mx, dmoctezuma@centrogeo.edu.mx

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IberLEF 2019, 24 September 2019, Bilbao, Spain.

Abstract. This paper presents the framework and results of the MEX-A3T track at IberLEF 2019. The track considers two tasks, author profiling and aggressiveness detection, both of them using Mexican Spanish tweets. The author profiling task consists in determining the gender, occupation, and place of residence of users from their tweets. As a novelty of this year's edition, it considers the use of text and images as information sources, with the aim of studying the relevance and complementarity of multimodal data for profiling social media users. The aggressiveness detection task, on the other hand, follows the same design as the previous edition; it aims to discriminate between aggressive and non-aggressive tweets. For both tasks, we have built new corpora of tweets from Mexican Twitter users. This paper compares and discusses the results of the participants.

1 Introduction

The Twitter platform is constantly growing thanks to the information generated by a massive community of active users. The analysis of this shared information has become very relevant for several applications in marketing, security, and forensics, among others.

One essential task for social media analysis is author profiling (AP), which consists in predicting general or demographic attributes of authors by examining the content of their posts [2, 4]. On the other hand, detecting aggressive content targeted at specific people or vulnerable groups is also a task of high relevance for preventing potentially viral destructive behaviors in social networks.

The objective of MEX-A3T is to encourage research on the analysis of social media content in Mexican Spanish. In particular, it aims to push research on the treatment of a variety of Spanish whose cultural traits make it significantly different from peninsular Spanish. Accordingly, the 2019 edition of MEX-A3T considers two main tasks: author profiling, whose aim was to develop methods for profiling users according to non-standard dimensions (gender, occupation, and place of residence), and aggressiveness detection in tweets. The main novelty of this edition is the use of multimodal data (text and images) for AP, with the aim of exploring the relevance of multimodal information for profiling social media users. To evaluate these tasks, we built two ad hoc collections.
The first one is a multimodal author profiling corpus consisting of 5 thousand Mexican users, each one having eleven images: the profile image as well as ten randomly selected pictures. This corpus is labeled for the subtasks of gender, occupation, and place of residence identification. The second corpus is oriented to aggressiveness detection and contains more than 11 thousand tweets; in this case, each tweet is labeled as aggressive or not.

The remainder of this paper is organized as follows: Section 2 gives a brief description of the first edition of MEX-A3T; Section 3 presents the evaluation framework used at MEX-A3T 2019; Section 4 gives an overview of the participating approaches; Section 5 reports and analyzes the results obtained by the participating teams; finally, Section 6 draws the conclusions of this evaluation exercise.

2 MEX-A3T 2018

Last year, the first edition of the MEX-A3T shared task was carried out [1]. It represented the first attempt at organizing an evaluation forum for the analysis of social media content in Mexican Spanish. A variety of methods were proposed by the participants, comprising content-based features (bag of words, word n-grams, term vectors, dictionary words, and so on) and stylistic features (frequencies, punctuation, POS, Twitter-specific elements, slang words, and so forth), as well as approaches based on neural networks (CNN, LSTM, and others). In both tasks, author profiling and aggressiveness identification, the baseline results were outperformed by most participants.

For author profiling, the MXAA team [10] obtained the best results with an approach that emphasizes the value of personal information when building the text representation. For aggressiveness identification, the top-ranked team was INGEOTEC [7], which proposed an approach based on MicroTC and EvoMSA. MicroTC is a minimalistic text classifier independent of domain and language; EvoMSA is another text classifier, which combines models (such as MicroTC) through Genetic Programming.

3 Evaluation framework

This section outlines the construction of the two corpora used in the evaluation, highlighting their particular properties, challenges, and novelties. It also presents the evaluation measures used for both tasks.

3.1 A multimodal Mexican corpus for author profiling

This new corpus is based on the previous year's collection. For MEX-A3T 2018, we labeled 5 thousand Twitter users for the occupation and place of residence traits, divided into 3500 users for training and 1500 for testing. For the occupation label, we considered the following eight classes: arts, student, social, sciences, sports, administrative, health, and others. For the place of residence trait, we considered the following six classes: north, northwest, northeast, center, west, and southeast. For more details, we recommend consulting [1].

For this year's edition, we added the gender trait to the corpus, so that each user is characterized by three labels: gender, occupation, and location. Another important novel aspect of this new corpus is the addition of 11 images per user. We selected the profile picture of each user as well as 10 randomly selected images from their tweets (for most users we collected 11 images; a small number of users shared fewer than 10 images, and in those cases we took all of their available images).

Table 1 shows the distribution of the corpus according to the gender trait; for this corpus, the gender trait is balanced. Table 2 shows the distribution of the corpus according to the place of residence trait.
As can be observed, the distributions of the training and test sets are very similar. The majority class corresponds to the center region, with more than 36% of the profiles, whereas the minority class is the north region, with only 3% of the instances. Table 3 shows the distribution of the occupation trait; it also shows similar distributions in the training and test partitions. The majority class is students, with almost 50% of the profiles, whereas sports is the minority class, with approximately 1% of the instances.

In Tables 2 and 3, the class imbalance was calculated as proposed in [15]. The place of residence trait shows a value of 396.4, whereas the occupation trait has a value of 502.4. Considering that zero represents perfect balance, these numbers indicate that the imbalance is larger for the occupation trait and, therefore, that it could be more complex to predict than the place of residence.

Table 1. Gender distribution.
Class            Train Corpus (%)   Test Corpus (%)
Male             1750 (50)          750 (50)
Female           1750 (50)          750 (50)
Σ                3500               1500
Class imbalance  0                  0

Table 2. Mexican author profiling corpus: distribution of the place of residence trait.
Class            Train Corpus (%)   Test Corpus (%)
North            106 (3.02)         34 (2.26)
Northwest        576 (16.45)        229 (15.26)
Northeast        914 (26.11)        389 (25.93)
Center           1266 (36.17)       554 (36.93)
West             322 (9.20)         144 (9.60)
Southeast        316 (9.02)         150 (10.00)
Σ                3500               1500
Class imbalance  396.45             173.23

Table 3. Mexican author profiling corpus: distribution of the occupation trait.
Class            Train Corpus (%)   Test Corpus (%)
Arts             240 (6.85)         103 (6.86)
Student          1648 (47.08)       740 (49.33)
Social           570 (16.28)        234 (15.60)
Sciences         185 (5.28)         65 (4.33)
Sports           45 (1.28)          26 (1.73)
Administrative   632 (18.05)        264 (17.60)
Health           105 (3.00)         43 (2.86)
Others           75 (2.14)          25 (1.66)
Σ                3500               1500
Class imbalance  502.42             226.04

Finally, Table 4 presents some additional statistics for the author profiling corpus. For computing these numbers, we considered words, numbers, punctuation marks, and emoticons as terms, and we applied a normalization over user mentions, hashtags, and URLs. It is possible to observe that the lexical diversity is very close for the training and test partitions; the same goes for the average number of tweets per profile. Nevertheless, the standard deviation in both training and test is quite large, implying that the length of the profiles is highly variable. The last row of the table shows the number of images in the corpus.

Table 4. Statistics for the Mexican author profiling corpus.
Measure             Train Corpus         Test Corpus          Full corpus
Tweets per profile  1354.21 (±917.61)    1353.38 (±905.58)    1353.96 (±914.02)
Number of terms     78,542,124           34,032,819           112,574,943
Vocabulary size     2,540,580            1,274,902            3,506,826
Lexical diversity   0.0323               0.0374               0.0311
Images              38,249               16,354               54,603
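As a rough illustration of the kind of preprocessing mentioned above (normalizing user mentions, hashtags, and URLs before counting terms), the following Python sketch computes the number of terms, vocabulary size, and lexical diversity as reported in Table 4 (vocabulary size divided by number of terms). The regular expressions and placeholder tokens are our own assumptions for illustration, not the exact procedure used to build the corpus statistics.

```python
import re
from collections import Counter

# Hypothetical normalization rules: patterns and placeholder tokens are assumptions.
URL_RE     = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
TOKEN_RE   = re.compile(r"\w+|[^\w\s]")  # words/numbers plus punctuation/emoticon symbols

def normalize(tweet: str) -> str:
    """Replace URLs, user mentions, and hashtags with placeholder tokens."""
    tweet = URL_RE.sub("<url>", tweet)
    tweet = MENTION_RE.sub("<user>", tweet)
    tweet = HASHTAG_RE.sub("<hashtag>", tweet)
    return tweet.lower()

def corpus_statistics(tweets):
    """Return number of terms, vocabulary size, and lexical diversity."""
    counts = Counter()
    for tweet in tweets:
        counts.update(TOKEN_RE.findall(normalize(tweet)))
    n_terms = sum(counts.values())
    vocab = len(counts)
    # Lexical diversity as in Table 4: vocabulary size / number of terms.
    return n_terms, vocab, (vocab / n_terms if n_terms else 0.0)

# Toy usage with a single made-up tweet
print(corpus_statistics(["Hola @amigo mira esto https://t.co/x #wow :)"]))
```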
3.2 A Mexican corpus for aggressiveness identification

As with the author profiling corpus, for the previous edition of MEX-A3T we also built a corpus of tweets for the task of aggressiveness detection. To build this corpus, we used rude words and controversial hashtags to narrow the search. The hashtags were related to topics of politics, sexism, homophobia, and discrimination. The collected tweets were labeled by two persons; at the end, each tweet of the corpus was labeled as aggressive or non-aggressive. Table 5 shows some examples of tweets labeled as aggressive and non-aggressive. As can be intuited, labeling aggressiveness is challenging, especially because in most cases it is necessary to interpret the message in a given context.

Table 5. Aggressive and non-aggressive tweets.
Aggressive:
– Tu novia la gata esa que usa hashtag hasta para poner hola, tu novia la acapulqueña esa
– Deja de estar de calientagüevos, que te vas a ganar una madriza
– Es una tipa tan cagante que no tiene amigos
Non-aggressive:
– Aquí me juego la vida, o leo el libro o leo las diapos, porque nuestro capítulo es de mil putas hojas. *literal*
– "Soy una enamoradiza sin remedio". -La emperatriz de todas las putas.
– Atendiendote apartir de las 5 pm zona centro #SQUIRT #MILF #CULOS #NALGONA #HOTWIFE #SCORT #PUTAS

The collected corpus consists of 11 thousand tweets. For the evaluation exercise, the corpus was divided into two parts, one for training and the other for testing. Table 6 shows the distribution of this corpus; notice that the non-aggressive class is the majority class in both partitions. For more details on the labeling methodology, please consult [1].

Table 6. Mexican aggressiveness corpus: distribution of the classes.
Class           Training Corpus (%)   Test Corpus (%)
Not Aggressive  4973 (65)             2372 (75)
Aggressive      2727 (35)             784 (25)
Σ               7700                  3156

3.3 Performance measures

Author profiling. For the author profiling task, the final score is the average of the macro F1 measures obtained for the gender, place of residence, and occupation traits, as shown in Formula 1:

F_average = ( F_macro(C_gender) + F_macro(C_location) + F_macro(C_occupation) ) / 3    (1)

The F_macro measures were computed using Formula 2, where C indicates the set of classes for a given trait (C_gender = {male, female}, C_location = {north, northwest, northeast, center, west, southeast}, and C_occupation = {arts, student, social, sciences, sports, administrative, health, others}), and F1(c) is the F1 measure of each category c of that trait:

F_macro(C) = (1 / |C|) * Σ_{c ∈ C} F1(c)    (2)

Aggressiveness identification. For this task, the final score corresponds to the F1 measure of the aggressive class.
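To make the evaluation measures concrete, the following scikit-learn sketch computes the per-trait macro F1 scores, their average (Formula 1), and the aggressive-class F1. The variable names and toy labels are illustrative only; this is not the organizers' evaluation script (see the note on EvALL in Section 5).

```python
from sklearn.metrics import f1_score

def author_profiling_score(y_true_by_trait, y_pred_by_trait):
    """Average of the macro F1 scores over the three traits (Formula 1)."""
    traits = ("gender", "location", "occupation")
    macro_f1 = {
        t: f1_score(y_true_by_trait[t], y_pred_by_trait[t], average="macro")
        for t in traits
    }
    return sum(macro_f1.values()) / len(traits), macro_f1

def aggressiveness_score(y_true, y_pred):
    """F1 measure of the aggressive class only."""
    return f1_score(y_true, y_pred, pos_label="aggressive", average="binary")

# Toy example (made-up labels, not real data)
gold = {"gender": ["male", "female"], "location": ["north", "center"],
        "occupation": ["student", "arts"]}
pred = {"gender": ["male", "male"], "location": ["north", "center"],
        "occupation": ["student", "student"]}
print(author_profiling_score(gold, pred))
print(aggressiveness_score(["aggressive", "non-aggressive"],
                           ["aggressive", "aggressive"]))
```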
4 Overview of the Submitted Approaches

In this edition, 8 teams submitted one or more solutions; 2 of them participated in the author profiling task and 6 in the aggressiveness identification task. Based on what they explained in their notebook papers, this section summarizes their approaches regarding preprocessing steps, features, and classification algorithms. The participating methods are listed below:

– CerpamidUA at MexA3T 2019: Transition Point Proposal [6]
  • Task: Author Profiling
  • Team name: Cerpamid
  • Features: Bag of words.
  • Classification: Support Vector Machine.
  • Summary: The authors propose an approach that follows the traditional pipeline of a non-thematic text classification system, employing a BoW representation and an SVM classifier. They focus on determining a reduced subset of features that represent frequent words for each profile, and propose using the theory of Transition Points to select these features.

– Author profiling from images using 3D Convolutional Neural Networks [16]
  • Task: Author Profiling
  • Team name: CIC-VCR
  • Features: Hierarchical features obtained with a CNN.
  • Classification: CNN.
  • Summary: The authors focus on determining the profile of an author using images only. They propose a 3D Convolutional Neural Network for extracting features from the images and classifying them into the different classes. They conclude that profiling a Twitter user using only images is a difficult task, given the general-purpose nature of the images shared on this platform.

– Aggressive analysis in Twitter using a combination of models [13]
  • Task: Aggressiveness Detection
  • Team name: PRHLT
  • Features: Bag of words with TF-IDF weights, hierarchical features obtained with a CNN.
  • Classification: CNN, LSTM, and Multi-layer Perceptron.
  • Summary: The authors propose a method that combines different classification strategies: a Convolutional Neural Network whose outputs feed an LSTM network; a pre-trained Universal Sentence Encoder for encoding sentences into embedding vectors; and a simple Multi-layer Perceptron fed with the TF-IDF representation of the tweet. The best results were obtained with the simplest model (the Multi-layer Perceptron), which can be explained by the lack of data for training deep learning models (see the illustrative sketch after this list).

– Aggressiveness Identification in Twitter at IberLEF 2019: Frequency Analysis Interpolation for Aggressiveness Identification [12]
  • Task: Aggressiveness Detection
  • Team name: OscarGaribo
  • Features: Statistical descriptors.
  • Classification: Support Vector Machine.
  • Summary: The authors propose a new text representation that reduces the dimensionality of the information of each author or text to six characteristics per class. The representation aims to capture the level of association of each word with each of the classes and, therefore, to model the probability distribution of the presence or evidence of each class in the texts. This representation, named Frequency Analysis Interpolation, is used to encode the texts of each user; the encoded information then feeds a Support Vector Machine classifier.

– Attribute selection techniques for classification of aggressive tweets. LyR-UAMC participation at MexA3T 2019 Task [14]
  • Task: Aggressiveness Detection
  • Team name: LyR
  • Features: Document frequency, mutual information, and lexical availability.
  • Classification: Naïve Bayes.
  • Summary: The authors propose an approach that follows the traditional pipeline of a non-thematic text classification system. They employ a BoW representation and evaluate the impact of distinct feature selection strategies. Their goal was to test whether a condensed set of words can be indicative of the aggressiveness of a tweet. They propose a new criterion for selecting relevant words, lexical availability, and reach the following conclusion: different feature selection techniques favor different aspects of aggressiveness in a short text.

– Detection of Aggressive Tweets in Mexican Spanish Using Multiple Features with Parameter Optimization [11]
  • Task: Aggressiveness Detection
  • Team name: mineriaUNAM
  • Features: Linguistically motivated features and different types of n-grams.
  • Classification: Support Vector Machine.
  • Summary: The authors approach the problem using linguistically motivated features and several types of n-grams (words, characters, function words, punctuation symbols, among others). They train a Support Vector Machine within a combinatorial framework that optimizes the configuration of the classifier.

– UACh at MEX-A3T 2019: Preliminary results on detecting aggressive tweets by adding author information via an unsupervised strategy [5]
  • Task: Aggressiveness Detection
  • Team name: UACh
  • Features: Character n-grams and word embeddings.
  • Classification: Support Vector Machine and a Multi-layer Perceptron.
  • Summary: The authors apply a traditional classification method to the problem of aggressiveness detection in Spanish tweets. They use two main kinds of features, character n-grams and word embeddings, and employ two different classifiers, an SVM and a Multi-layer Perceptron. The main idea of their participation was to include features that give context to the text messages and to explore whether people verbally attack differently depending on their traits and overall environment. The obtained results indicate that adding context features produces almost unnoticeable changes in performance.

– Aggressiveness Detection through Deep Learning Approaches [9]
  • Task: Aggressiveness Detection
  • Team name: VRAIN
  • Features: Hierarchical features obtained with a CNN.
  • Classification: CNN, LSTM, GRU.
  • Summary: The authors explore three deep learning approaches to the task: a convolutional network, a recurrent network, and a self-attention network. They did not obtain good results on the test set, which they attribute to the content of the test data being too different from the training set.

– Ensemble learning to detect aggressiveness in Mexican Spanish tweets [8]
  • Task: Aggressiveness Detection
  • Team name: CEATIC
  • Features: Bag of words with term frequency.
  • Classification: Support Vector Machine, Logistic Regression, Multinomial Naïve Bayes.
  • Summary: The authors use a traditional BoW representation with unigrams and bigrams and a TF weighting. They evaluate multiple classification algorithms, among them Logistic Regression, Multinomial Naïve Bayes, and SVM, and propose an ensemble classifier that combines the three best individual algorithms by majority vote.
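A recurring observation in the summaries above is that simple representations were competitive; PRHLT, for instance, reports that its simplest model (a Multi-layer Perceptron over TF-IDF vectors) gave its best results. The sketch below is a generic, hedged illustration of such a pipeline, not a reproduction of any participant's system; all hyperparameters are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Generic TF-IDF + Multi-layer Perceptron pipeline (illustrative only;
# hyperparameters are assumptions, not those of any participating team).
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0),
)

# Toy usage with placeholder data
train_tweets = ["ejemplo de tweet agresivo", "ejemplo de tweet neutral"]
train_labels = ["aggressive", "non-aggressive"]
model.fit(train_tweets, train_labels)
print(model.predict(["otro tweet de ejemplo"]))
```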
5 Experimental evaluation and analysis of results

This section summarizes the results obtained by the participants, comparing and analyzing in detail the performance of their submitted solutions. For the final phase of the challenge, participants sent their predictions for the test partition, and the performance on this data was used to rank them. The macro average F1 was used as the main evaluation measure.

Table 7. General description of the different approaches, indicating the preprocessing, representation (features), and classification used by each participant; C indicates a classifier and A a general approach. (The table marks, per team — mineriaUNAM, CIC-VCR, Cerpamid, CEATIC, OGaribo, PRHLT, VRAIN, UACh, and LyR — the use of lowercasing and tweet normalization as preprocessing; character n-grams, word n-grams, aggressive words, word embeddings, statistical descriptors, LIWC, and hierarchical features over texts or over images as representations; and logistic regression, Naïve Bayes, SVM, deep learning, and model selection/ensembles as classification strategies.)

For computing the evaluation scores we relied on the EvALL platform [3]. EvALL is an online evaluation service targeting information retrieval and natural language processing tasks. It is a complete evaluation framework that receives as input the ground truth and the predicted outputs of the systems and returns a complete performance evaluation. In the following, we report the results obtained by the participants as evaluated by EvALL.

As baseline methods, we implemented two popular approaches that have shown to be hard to beat in both tasks: i) a classification model trained on a bag-of-words (BoW) representation, and ii) a classifier trained on a character 3-gram representation. We also compared the systems' results against the best results for both tasks in the previous year's edition: for author profiling we considered the results of the MXAA approach [10], and for aggressiveness detection we used the results of the INGEOTEC system [7].

For the BoW approach, the entire corpus vocabulary was used, removing stopwords and special characters; the size of the representation of each text was 14,913. For the 3-gram representation, all character 3-grams were used and, as in the BoW, stopwords and special characters were removed; the size of the representation of each text was 5,212. An SVM with linear kernel and C = 1 was used for classification in both tasks.
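The following scikit-learn sketch illustrates baselines of this kind (a BoW representation and a character 3-gram representation, each feeding a linear SVM with C = 1). The exact tokenization, stop-word list, and filtering that lead to the reported 14,913 and 5,212 dimensions are not specified here, so those details in the sketch are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder Spanish stop-word list: an assumption, not the list used by the organizers.
SPANISH_STOPWORDS = ["de", "la", "que", "el", "en", "y", "a", "los", "se", "del"]

# Baseline i): bag of words + linear SVM (C = 1)
bow_baseline = make_pipeline(
    CountVectorizer(stop_words=SPANISH_STOPWORDS),
    LinearSVC(C=1.0),
)

# Baseline ii): character 3-grams + linear SVM (C = 1)
# (stop-word filtering before n-gram extraction is omitted here for brevity)
char3_baseline = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(3, 3)),
    LinearSVC(C=1.0),
)

# Toy usage with placeholder data
texts = ["tweet de ejemplo uno", "otro tweet de ejemplo"]
labels = ["aggressive", "non-aggressive"]
for clf in (bow_baseline, char3_baseline):
    clf.fit(texts, labels)
    print(clf.predict(["un tweet nuevo"]))
```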
5.1 Author profiling results

Table 8 shows a summary of the results obtained by each team in the three AP subtasks, as well as their average performance. The average macro F1 was used to rank the participants. The approach of the CerpamidTeam (run 1) obtained the best performance among the participants; nevertheless, this system does not outperform all baselines. In particular, it is noticeable that for the two traits already considered in the 2018 edition, i.e., occupation and place of residence, neither of the two participating teams was able to improve on the results of the winning team (MXAA) of last year's edition.

Table 8. Average macro F1 performance for the three traits in the author profiling task.
Team                 Source   Gender   Occupation   Location   Average F1
Baseline (MXAA)      Text     -        0.51         0.83       -
Baseline (BoW)       Text     0.72     0.48         0.63       0.61
CerpamidTeam run 1   Text     0.84     0.40         0.50       0.58
Baseline (3-grams)   Text     0.68     0.42         0.60       0.57
CerpamidTeam run 2   Text     0.83     0.38         0.48       0.56
CIC-VCR run 1        Image    0.52     0.12         0.15       0.26
CIC-VCR run 2        Image    0.47     0.10         0.11       0.23

Tables 9 and 10 show the results obtained by each team for the location and occupation traits, respectively. Although we used the macro average F1 to rank the participants, we also show the accuracy as well as the per-class F1. For these two traits, the two participating systems could not outperform any of the proposed baselines.

Table 9. Results for the location trait in the author profiling task (global Fmacro and accuracy, followed by per-class F1).
Team                 Fmacro   Accuracy   center   southeast   northwest   north   northeast   west
Baseline (MXAA)      0.83     0.86       0.87     0.81        0.86        0.78    0.90        0.75
Baseline (BoW)       0.63     0.75       0.79     0.60        0.78        0.32    0.83        0.45
Baseline (3-grams)   0.60     0.72       0.75     0.50        0.77        0.31    0.80        0.47
CerpamidTeam run 1   0.50     0.63       0.69     0.39        0.67        0.17    0.71        0.29
CerpamidTeam run 2   0.48     0.62       0.70     0.40        0.67        0.25    0.73        0.26
CIC-VCR run 1        0.15     0.24       0.42     0.04        0.10        0.03    0.03        0.10
CIC-VCR run 2        0.11     0.17       0.36     0.0         0.087       0.05    0.14        0.03

Table 10. Results for the occupation trait in the author profiling task (global Fmacro and accuracy, followed by per-class F1).
Team                 Fmacro   Accuracy   others   arts   student   social   sciences   sports   admin   health
Baseline (MXAA)      0.51     0.74       0.04     0.51   0.91      0.69     0.47       0.49     0.59    0.38
Baseline (BoW)       0.48     0.71       0.15     0.48   0.90      0.61     0.37       0.52     0.54    0.23
Baseline (3-grams)   0.42     0.69       0.13     0.32   0.90      0.62     0.26       0.28     0.52    0.32
CerpamidTeam run 1   0.40     0.66       0.10     0.25   0.86      0.57     0.20       0.31     0.51    0.26
CerpamidTeam run 2   0.38     0.66       0.13     0.33   0.86      0.55     0.21       0.35     0.48    0.25
CIC-VCR run 1        0.12     0.27       0.0      0.09   0.45      0.14     0.06       0.0      0.21    0.0
CIC-VCR run 2        0.09     0.23       0.0      0.12   0.44      0.07     0.04       0.0      0.09    0.0

From an overall analysis, it is possible to notice that for all traits the best results correspond to text-based solutions. In spite of this general behaviour, we identified 110 users out of the 1500 in the test set that were correctly classified only by the image-based systems (runs 1 and 2 of the CIC-VCR team). We hypothesize that this result could be caused by the lower number of tweets of these users in comparison to the rest: they have 1218 tweets on average, whereas the average over the complete test set is 1353 tweets per user.

To analyze the complementarity of the predictions of the two participants, we built a theoretically perfect ensemble from their four runs. That is, we considered a test instance correctly classified if at least one of the participating teams (i.e., one of their runs) classified it correctly. Additionally, we considered a majority vote approach, choosing for each instance the class with the greatest number of predictions among the four runs.

Table 11 shows the results of the perfect ensemble and the majority vote approach, and compares them with the best result obtained for each trait by a single participating system. From these results, it is possible to observe that the performance of the perfect ensemble is considerably greater than that of the best approach for the three traits, suggesting that the two participating systems are complementary to each other. Nevertheless, the poor results of the majority vote approach indicate that the intersection of instances correctly classified by the two systems is quite small and, therefore, that automatically taking advantage of this complementarity is a complex task.

Table 11. Combining AP results from the different systems: perfect ensemble and majority vote approach.
Trait        Best approach   Vote   Perfect ensemble
Gender       0.83            0.76   0.97
Location     0.50            0.45   0.71
Occupation   0.39            0.35   0.57
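The perfect (oracle) ensemble and the majority vote described above are straightforward to reproduce. The sketch below shows one possible implementation over lists of per-run predictions; the function names, the fallback prediction for the oracle, and the tie-breaking rule for the vote are our own choices, not a specification of the analysis script used for Tables 11 and 13.

```python
from collections import Counter
from sklearn.metrics import f1_score

def oracle_predictions(gold, runs):
    """Perfect-ensemble prediction: correct whenever at least one run is correct;
    otherwise fall back to the first run's (wrong) prediction."""
    return [g if any(r[i] == g for r in runs) else runs[0][i]
            for i, g in enumerate(gold)]

def majority_vote(runs):
    """Most frequent label among the runs for each instance (ties broken arbitrarily)."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*runs)]

# Toy example with four hypothetical runs
gold = ["male", "female", "female", "male"]
runs = [["male", "male", "female", "female"],
        ["female", "female", "male", "female"],
        ["male", "male", "male", "male"],
        ["female", "female", "female", "female"]]

print(f1_score(gold, oracle_predictions(gold, runs), average="macro"))
print(f1_score(gold, majority_vote(runs), average="macro"))
```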
5.2 Aggressiveness identification results

Table 12 shows the results obtained by the participating teams in the aggressiveness detection task. For this task, we sorted the teams by their F1 on the aggressive class, but for completeness we also report the accuracy, the macro F1, and the F1 on the non-aggressive class. The approach submitted by the University of Chihuahua (UACh) obtained the best performance, outperforming all proposed baselines except the result of INGEOTEC, the winning team of the 2018 edition. Nevertheless, it is important to point out that the UACh approach is considerably simpler than that of INGEOTEC.

Table 12. Results for the aggressiveness identification task (accuracy, macro F1, and per-class F1).
Team                  Accuracy   Fmacro   Aggressive   Non-aggressive
UACh                  0.73       0.65     0.48         0.82
Baseline (INGEOTEC)   0.73       0.65     0.48         0.81
PRHLT-run2            0.70       0.63     0.47         0.79
PRHLT-run4            0.69       0.62     0.46         0.78
mineriaUNAM-run2      0.71       0.63     0.45         0.80
mineriaUNAM-run1      0.71       0.63     0.45         0.80
PRHLT-run3            0.65       0.59     0.44         0.74
PRHLT-run1            0.65       0.59     0.44         0.74
Baseline (3-grams)    0.69       0.60     0.43         0.79
LyR-run3              0.69       0.61     0.43         0.79
LyR-run6              0.68       0.59     0.42         0.77
VRAIN-run1            0.50       0.49     0.41         0.57
OscarGaribo-run1      0.68       0.59     0.40         0.79
LyR-run5              0.67       0.59     0.38         0.79
LyR-run2              0.70       0.55     0.38         0.77
LyR-run1              0.65       0.57     0.38         0.76
Baseline (BoW)        0.68       0.58     0.37         0.78
OscarGaribo-run2      0.67       0.57     0.37         0.77
LASTUS-UPF-run2       0.60       0.52     0.32         0.72
LASTUS-UPF-run1       0.58       0.50     0.30         0.70
CEATIC                0.72       0.56     0.30         0.82
VRAIN-run2            0.61       0.51     0.29         0.73
Aspie96-run2          0.63       0.52     0.29         0.75
LyR-run4              0.66       0.57     0.28         0.81
hzegheru              0.60       0.50     0.28         0.73
Aspie96-run1          0.68       0.53     0.27         0.79

As previously done for the profiling task, we also built a theoretically perfect ensemble and a majority vote approach from all submissions to the aggressiveness task. Table 13 shows these results. Again, the perfect ensemble achieves a very good result, in this case F1 = 0.99, while the majority vote approach obtains a low performance, indicating that for this task it is also difficult to find a way to merge the information from the different approaches, even though they are complementary to each other.

Table 13. Results of majority vote and perfect ensemble for aggressiveness identification.
Best approach   Vote   Perfect ensemble
0.47            0.40   0.99

As a result of the perfect theoretical ensemble, it was also possible to identify the errors common to all systems. In fact, there are only 9 tweets that no system could classify correctly; all of them are aggressive tweets that were classified as non-aggressive. Below we show some of these tweets, where we can identify ironic comments, the use of words outside the training vocabulary (such as some named entities), as well as offenses containing no vulgar or profane words.

– Y hablando de cosas feas, ¿cómo está tu novia?
– A mí más real se me hace Carla Morrison porque está super gorda
– Ponete a correr gorda, está bien que las puertas del gimnasio de abren
– pero va a llegar el momento en que no vas a pasar el mejor profff????
– Tu siempre tan tonta Viviana ????

6 Conclusions

This paper described the design and results of the MEX-A3T shared task, co-located with IberLEF 2019. MEX-A3T stands for Authorship and Aggressiveness Analysis in Mexican Spanish Tweets. Two tasks were proposed, one targeting author profiling and the other focused on aggressiveness detection: given a set of tweets in Mexican Spanish, the participants had to identify the gender, location, and occupation of their authors, as well as the aggressive messages.
For these tasks we employed the same data sets as in the previous MEX-A3T edition, but we extended the author profiling collection by including eleven images for each user, with the aim of evaluating multimodal profiling approaches. The shared task lasted more than two months and attracted the participation of nine teams from three different countries: Mexico, Spain, and Cuba.

A variety of methodologies were proposed by the participants, from traditional supervised methods to deep learning approaches. For author profiling, the Cerpamid team obtained the best results with an approach based on dimensionality reduction over the text; however, their results did not surpass the best results of the previous year's edition. For aggressiveness identification, the top-ranked team was UACh, with an approach based on two main kinds of features: character n-grams and word embeddings. Their results equaled those of the previous year's winner while employing a simpler approach.

In general terms, the competition was a success: the solutions proposed by the nine participants were diverse regarding methodologies and performances, and new insights on how to deal with tweets in Mexican Spanish were obtained. Among the most interesting findings was the complementarity of the predictions of the different participants, a phenomenon also observed in the previous edition. This opens the possibility of studying how to take advantage of the different information extracted by the teams, in such a way that results approach those of the perfect ensemble.

Acknowledgements

Our special thanks go to all of MEX-A3T's participants. We would like to thank CONACyT for partially supporting this work under grants CB-2015-01-257383, FC-2016-2410, and the Thematic Networks program (Language Technologies Thematic Network). The first author thanks CONACyT-Mexico for doctoral scholarship 654803 and the second for doctoral scholarship 401887.

References

1. Álvarez-Carmona, M.Á., Guzmán-Falcón, E., Montes-y-Gómez, M., Escalante, H.J., Villaseñor-Pineda, L., Reyes-Meza, V., Rico-Sulayes, A.: Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets. In: Notebook Papers of the 3rd SEPLN Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), Seville, Spain (2018)
2. Álvarez-Carmona, M.A., López-Monroy, A.P., Montes-y-Gómez, M., Villaseñor-Pineda, L., Meza, I.: Evaluating topic-based representations for author profiling in social media. In: Ibero-American Conference on Artificial Intelligence, pp. 151–162. Springer (2016)
3. Amigó, E., Carrillo-de-Albornoz, J., Almagro-Cádiz, M., Gonzalo, J., Rodríguez-Vidal, J., Verdejo, F.: EvALL: Open access evaluation for information access systems. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1301–1304. ACM (2017)
4. Argamon, S., Koppel, M., Fine, J., Shimoni, A.R.: Gender, genre, and writing style in formal written texts. Text 23(3), 321–346 (2003)
5. Casavantes, M., López, R., González, L.C.: UACh at MEX-A3T 2019: Preliminary results on detecting aggressive tweets by adding author information via an unsupervised strategy. In: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR WS Proceedings (2019)
6. Castro Castro, D., Artigas Herold, M.F., Ortega Bueno, R., Muñoz, R.: CerpamidUA at MexA3T 2019: Transition point proposal. In: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR WS Proceedings (2019)
7. Graff, M., Miranda-Jiménez, S., Tellez, E.S., Moctezuma, D., Salgado, V., Ortiz-Bejar, J., Sánchez, C.N.: INGEOTEC at MEX-A3T: Author profiling and aggressiveness analysis in Twitter using µTC and EvoMSA. In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), CEUR WS Proceedings (2018)
8. Molina-González, M.D., Plaza-del-Arco, F.M., Martín-Valdivia, M.T., Ureña-López, L.A.: Ensemble learning to detect aggressiveness in Mexican Spanish tweets. In: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR WS Proceedings (2019)
9. Nina-Alcocer, V., González, J.Á., Hurtado, L.F., Pla, F.: Aggressiveness detection through deep learning approaches. In: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR WS Proceedings (2019)
10. Ortega-Mendoza, R.M., López-Monroy, A.P.: The winning approach for author profiling of Mexican users in Twitter at MEX-A3T@IberEval-2018. In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), CEUR WS Proceedings (2018)
11. Ortiz, G., Gómez-Adorno, H., Reyes-Magaña, J., Bel-Enguix, G., Sierra, G.: Detection of aggressive tweets in Mexican Spanish using multiple features with parameter optimization. In: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR WS Proceedings (2019)
12. Garibo i Orts, O.: Aggressiveness identification in Twitter at IberLEF 2019: Frequency analysis interpolation for aggressiveness identification. In: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR WS Proceedings (2019)
13. De la Peña Sarracén, G.L., Rosso, P.: Aggressive analysis in Twitter using a combination of models. In: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR WS Proceedings (2019)
14. Ramírez-de-la-Rosa, G., Villatoro-Tello, E., Jiménez-Salazar, H.: Attribute selection techniques for classification of aggressive tweets: LyR-UAMC participation at MexA3T 2019 task. In: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR WS Proceedings (2019)
15. Tellez, F.P., Pinto, D., Cardiff, J., Rosso, P.: Defining and evaluating blog characteristics. In: Eighth Mexican International Conference on Artificial Intelligence (MICAI 2009), pp. 97–102. IEEE (2009)
16. Valdez-Rodríguez, J.E., Calvo, H., Felipe-Riverón, E.M.: Author profiling from images using 3D convolutional neural networks. In: Proceedings of the First Workshop for Iberian Languages Evaluation Forum (IberLEF 2019), CEUR WS Proceedings (2019)