Deep Multi-Task Models for Misogyny Identification
and Categorization on Arabic Social Media
Abdelkader El Mahdaouy1 , Abdellah El Mekki1 , Ahmed Oumar2 , Hajar Mousannif2
and Ismail Berrada1
1
    School of Computer Sciences, Mohammed VI Polytechnic University, Morocco
2
    LISI Laboratory, Computer Science Department, FSSM, Cadi Ayyad University, Morocco


                                         Abstract
                                         The prevalence of toxic content on social media platforms, such as hate speech, offensive language, and
                                         misogyny, presents serious challenges to our interconnected society. These challenging issues have
                                         attracted widespread attention in the Natural Language Processing (NLP) community. In this paper, we
                                         present our systems submitted to the first Arabic Misogyny Identification shared task. We investigate
                                         three multi-task learning models as well as their single-task counterparts. In order to encode the input
                                         text, our models rely on the pre-trained MARBERT language model. The overall obtained results show
                                         that all our submitted models have achieved the best performances (top three ranked submissions) in
                                         both misogyny identification and categorization tasks.

                                         Keywords
                                         Misogyny Identification, Misogyny Categorization, Multi-Task Learning, Pre-trained Language Models




1. Introduction
With the popularity of the Internet and the rise of social media platforms, users around the
world enjoy greater freedom of expression. They can express their thoughts and opinions
with minimal limitations and restrictions. As a result, they can share their positive thoughts
about a specific product or service, a political decision, etc., as well as their negative
thoughts about other matters. Unfortunately, many users exploit these communication
channels and this freedom of expression to bully other people or groups. Misogyny is one of these
phenomena; it is defined as hate speech towards the female gender [1]. Misogyny can be
classified into several categories such as sexual harassment, damning, dominance, etc. [2].
   Misogynistic behavior has prevailed on social media platforms such as Facebook and Twitter. The ease
of use and richness of these platforms have raised misogyny to new levels of violence around
the globe. Moreover, women suffer from misogyny in the 1st tier world just as they do in the
2nd and 3rd tier world, regardless of their race, language, age, etc. In the Arab world, women's
rights and liberty have always been a controversial subject. Therefore, women are also exposed

FIRE 2021: Forum for Information Retrieval Evaluation, 13th-17th December, 2021
abdelkader.elmahdaouy@um6p.ma (A. El Mahdaouy); abdellah.elmekki@um6p.ma (A. El Mekki);
ahmedmohamedlemine.oumar@edu.uca.ma (A. Oumar); mousannif@uca.ac.ma (H. Mousannif);
ismail.berrada@um6p.ma (I. Berrada)
ORCID: 0000-0003-4281-2472 (A. El Mahdaouy); 0000-0002-7394-3611 (A. El Mekki); 0000-0002-1307-4215
(H. Mousannif); 0000-0003-4225-911X (I. Berrada)
© 2021 Forum for Information Retrieval Evaluation, December 13-17, 2021, India.
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
to online misogyny, where people can start campaigns of intimidation and harassment against
them for one reason or another.
   Fighting online misogyny has become a topic of interest for several Internet players:
social media networks such as Facebook and Twitter provide reporting systems that allow users
to report messages expressing misogynistic behavior. Such systems can also detect these
behaviors in users' posts and delete them automatically. For high-resource languages such
as English, Spanish, and French, these systems have been shown to perform well. However,
when it comes to languages such as Arabic, automatic reporting systems are not yet deployed,
and that is mainly due to: 1) the lack of annotated data needed to build such systems and 2) the
complexity of the Arabic language compared to other languages.
   Fine-tuning pre-trained transformer-based language models [3] on downstream tasks has
shown state-of-the-art (SOTA) performances in various languages, including Arabic [4, 5, 6,
7, 8]. Although several research works based on pre-trained transformers have been introduced
for misogyny detection in Indo-European languages [9, 10, 11], work on the Arabic language
remains underexplored [12].
   In this paper, we present our participating system and submissions to the first Arabic Misogyny
Identification (ArMI) shared task [13]. We introduce three Multi-Task Learning (MTL) models
and their single-task counterparts. To embed the input texts, our models employ the pre-trained
MARBERT language model [5]. Moreover, for Task 2, we tackle the class imbalance problem
by training our models to minimize the Focal Loss [14]. The obtained results demonstrate that
our three submissions have achieved the best performances for both ArMI tasks in comparison
to the other participating systems. The results also show that MTL models outperform their
single-task counterparts on most evaluation measures. Additionally, the Focal Loss has shown
effective performances, especially on F1 measures.
   The rest of this paper is organized as follows. Section 2 describes the ArMI tasks and the
provided dataset. In Section 3, we introduce our participating system and the investigated deep
learning models. Section 4 presents the conducted experiments and shows the obtained results.
Section 5 concludes the paper.


2. Tasks and dataset description
The Arabic Misogyny Identification (ArMI) task consists of the automatic detection of misogyny
from Arabic tweets [13]. This task is composed of two main sub-tasks: the 1st sub-task is a
binary classification task where the objective is to classify whether a tweet is misogynistic or
not. In the second sub-task, the objective is to detect the misogynistic behavior expressed in
a tweet. It is modeled as a multi-class classification problem consisting of seven misogynistic
behaviors (labels). The organizers of this task have provided 7,866 labeled tweets to serve
both sub-tasks for model training, while 1,966 tweets have been used for model testing and
evaluation. Figure 1 presents the label distribution for both tasks. It shows that the class labels
are imbalanced for both misogyny identification and categorization tasks.
   The provided tweets are expressed mainly in Modern Standard Arabic (MSA), while several
tweets are expressed in some Arabic dialects such as Egyptian, Gulf, and Levantine. The
Levantine tweets are taken from Let-Mi misogyny detection dataset, proposed by Mulki and
 (a) Distribution of misogynistic tweets            (b) Distribution of misogynistic categories

Figure 1: Labels distribution for both misogyny and category detection tasks.


Ghanem [12]. Besides, the rest of the tweets have been scraped from Twitter using hashtags
related to the misogyny phenomenon. The provided dataset is manually annotated by Arabic
native speakers.


3. Methodology
We propose three deep Multi-task Learning (MTL) models based on the pre-trained MARBERT
encoder [5] for the ArMI shared task. We also investigate the single-task version of the proposed
MTL models. The choice of the MARBERT encoder is motivated by the fact that this language
model is pre-trained on a corpus of 1B tweets, containing both dialectal Arabic and MSA. Moreover,
fine-tuning MARBERT on downstream NLP tasks has shown effective results in many Arabic
NLP applications [5, 7, 8]. In what follows, we describe each component of our submitted
system.

3.1. Preprocessing
The tweet preprocessing component performs emoji extraction, user mention and URL sub-
stitution, and hashtag normalization. Following MARBERT's tweet preprocessing guidelines,
user mentions and URLs are replaced by the "user" and "url" tokens, respectively. For hashtag
normalization, we remove the "#" symbol and replace "_" with white space. It is worth mentioning that
diacritics are already removed from the training and testing datasets. Based on our preliminary
experiments, emojis are not discarded: they are extracted and added after the [SEP] token
of the employed encoder. Finally, each tweet is represented using its normalized text and its
emojis, as follows:
   ⋆ [CLS] normalized tweet [SEP] emojis [SEP]
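As an illustration, the preprocessing steps above can be sketched as follows. The regular expressions and the emoji Unicode ranges are our own assumptions, not the exact rules used in the system, and whether emojis also remain inside the normalized text is an implementation choice (here they are only moved to the emoji segment):

```python
import re

# Illustrative emoji ranges (an assumption; not an exhaustive Unicode list).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]")

def preprocess_tweet(text):
    """Return (normalized_text, emojis) following the steps described above."""
    emojis = "".join(EMOJI_RE.findall(text))         # emoji extraction
    text = EMOJI_RE.sub(" ", text)                   # move emojis to a separate segment
    text = re.sub(r"@\w+", "user", text)             # user mention substitution
    text = re.sub(r"https?://\S+", "url", text)      # URL substitution
    text = text.replace("#", "").replace("_", " ")   # hashtag normalization
    return re.sub(r"\s+", " ", text).strip(), emojis

# Encoder input is then built as: [CLS] normalized tweet [SEP] emojis [SEP]
norm, emojis = preprocess_tweet("@user1 check https://t.co/x #women_rights 😀")
# norm == "user check url women rights", emojis == "😀"
```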

3.2. Deep Learning Models
In this section, we describe the employed MTL models and their single task counterparts. All
our models utilize MARBERT encoder to represent the input tweets. The models are described
as follows:
    • MT_CLS uses a classification layer for each task on top of MARBERT encoder. It relies
      on [CLS] token embedding to predict the class label for each task. The single-task version
      of this model is denoted by ST_CLS.
    • MT_ATT consists of MARBERT encoder, two task-specific attention layers, and two
      classification layers. Each attention layer [15, 16] extracts task discriminative features by
      weighting the output token embeddings of the encoder according to their contribution
      to the task at hand. Each classification layer is fed with the concatenation of the
      task attention output and the [CLS] token embedding. This model has shown effective
      performances in many NLP tasks, including dialect identification, sentiment analysis
      and sarcasm detection for the Arabic language [7, 8], humor detection and rating, as
      well as lexical complexity prediction in English [17, 18]. The single-task counterpart of
      MT_ATT is denoted by ST_ATT.
    • MT_VHATT is an extension of the MT_ATT model. In addition to the task-specific
      attention layers (called horizontal attention layers), it employs vertical attention layers to
      incorporate the features of the top intermediate layers of MARBERT encoder for both
      tasks. This model utilizes six attention layers to extract features from the token embeddings
      of the top six layers of the encoder [15, 16]. Then, another attention layer is employed to
      aggregate the features from the six vertical attention layers. Note that we exclude the top
      output layer of the encoder, as its features are already used by the horizontal attention
      layers (task-specific attention). Finally, the input of the classification layers for both tasks
      is the concatenation of the [CLS] token embedding of the last layer of the encoder, the
      task-specific attention output, and the aggregated features of the intermediate layers. The
      single-task version of this model is denoted by ST_VHATT.
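To make the MT_ATT variant concrete, the following PyTorch sketch implements the two task-specific attention layers and classification heads; the encoder itself (e.g. MARBERT via the Transformers library) is kept external, and the hidden size and the category count (seven misogynistic behaviors plus "None") are illustrative assumptions rather than the exact configuration:

```python
import torch
import torch.nn as nn

class TaskAttention(nn.Module):
    """Additive attention pooling over token embeddings [15, 16]."""
    def __init__(self, hidden):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)
        self.score = nn.Linear(hidden, 1, bias=False)

    def forward(self, tokens):                           # tokens: (B, T, H)
        weights = torch.softmax(self.score(torch.tanh(self.proj(tokens))), dim=1)
        return (weights * tokens).sum(dim=1)             # pooled features: (B, H)

class MTATTHeads(nn.Module):
    """Task-specific attention + classification heads over an encoder's token embeddings."""
    def __init__(self, hidden, n_categories=8):          # 7 behaviors + "None" (assumed)
        super().__init__()
        self.att1, self.att2 = TaskAttention(hidden), TaskAttention(hidden)
        self.clf1 = nn.Linear(2 * hidden, 1)             # Task 1: one logit (sigmoid + BCE)
        self.clf2 = nn.Linear(2 * hidden, n_categories)  # Task 2: multi-class

    def forward(self, token_embeds):                     # (B, T, H), e.g. MARBERT output
        cls = token_embeds[:, 0]                         # [CLS] token embedding
        t1 = self.clf1(torch.cat([cls, self.att1(token_embeds)], dim=-1))
        t2 = self.clf2(torch.cat([cls, self.att2(token_embeds)], dim=-1))
        return t1, t2
```

Each head concatenates the [CLS] embedding with its own attention output, as described above.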

  For misogyny identification (Task 1), all models are trained to minimize the binary cross-
entropy loss. For misogyny categorization (Task 2), we have investigated the Cross-Entropy
(CE) loss, as well as the Focal Loss (FL) [14]. The latter loss is employed to handle the class
imbalance problem. It reduces the loss contribution from easy examples and assigns higher
importance weights for hard-to-classify examples. The FL is given by:
                                   FL(y, p̂) = −α_y (1 − p̂_y)^γ log(p̂_y)                               (1)
where y ∈ {0, …, K − 1} denotes the category label, p̂ = (p̂_0, …, p̂_{K−1}) is a vector representing
the predicted probability distribution over the labels, α_y is the weight of label y, and γ controls
the contribution of high-confidence predictions to the loss. In other words, a higher value of γ
implies a lower loss contribution from well-classified examples [14].
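A minimal PyTorch implementation of Eq. (1), shown here as a sketch (the function name and interface are ours, not the system's actual code):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha, gamma=2.0):
    """Focal Loss of Eq. (1): -alpha_y * (1 - p_y)^gamma * log(p_y), averaged over the batch.

    logits: (B, K) raw scores, targets: (B,) integer labels, alpha: (K,) per-label weights.
    """
    log_p = F.log_softmax(logits, dim=-1)
    log_py = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)    # log p_y of the true label
    py = log_py.exp()                                            # p_y
    return (-alpha[targets] * (1.0 - py) ** gamma * log_py).mean()
```

With γ = 0 and unit weights, this reduces to the standard cross-entropy loss.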


4. Experiments and results
In this section, we present the experiment settings as well as the obtained results for our
development set and the provided test set.
4.1. Experiment settings
All our models are implemented using the PyTorch1 framework and the open-source Transformers2
library. Experiments are performed on a PowerEdge R740 server with a 44-core Intel
Xeon Gold 6152 (2.1 GHz) CPU, 384 GB of RAM, and a single Nvidia Tesla V100 GPU with 16 GB of memory.
The provided training set is split into 90% for training and 10% for development. Based
on our preliminary results, all models are trained using the Adam optimizer. The learning rate,
the number of epochs, and the batch size are fixed to 1 × 10−5 , 5, and 16, respectively. The
hyper-parameter γ of the Focal Loss is set to 2, while the weight of each Task 2 label is set to
α_y = (number of instances of the dominant label) / (number of instances of label y).
All models are evaluated using the Accuracy as well as the macro-averaged Precision, Recall,
and F1 measures.
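The label weights α_y can be derived directly from the training label counts; a small sketch (function name and toy labels are ours):

```python
from collections import Counter

def label_weights(labels):
    """alpha_y = count(dominant label) / count(label y), from a list of training labels."""
    counts = Counter(labels)
    dominant = max(counts.values())                   # size of the most frequent class
    return {y: dominant / n for y, n in counts.items()}

# Toy example: the rarest label receives the largest weight.
w = label_weights(["Discredit"] * 6 + ["Damning"] * 3 + ["Derailing"])
# w == {"Discredit": 1.0, "Damning": 2.0, "Derailing": 6.0}
```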

4.2. Results
In order to select the best models for our official submissions, we have evaluated the three MTL
models and their single-task counterparts. For Task 2, we have investigated both CE and FL
losses. Table 1 presents the obtained results on the development set using the three single-task
models. The overall results for Task 1 show that the ST_ATT model outperforms the
other models on most evaluation measures. It also achieves the best Recall and F1 measures for
Task 2. Moreover, ST_VHATT yields slightly better performances on Task 1 and achieves far
better precision and F1 scores on Task 2 in comparison to the ST_CLS model. Furthermore, FL
outperforms the CE loss on most evaluation measures for Task 2, except for the accuracy and
the precision of the ST_CLS model. Table 2 presents the classification reports for Task 2 of the
ST_ATT model using the CE and FL loss functions. The obtained results show that FL leads to
better F1 scores for all categories, except the ”Discredit” and ”Damning” misogynistic behaviours.
Indeed, classification of the rare categories is improved while the overall performance is maintained.

Table 1
The obtained results on the dev set using the three single-task models for both Task 1 and Task 2.
             Task 1                                     Task 2
 Model       Accuracy   Precision   Recall   F1         Cat. Task Loss   Accuracy   Precision   Recall   F1
 ST_CLS      90.72      90.48       89.91    90.18      CE               81.58      71.80       56.05    60.66
                                                        FL               79.67      62.15       63.00    62.05
 ST_ATT      90.98      90.80       90.12    90.43      CE               80.81      67.79       56.70    59.63
                                                        FL               80.94      70.22       62.12    64.60
 ST_VHATT    90.85      90.50       90.20    90.34      CE               80.43      64.87       60.15    61.99
                                                        FL               80.94      68.43       61.79    63.96


  Table 3 presents the obtained results on the dev set using the three multi-task models for
both Task 1 and Task 2. The overall results show that MT_ATT outperforms all the
other models on both tasks for most evaluation measures. The results also demonstrate that using
the FL loss for Task 2 improves the model's performance on Task 1 in the multi-task setting.
    1
        https://pytorch.org/
    2
        https://huggingface.co/transformers/
Table 2
ST_ATT model’s classification reports on the dev set of Task 2 using CE and FL loss functions.
                                                CE loss                                 FL loss
 Category                                                                                                            Support
                                   Precision        Recall        F1        Precision      Recall        F1
 None                                  0.8509       0.8954   0.8726            0.8845      0.8758      0.8801          306
 Damning                               0.8841       0.9104   0.8971            0.8955      0.8955      0.8955           67
 Derailing                             0.2500       0.0909   0.1333            0.4286      0.2727      0.3333           11
 Discredit                             0.8247       0.8362   0.8304            0.7980      0.8397      0.8183          287
 Dominance                             0.3636       0.3636   0.3636            0.4375      0.3182      0.3684           22
 Sexual harassment                     1.0000       0.3333   0.5000            1.0000      0.5000      0.6667           6
 Stereotyping & objectification        0.6786       0.5846   0.6281            0.6897      0.6154      0.6504           65
 Threat of violence                    0.5714       0.5217   0.5455            0.4839      0.6522      0.5556           23


Table 3
The obtained results on the dev set using the three multi-task models for both Task 1 and Task 2.
                              Task 1                                     Task 2
 Model       Cat. Task Loss   Accuracy   Precision   Recall   F1         Accuracy   Precision   Recall   F1
 MT_CLS      CE               90.98      90.39       90.72    90.55      79.67      67.68       57.02    60.18
             FL               91.49      91.34       90.66    90.97      80.43      67.65       60.55    62.92
 MT_ATT      CE               91.11      91.48       89.75    90.46      80.56      66.80       58.63    60.52
             FL               91.74      91.42       91.16    91.28      80.81      67.90       61.55    63.29
 MT_VHATT    CE               91.11      90.81       90.40    90.60      80.18      66.91       57.39    60.01
             FL               91.49      91.19       90.84    91.01      80.05      66.82       58.92    61.67


In accordance with the results obtained using the single-task models, MT_VHATT shows slightly
better performances on Task 1 than the MT_CLS model. The overall obtained results show that
multi-task learning models surpass their single-task counterparts on Task 1. This can be explained
by the fact that MTL models leverage training signals from both tasks [19, 20].

4.3. Official submissions results
Based on the obtained results on the development set, we have submitted models trained
using the FL for misogyny categorization (Task 2). This choice is motivated by the fact that
the FL has led to better F1 scores than the CE loss on the dev set (Tables 1 and 3). Our three
official submissions are described as follows:
    • run1: corresponds to the submission of the obtained results on both tasks using the
      single-task model ST_ATT.
    • run2: corresponds to the obtained results on both tasks using the multi-task model
      MT_ATT.
    • run3: corresponds to the ensembling of the three multi-task learning models, namely
      MT_CLS, MT_ATT, and MT_VHATT models. In this submission, the logits of the three
      models are averaged. Depending on the task, either the sigmoid or the softmax activation
      is applied to get the labels probabilities.
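The ensembling step of run3 can be sketched as follows (the function name and shapes are illustrative assumptions):

```python
import torch

def ensemble_predict(logits_per_model, binary):
    """Average the logits of several models, then apply sigmoid (Task 1) or softmax (Task 2)."""
    avg = torch.stack(logits_per_model).mean(dim=0)          # mean over the models
    return torch.sigmoid(avg) if binary else torch.softmax(avg, dim=-1)

# e.g. Task 2 logits from three models for a batch of 2 tweets over 8 categories
probs = ensemble_predict([torch.randn(2, 8) for _ in range(3)], binary=False)
labels = probs.argmax(dim=-1)                                # predicted category per tweet
```

Averaging in logit space before the activation keeps the combination simple and preserves each model's relative confidence.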
Table 4
Top five submitted systems’s performance on ArMI Task 1.

                                        Accuracy    Precision   Recall    F1
                    UM6P-NLP_run3         91.9         92        90.9    91.4
                    UM6P-NLP_run2         91.5        91.5       90.5     91
                    UM6P-NLP_run1         91.5        91.1       91.1    91.1
                    UoT_run1              90.5        90.1       89.9     90
                    SOA_NLP_run1          88.3        87.8       87.6    87.7


Table 5
Top five submitted systems’s performance on ArMI Task 2.

                                        Accuracy   Precision    Recall   F1
                    UM6P-NLP_run2         82.7        69.7      64.7     66.5
                    UM6P-NLP_run3         83.3        71.7      63.6     65.3
                    UM6P-NLP_run1         81.6        69.2      65.2     65.1
                    SOA_NLP_run2          76.4        67.6        48     53.1
                    SOA_NLP_run3          74.5        54.9       50.8    52.6

   Tables 4 and 5 summarize the official results of the top five systems submitted to Task 1 and
Task 2, respectively. The results show that all our submissions are ranked in the top three among
all submitted systems. In accordance with our previous results, our multi-task models have
achieved the first and second ranking positions. Although the ensembling of the three MTL
models (run3) has yielded the best performances on most evaluation measures for both tasks,
the best F1 score for Task 2 is obtained by run2 (the MT_ATT model).


5. Conclusion
In this paper, we have presented our participating system in the first Arabic Misogyny Identifi-
cation shared task. We have investigated three Multi-Task Learning models and their single-task
counterparts using the pre-trained MARBERT encoder. In order to deal with the class label
imbalance for Task 2, we have employed the Focal Loss. The results show that our three submitted
systems are top-ranked among the participating systems in both ArMI tasks. The overall
obtained results demonstrate that MTL models outperform their single-task versions in most
evaluation scenarios. Besides, the Focal Loss has shown effective performances, especially on
F1 measures.


Acknowledgments
Experiments presented in this paper were carried out using the supercomputer simlab-cluster,
supported by Mohammed VI Polytechnic University (https://www.um6p.ma), and facilities of
simlab-cluster HPC & IA platform.
References
 [1] M. Moloney, T. P. Love, Assessing online misogyny: Perspectives from sociology and
     feminist media studies, Sociology Compass 12 (2018).
 [2] B. Poland, Haters: Harassment, abuse, and violence online, 2016.
 [3] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
     transformers for language understanding, in: Proceedings of the 2019 Conference of
     the North American Chapter of the Association for Computational Linguistics: Human
     Language Technologies, Volume 1 (Long and Short Papers), Association for Computational
     Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/
      N19-1423. doi:10.18653/v1/N19-1423.
 [4] W. Antoun, F. Baly, H. Hajj, AraBERT: Transformer-based model for Arabic language
     understanding, in: Proceedings of the 4th Workshop on Open-Source Arabic Corpora and
     Processing Tools, with a Shared Task on Offensive Language Detection, European Language
     Resource Association, Marseille, France, 2020, pp. 9–15. URL: https://aclanthology.org/
     2020.osact-1.2.
 [5] M. Abdul-Mageed, A. Elmadany, E. M. B. Nagoudi, ARBERT & MARBERT: Deep bidi-
     rectional transformers for Arabic, in: Proceedings of the 59th Annual Meeting of the
     Association for Computational Linguistics and the 11th International Joint Conference on
     Natural Language Processing (Volume 1: Long Papers), Association for Computational
     Linguistics, Online, 2021, pp. 7088–7105. URL: https://aclanthology.org/2021.acl-long.551.
      doi:10.18653/v1/2021.acl-long.551.
 [6] A. El Mekki, A. El Mahdaouy, I. Berrada, A. Khoumsi, Domain adaptation for Arabic cross-
     domain and cross-dialect sentiment analysis from contextualized word embedding, in: Pro-
     ceedings of the 2021 Conference of the North American Chapter of the Association for Com-
     putational Linguistics: Human Language Technologies, Association for Computational Lin-
     guistics, Online, 2021, pp. 2824–2837. URL: https://aclanthology.org/2021.naacl-main.226.
      doi:10.18653/v1/2021.naacl-main.226.
 [7] A. El Mekki, A. El Mahdaouy, K. Essefar, N. El Mamoun, I. Berrada, A. Khoumsi, BERT-based
     multi-task model for country and province level MSA and dialectal Arabic identification,
     in: Proceedings of the Sixth Arabic Natural Language Processing Workshop, Association
     for Computational Linguistics, Kyiv, Ukraine (Virtual), 2021, pp. 271–275. URL: https:
     //aclanthology.org/2021.wanlp-1.31.
 [8] A. El Mahdaouy, A. El Mekki, K. Essefar, N. El Mamoun, I. Berrada, A. Khoumsi, Deep
     multi-task model for sarcasm detection and sentiment analysis in Arabic language, in:
     Proceedings of the Sixth Arabic Natural Language Processing Workshop, Association
     for Computational Linguistics, Kyiv, Ukraine (Virtual), 2021, pp. 334–339. URL: https:
     //aclanthology.org/2021.wanlp-1.42.
 [9] N. Safi Samghabadi, P. Patwa, S. PYKL, P. Mukherjee, A. Das, T. Solorio, Aggression and
     misogyny detection using BERT: A multi-task approach, in: Proceedings of the Second
     Workshop on Trolling, Aggression and Cyberbullying, European Language Resources
     Association (ELRA), Marseille, France, 2020, pp. 126–131. URL: https://aclanthology.org/
     2020.trac-1.20.
[10] E. Fersini, D. Nozza, P. Rosso, AMI @ EVALITA2020: automatic misogyny identification, in:
     V. Basile, D. Croce, M. D. Maro, L. C. Passaro (Eds.), Proceedings of the Seventh Evaluation
     Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop
     (EVALITA 2020), Online event, December 17th, 2020, volume 2765 of CEUR Workshop
     Proceedings, CEUR-WS.org, 2020.
[11] F. Rodríguez-Sánchez, J. Carrillo-de Albornoz, L. Plaza, Automatic classification of sexism
     in social networks: An empirical study on twitter data, IEEE Access 8 (2020) 219563–219576.
      doi:10.1109/ACCESS.2020.3042604.
[12] H. Mulki, B. Ghanem, Let-mi: An Arabic Levantine Twitter dataset for misogynistic
     language, in: Proceedings of the Sixth Arabic Natural Language Processing Workshop,
     Association for Computational Linguistics, Kyiv, Ukraine (Virtual), 2021, pp. 154–163. URL:
     https://aclanthology.org/2021.wanlp-1.16.
[13] H. Mulki, B. Ghanem, ArMI at FIRE2021: Overview of the First Shared Task on Arabic
     Misogyny Identification, in: Working Notes of FIRE 2021 - Forum for Information Retrieval
     Evaluation, CEUR, 2021.
[14] T. Lin, P. Goyal, R. B. Girshick, K. He, P. Dollár, Focal loss for dense object detection, CoRR
      abs/1708.02002 (2017). URL: http://arxiv.org/abs/1708.02002. arXiv:1708.02002.
[15] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align
     and translate, in: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learn-
     ing Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track
     Proceedings, 2015. URL: http://arxiv.org/abs/1409.0473.
[16] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, E. Hovy, Hierarchical attention networks for
     document classification, in: Proceedings of the 2016 Conference of the North American
     Chapter of the Association for Computational Linguistics: Human Language Technologies,
     Association for Computational Linguistics, San Diego, California, 2016, pp. 1480–1489.
      URL: https://www.aclweb.org/anthology/N16-1174. doi:10.18653/v1/N16-1174.
[17] K. Essefar, A. El Mekki, A. El Mahdaouy, N. El Mamoun, I. Berrada, CS-UM6P at SemEval-
     2021 task 7: Deep multi-task learning model for detecting and rating humor and offense,
     in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-
     2021), Association for Computational Linguistics, Online, 2021, pp. 1135–1140. URL: https:
      //aclanthology.org/2021.semeval-1.159. doi:10.18653/v1/2021.semeval-1.159.
[18] N. El Mamoun, A. El Mahdaouy, A. El Mekki, K. Essefar, I. Berrada, CS-UM6P at SemEval-
     2021 task 1: A deep learning model-based pre-trained transformer encoder for lexical
     complexity, in: Proceedings of the 15th International Workshop on Semantic Evaluation
     (SemEval-2021), Association for Computational Linguistics, Online, 2021, pp. 585–589.
      URL: https://aclanthology.org/2021.semeval-1.73. doi:10.18653/v1/2021.semeval-1.73.
[19] R. Caruana, Learning many related tasks at the same time with backpropagation, in:
     Proceedings of the 7th International Conference on Neural Information Processing Systems,
     NIPS’94, MIT Press, Cambridge, MA, USA, 1994, p. 657–664.
[20] Y. Sun, S. Wang, Y. Li, S. Feng, H. Tian, H. Wu, H. Wang, Ernie 2.0: A continual pre-training
     framework for language understanding, arXiv preprint arXiv:1907.12412 (2019).