         Alejandro Mosquera at DETOXIS 2021:
      Deep Learning Approaches to Toxicity Detection in
                 Spanish Social Media Texts

                 Alejandro Mosquera López1[0000−0002−6020−3569]

    Broadcom Corporation, 1320 Ridder Park Drive, San Jose, CA 95131, USA
                        alejandro.mosquera@broadcom.com



        Abstract. This paper presents the system submitted to the DETOXIS
        2021 challenge for detecting toxicity in Spanish social media texts.
        The chosen approach relies on an ensemble of different neural network
        architectures, including thread and topic features as side information.
        For sub-task 1, we have also applied machine translation in order to
        reuse linguistic resources from other languages such as English. Our
        best submission scored 0.569 F1 on the test set, ranking 6th out of 31
        competing teams.

        Keywords: Toxicity detection · Spanish · Social Media · Machine
        translation · Text Normalization · Deep learning · Capsule networks.




1     Introduction

News websites allow millions of users to share and discuss their opinions
publicly in near real-time every day. Such large reach and a constantly growing
user base present challenges for content moderation teams, which not only need
to fight affiliate and cyber-crime operators but also less traditional forms of
messaging abuse such as the spread of hate, propaganda and fake news.
     While social media platforms are under increasing pressure to swiftly deal
with the spread of toxic content, the use of over-aggressive filtering models and
the under-representation of certain user groups in the training data can also have
negative consequences if false positives happen at large scale [23].
     For the aforementioned reasons, the automatic detection of toxic language
in social media has received growing attention from the NLP research
community in the last few years, which is also reflected in the number of public
evaluations and resources recently focused on this area: e.g. HASOC [14] for hate
speech and aggressive content, TRAC [8] for identifying aggression, HatEval [1]
for detecting hate speech against women and immigrants, and OffensEval-2019 [24]
and OffensEval-2020 [25], both for identifying and categorizing offensive
language.
    This paper describes our participation in the DETOXIS [22] shared task of
IberLEF-2021, sub-task 1: toxicity detection in Spanish comments posted
in response to news articles related to immigration, using an ensemble of neural
networks. The rest of the document is organised as follows: in Section 2, related
work is reviewed; in Section 3 we describe our system and approach; in Section
4 we evaluate the obtained results; finally, in Section 5 we draw our conclusions
and outline potential future work.


2     Related Work

The best-performing approaches for toxicity detection follow the recent advances
in neural networks for NLP: Liu et al. [10] and Zhu et al. [26] leveraged bidirectional
transformers by fine-tuning BERT [5] embeddings. Earlier architectures such as
convolutional neural networks (CNNs) and bidirectional LSTMs (bi-LSTMs) can
also obtain strong results [12] when paired with pre-trained embeddings such as
FastText [2], GloVe [19] or word2vec [15]. Finally, combining different models
and features helps reduce bias and variance; examples are voting ensembles
[21] and stacked generalization [13].


3     System Description

Since the first sub-task focused only on determining whether a comment is
toxic or not, we have treated it as a binary classification problem.


3.1   Pre-processing

Social media texts usually contain informal lexical variants and out-of-vocabulary
(OOV) words which can be difficult to understand not only for humans but also
for NLP tools and applications [18]. For this reason, we have applied a text
normalization filter in order to reduce OOV words by using a lexical
normalization dictionary which is recursively combined with shortening
and lengthening rules [17].
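
As a rough illustration of this step, the following sketch recursively applies a
lexical normalization dictionary after collapsing character lengthening; the
dictionary entries and the rule are hypothetical examples, not the actual
resources from [17]:

    import re

    # Toy normalization dictionary; the real resource in [17] is much larger.
    NORM_DICT = {"xq": "porque", "q": "que", "tb": "tambien"}

    def collapse_lengthening(token):
        # "holaaaa" -> "holaa": collapse runs of 3+ repeated characters to 2,
        # since Spanish allows at most doubled letters (e.g. "ll", "rr")
        return re.sub(r"(.)\1{2,}", r"\1\1", token)

    def normalize(text):
        tokens = []
        for token in text.lower().split():
            token = collapse_lengthening(token)
            # Recursively re-apply the dictionary until a fixed point
            while token in NORM_DICT:
                token = NORM_DICT[token]
            tokens.append(token)
        return " ".join(tokens)

    print(normalize("xq no vienes??"))  # -> "porque no vienes??"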


3.2   Data Augmentation

Data augmentation is a popular technique that can increase the volume and
diversity of the training data for many applications, including NLP [9]. While
we have only used the NewsCom-TOX dataset provided by the organization for
training purposes, we have also generated a parallel dataset in English by
using the Google Translate API, in order to reuse publicly available pre-trained
resources for the English language.
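
A minimal sketch of this step, assuming the Google Cloud Translation client
library with valid credentials (the paper does not specify which API wrapper
was used):

    from google.cloud import translate_v2 as translate

    client = translate.Client()

    def to_english(comments):
        # Translate a list of Spanish comments into English
        return [client.translate(c, source_language="es",
                                 target_language="en")["translatedText"]
                for c in comments]

    train_en = to_english(["Este comentario es un ejemplo."])
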
3.3   Models

Our system comprises the following models:

 – capsule_es: a neural network with a capsule network architecture [20] using
   SBWC [3] i25 GloVe Spanish embeddings.
 – capsule_en: a neural network with a capsule network architecture using
   GloVe 840B-300d English embeddings.
 – detox_orig: Detoxify original [6], a pre-trained BERT model that detects
   toxicity in English texts.
 – detox_unb: Detoxify unbiased, a pre-trained RoBERTa [11] model that
   recognizes toxicity in English texts and minimizes unintended biases with
   respect to mentions of identities.
 – detox_multi: Detoxify multilingual, a pre-trained XLM [4] model that
   detects toxicity in English texts.
 – detox_multi_es: Detoxify multilingual, a pre-trained XLM model that
   detects toxicity in Spanish texts.
 – spacylr: a logistic regression model trained using spaCy [7] Spanish
   embeddings.
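
For illustration, the Detoxify checkpoints above can be queried through the
library's predict interface (https://github.com/unitaryai/detoxify); the
comment strings here are invented examples:

    from detoxify import Detoxify

    comment_es = "Sois todos unos idiotas"  # original Spanish comment
    comment_en = "You are all idiots"       # its machine translation

    multi = Detoxify("multilingual")
    scores = {
        "detox_orig": Detoxify("original").predict(comment_en)["toxicity"],
        "detox_unb": Detoxify("unbiased").predict(comment_en)["toxicity"],
        "detox_multi": multi.predict(comment_en)["toxicity"],
        "detox_multi_es": multi.predict(comment_es)["toxicity"],
    }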

3.4   Side Information

In addition to the actual comments, non-textual metadata such as topic,
thread_id, comment_id and reply_to was made available as part of the training
dataset. These fields were used to engineer extra features for the stacking
model as side information (a sketch of their computation follows the list below):
 – topic_words_max: the maximum word-wise toxicity score in a comment
   after averaging all the capsule_es model probabilities of the individual words
   across the training data by topic.
 – topic_words_avg: the average word-wise toxicity score in a comment after
   averaging all the capsule_es model probabilities of the individual words across
   the training data by topic.
 – avg_group_tox: the average toxicity score determined by the capsule_es
   model for all the comments with the same thread_id.
Since there was no topic information in the test dataset, we treated the test set
as a separate topic when computing the features above. Although inaccurate (the
test data had comments from the same set of topics as the train data), this did
not negatively impact the final results.
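
A minimal sketch of these three features, assuming a pandas DataFrame with one
row per comment and a precomputed per-topic word toxicity table; all column and
variable names are illustrative:

    import pandas as pd

    def add_side_features(df, word_tox):
        # df columns assumed: text, topic, thread_id, capsule_es_prob
        # word_tox maps (topic, word) -> mean capsule_es probability of
        # that word across the training data for the topic
        def word_scores(row):
            s = [word_tox.get((row.topic, w), 0.0) for w in row.text.split()]
            return pd.Series({"topic_words_max": max(s, default=0.0),
                              "topic_words_avg": sum(s) / max(len(s), 1)})

        df = df.join(df.apply(word_scores, axis=1))
        # Mean capsule_es toxicity over all comments in the same thread
        df["avg_group_tox"] = (df.groupby("thread_id")["capsule_es_prob"]
                                 .transform("mean"))
        return df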

3.5   Stacking Model

Due to the relatively small amount of training data in the NewsCom-TOX corpus
(fewer than 4000 samples, only 1147 positive), we opted for a stacked generalization
strategy, where the soft probabilities calculated by several models are used
as features, together with the original labels, to train the stacking model. This
not only reduces the computing resources needed to tune hyperparameters and
perform cross-validation, but can also achieve competitive results even with just
pre-trained models [16].
    Our stacking model was a logistic regression with a custom decision threshold
of 0.32, which was determined via cross-validation. The latter was required because
of the class imbalance and the unusual evaluation metric used in this sub-task
(F1 of the toxicity class rather than micro or macro averages), which favours
models that are aggressive towards the positive class.
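
A sketch of this step under stated assumptions: X_meta holds the out-of-fold soft
probabilities of the seven base models plus the three side-information features,
y the gold binary labels, and 0.32 is the threshold reported above (the random
arrays below only stand in for the real features):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_meta = rng.random((3900, 10))  # 7 model probabilities + 3 side features
    y = (rng.random(3900) < 0.29).astype(int)  # ~1147/4000 positives

    stacker = LogisticRegression(max_iter=1000).fit(X_meta, y)

    THRESHOLD = 0.32  # tuned via cross-validation for F1 of the toxic class
    proba = stacker.predict_proba(X_meta)[:, 1]
    y_pred = (proba >= THRESHOLD).astype(int)
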
    The most important features, according to the regression coefficients, can
be seen in Figure 1. From there we can determine that the capsule networks and
avg_group_tox are the strongest features for detecting the toxic and non-toxic
classes, respectively.




    Fig. 1. Feature importance based on the LR coefficients of the stacking model.




4   Results

Our toxicity detection system obtained promising results, as shown in Table 1:
it ranked 6th out of 31, with a difference in F1 of only 0.077 compared with
the winning system. It is also worth mentioning that only 16 systems (out of
31) achieved a better F1 score than the AllToxic benchmark, which highlights the
difficulty of this sub-task under the chosen evaluation metric.
Official test ranking           Individual models (test)     Individual models (train, OOF)
System              F1 Toxic    Model           F1 Toxic     Model               F1 Toxic
SINAI (best)        0.6461      capsule_es      0.5040       capsule_es          0.5168
Alejandro Mosquera  0.5691      capsule_en      0.4872       capsule_en          0.5299
AllToxic            0.4231      spacylr         0.4833       spacylr             0.5156
RandomClassifier    0.3760      detox_unb       0.4117       detox_unb           0.4671
ChainBOW            0.3746      detox_multi     0.4053       detox_multi         0.4430
BOWClassifier       0.1837      detox_multi_es  0.3887       detox_multi_es      0.4209
                                detox_orig      0.3542       detox_orig          0.4237
                                                             Alejandro Mosquera  0.5813

Table 1. Partial results for the test set (left), results of the individual models in the
ensemble on the test set (middle), and out-of-fold validation scores on the train set
(right).



   With regard to our individual models, we can observe that they are weaker
(only 3 out of 7 would beat the AllToxic baseline) and exhibit higher variance
between train and test scores than the final stacking ensemble. However, a post-
workshop analysis showed that removing the weakest models would not have
improved the final score.


5    Conclusions

In this paper we describe the system for detecting toxicity in Spanish social
media texts engineered for DETOXIS 2021 sub-task 1. Since the amount of training
data was relatively small, different strategies were applied to overcome
this limitation, such as performing data augmentation through machine
translation and leveraging models pre-trained on larger toxicity datasets. Our best
submission was a logistic regression ensemble using neural network predictions
and side-information features extracted from thread and topic metadata.


References
 1. Basile, V., Bosco, C., Fersini, E., Nozza, D., Patti, V., Pardo, F.M.R., Rosso,
    P., Sanguinetti, M.: SemEval-2019 task 5: Multilingual detection of hate speech
    against immigrants and women in Twitter. In: Proceedings of the 13th International
    Workshop on Semantic Evaluation. pp. 54–63 (2019)
 2. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with
    subword information. Transactions of the Association for Computational Linguis-
    tics 5, 135–146 (2017), https://www.aclweb.org/anthology/Q17-1010
 3. Cardellino, C.: Spanish Billion Words Corpus and Embeddings (August 2019),
    https://crscardellino.github.io/SBWCE/
 4. Conneau, A., Lample, G.: Cross-lingual language model pretraining. In: Wal-
    lach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett,
    R. (eds.) Advances in Neural Information Processing Systems. vol. 32. Cur-
    ran Associates, Inc. (2019), https://proceedings.neurips.cc/paper/2019/file/
    c04c19c2c2474dbf5f7ac4372c5b9af1-Paper.pdf
 5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep
    bidirectional transformers for language understanding (2018)
 6. Hanu, L., Unitary team: Detoxify. Github. https://github.com/unitaryai/detoxify
    (2020)
 7. Honnibal, M., Montani, I., Van Landeghem, S., Boyd, A.: spaCy:
    Industrial-strength Natural Language Processing in Python (2020).
    https://doi.org/10.5281/zenodo.1212303,                 https://doi.org/10.5281/
    zenodo.1212303
 8. Kumar, R., Reganti, A.N., Bhatia, A., Maheshwari, T.: Aggression-annotated Cor-
    pus of Hindi-English Code-mixed Data. In: Calzolari, N., Choukri, K., Cieri, C.,
    Declerck, T., Goggi, S., Hasida, K., Isahara, H., Maegaard, B., Mariani, J., Mazo,
    H., Moreno, A., Odijk, J., Piperidis, S., Tokunaga, T. (eds.) Proceedings of the
    Eleventh International Conference on Language Resources and Evaluation (LREC
    2018). European Language Resources Association (ELRA), Miyazaki, Japan (May
    2018)
 9. Li, Y., Li, X., Yang, Y., Dong, R.: A diverse data augmentation strat-
    egy for low-resource neural machine translation. Information 11(5) (2020).
    https://doi.org/10.3390/info11050255, https://www.mdpi.com/2078-2489/11/5/
    255
10. Liu, P., Li, W., Zou, L.: NULI at SemEval-2019 task 6: Transfer learning
    for offensive language detection using bidirectional transformers. In: Proceed-
    ings of the 13th International Workshop on Semantic Evaluation. pp. 87–
    91. Association for Computational Linguistics, Minneapolis, Minnesota, USA
    (Jun 2019). https://doi.org/10.18653/v1/S19-2011, https://www.aclweb.org/
    anthology/S19-2011
11. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M.,
    Zettlemoyer, L., Stoyanov, V.: RoBERTa: A robustly optimized BERT pretraining
    approach (2019), http://arxiv.org/abs/1907.11692
12. Mahata, D., Zhang, H., Uppal, K., Kumar, Y., Shah, R.R., Shahid, S., Mehnaz,
    L., Anand, S.: MIDAS at SemEval-2019 task 6: Identifying offensive posts and
    targeted offense from twitter. In: Proceedings of the 13th International Workshop
    on Semantic Evaluation. pp. 683–690. Association for Computational Linguistics,
    Minneapolis, Minnesota, USA (Jun 2019). https://doi.org/10.18653/v1/S19-2122,
    https://www.aclweb.org/anthology/S19-2122
13. Malmasi, S., Zampieri, M.: Challenges in Discriminating Profanity from Hate
    Speech. Journal of Experimental & Theoretical Artificial Intelligence 30, 1–16
    (2018)
14. Mandl, T., Modha, S., Majumder, P., Patel, D., Dave, M., Mandlia, C., Patel,
    A.: Overview of the HASOC track at FIRE 2019: Hate speech and offensive content
    identification in Indo-European languages. In: Proceedings of the 11th Forum for
    Information Retrieval Evaluation. pp. 14–17 (2019)
15. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representa-
    tions of words and phrases and their compositionality. In: Proceedings of the 26th
    International Conference on Neural Information Processing Systems - Volume 2.
    p. 3111–3119. NIPS’13, Curran Associates Inc., Red Hook, NY, USA (2013)
16. Mosquera, A.: Amsqr at SemEval-2020 task 12: Offensive language detection us-
    ing neural networks and anti-adversarial features. In: Proceedings of the Four-
    teenth Workshop on Semantic Evaluation. pp. 1898–1905. International Com-
    mittee for Computational Linguistics, Barcelona (online) (Dec 2020), https:
    //www.aclweb.org/anthology/2020.semeval-1.250
17. Mosquera, A., Lloret, E., Moreda, P.: Towards facilitating the accessibility of web
    2.0 texts through text normalisation. In: Proceedings of the LREC Workshop:
    Natural Language Processing for Improving Textual Accessibility (NLP4ITA). pp.
    9–14 (2012)
18. Mosquera, A., Moreda, P.: The use of metrics for measuring informality levels in
    web 2.0 texts. In: Proceedings of the 8th Brazilian Symposium in Information and
    Human Language Technology (2011), https://www.aclweb.org/anthology/W11-
    4523
19. Pennington, J., Socher, R., Manning, C.: Glove: Global vectors for word repre-
    sentation. In: Proceedings of the 2014 Conference on Empirical Methods in Nat-
    ural Language Processing (EMNLP). pp. 1532–1543. Association for Computa-
    tional Linguistics, Doha, Qatar (Oct 2014). https://doi.org/10.3115/v1/D14-1162,
    https://www.aclweb.org/anthology/D14-1162
20. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Pro-
    ceedings of the 31st International Conference on Neural Information Processing
    Systems. p. 3859–3869. NIPS’17, Curran Associates Inc., Red Hook, NY, USA
    (2017)
21. Seganti, A., Sobol, H., Orlova, I., Kim, H., Staniszewski, J., Krumholc,
    T., Koziel, K.: NLPR@SRPOL at SemEval-2019 task 6 and task 5: Lin-
    guistically enhanced deep learning offensive sentence classifier. In: Proceed-
    ings of the 13th International Workshop on Semantic Evaluation. pp. 712–
    721. Association for Computational Linguistics, Minneapolis, Minnesota, USA
    (Jun 2019). https://doi.org/10.18653/v1/S19-2126, https://www.aclweb.org/
    anthology/S19-2126
22. Taulé, M., Ariza, A., Nofre, M., Amigó, E., Rosso, P.: Overview of the DETOXIS
    task at IberLEF-2021: Detection of toxicity in comments in Spanish. In: Procesamiento
    del Lenguaje Natural, vol. 67 (2021)
23. Vincent, J.: YouTube brings back more human moderators after AI systems over-
    censor (Sep 2020), online: https://web.archive.org/web/20210515080450/
    https://www.theverge.com/2020/9/21/21448916/youtube-automated-
    moderation-ai-machine-learning-increased-errors-takedowns
24. Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., Kumar, R.:
    SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social
    Media (OffensEval). In: Proceedings of The 13th International Workshop on Se-
    mantic Evaluation (SemEval) (2019)
25. Zampieri, M., Nakov, P., Rosenthal, S., Atanasova, P., Karadzhov, G., Mubarak,
    H., Derczynski, L., Pitenis, Z., Çöltekin, Ç.: SemEval-2020 Task 12: Multilingual
    Offensive Language Identification in Social Media (OffensEval 2020). In: Proceed-
    ings of SemEval (2020)
26. Zhu, J., Tian, Z., Kübler, S.: UM-IU@LING at SemEval-2019 task 6:
    Identifying offensive tweets using BERT and SVMs. In: Proceedings of
    the 13th International Workshop on Semantic Evaluation. pp. 788–795.
    Association for Computational Linguistics, Minneapolis, Minnesota, USA
    (Jun 2019). https://doi.org/10.18653/v1/S19-2138, https://www.aclweb.org/
    anthology/S19-2138