Exploring Language Independent Linguistic Features
and Transformers in a Multi-label Emotion Detection
Challenge in Urdu using Nastalīq Script
José Antonio García-Díaz1 , Manuel Valencia-García1 , Gema Alcaraz Mármol2 and
Rafael Valencia-García1
1
    Facultad de Informática, Universidad de Murcia, Campus de Espinardo, 30100, Spain
2
    Departamento de Filología Moderna, Universidad de Castilla-La Mancha, Spain


                                         Abstract
                                         Emotion Analysis is a Natural Language Processing task whose objective is to obtain fine-grained
                                         emotions from a text. The understanding of emotions in written communication has applications in
                                         marketing, e-commerce and infodemiology, among others. Besides, from a Smart City perspective,
                                         Emotion Analysis can be applied to identify threats that could endanger citizens. In these working
                                         notes we describe the participation of the UMUTeam in the EmoThreat shared task, proposed at the
                                         FIRE’2022 workshop. Out of the subtasks proposed, our team only participated in the main subtask,
                                         which consisted of multi-label emotion classification based on Ekman’s six basic emotions in documents
                                         written in Urdu using Nastalīq script. We achieved the second best result out of a total of 8 participants,
                                         with a macro average F1-score of 66.9%. Our proposal combines in the same neural network four
                                         feature sets: a subset of language-independent linguistic features extracted with UMUTextStats,
                                         non-contextual sentence embeddings from fastText, and two contextual sentence embeddings from
                                         multilingual versions of BERT and RoBERTa.

                                         Keywords
                                         Feature Engineering, Deep-learning, Transformers, Linguistic Features, Natural Language Processing




1. Introduction
The proliferation of social media platforms has made it easier for people all over the world
to communicate and share experiences. It has also provided benefits in international trade
and improved public health policies, as social media posts can refer to threats that potentially
endanger citizens. Natural Language Processing (NLP) tools provide a useful way to process
and understand what users want to express in an automatic manner. However, NLP tools face
some challenges, notably that many state-of-the-art methods, tools and datasets are based on
English, and that natural language is complex to understand, as it is highly subjective.


FIRE 2022: Forum for Information Retrieval Evaluation, December 9–13, 2022, India
joseantonio.garcia8@um.es (J. A. García-Díaz); manuelv@um.es (M. Valencia-García); Gema.Alcaraz@uclm.es
(G. A. Mármol); valencia@um.es (R. Valencia-García)
ORCID: 0000-0002-3651-2660 (J. A. García-Díaz); 0000-0001-7703-3829 (G. A. Mármol); 0000-0003-2457-1791
(R. Valencia-García)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
   The organisers of the EmoThreat 2022 shared task [1] released a dataset for conducting multi-
label emotion detection in Urdu written in Nastalīq script (written from right to left). Urdu
is spoken by more than 170 million people worldwide, mainly in India, Pakistan and Nepal.
The underlying objective of this shared task is to understand public emotions on social
networks, applicable in NLP tools that can help to monitor events such as disasters, or to
improve public policies in e-commerce and public health.
   These working notes describe the participation of the UMUTeam in the EmoThreat 2022 shared
task [2], proposed at FIRE 2022 [3]. The main challenge in this shared task is a multi-label
emotion classification task in Urdu [4]. The participants are required to label each document
with one or more of Ekman’s six basic emotions (plus one neutral emotion).
   It is worth noting that our team has previous experience with emotion classification.
Specifically, we participated in the EmoEvalEs 2021 shared task [5], achieving the sixth position
[6]. That shared task consisted of multi-class emotion detection on texts written in Spanish.
The present shared task allowed our team to continue validating the subsets of language-
independent linguistic features that we have already applied to other languages such as
Tamil [7].
   The remainder of these working notes is organised as follows. First, Section 2 provides
a review of related work focused on Urdu and emotion detection. Second, Section 3 describes
the pipeline developed for solving this task. Third, Section 4 includes the results achieved
in the challenge and a comparison with the rest of the runs submitted by our team and by
the other participants. Fourth, Section 5 presents the findings obtained and some promising
research lines.


2. Related work
The EmoThreat 2022 shared task is a continuation of a previous shared task [8]. In that
edition, the organisers proposed two challenges on abusive and threatening language
detection in Urdu. The past shared task consisted only of binary classification tasks,
one for detecting abusive documents (2400 documents for training, 1100 for testing) and
another for detecting threatening messages (6000 documents for training, 3950 for testing).
The datasets were extracted from the micro-blogging platform Twitter. A total of 10 teams
submitted proposals for the abusive classification task and 9 for the threatening
classification task. The best results were an F1-score of 0.880 for Subtask A and 0.545 for
Subtask B, with both winning runs based on Transformer architectures.
   Sentiment Analysis is another NLP field that has been explored in Urdu. A recent work
described in [9] performed a multi-class classification task on an Urdu dataset of 9312
manually annotated reviews compiled from user reviews about food, movies, apps, politics and
sports. The dataset was annotated with three labels (positive, negative and neutral). The authors
explored different baselines based on traditional machine learning, deep learning and
multilingual Transformer models. Their experiments confirmed that multilingual BERT
outperforms traditional models for Urdu, reaching an F1-score of 81.49%. In [10], the authors
compiled a dataset in Urdu for sentiment analysis and evaluated several traditional machine
learning classifiers. The features were extracted using count-based techniques and pre-trained
word embeddings from fastText. The authors found that combining these features outperformed
the results achieved by each of them separately as well as existing state-of-the-art
approaches, reaching an F1-score of 82.05%. Another relevant work focused on Sentiment
Analysis in Urdu is [11], in which the authors evaluated several word embeddings using an
architecture based on convolutional and recurrent neural networks, combined with traditional
machine learning classifiers for the final classification. The authors evaluated their proposal
on four corpora. Among the different architectures evaluated, they achieved their best
results using a classifier based on Support Vector Machines with Word2Vec features based on
Continuous Bag of Words.


3. Methodology
The first step of our pipeline is to explore the dataset and to create a custom validation split.
The validation split is created using a stratified sample in a ratio of 80-20. In Table 1 it can
be observed that there is an important imbalance among the emotions, with fear, anger, and
disgust being underrepresented.

Table 1
Dataset distribution of the training and custom validation splits
                                    sentiment   train val     total
                                    anger         656 155      811
                                    disgust       616 145      761
                                    fear          495 114      609
                                    happiness     841 205     1046
                                    neutral     2412 602      3014
                                    sadness     1760 430      2190
                                    surprise    1246 304      1550
                                    total       8026 1955     9981
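   A stratified 80-20 split for multi-label data can be approximated by grouping documents that share the same label combination and sampling 20% of each group. The following is a minimal stdlib sketch of this idea (the toy document list is illustrative; a full iterative multi-label stratification would balance each label independently):

```python
import random
from collections import defaultdict

def stratified_split(labels, val_ratio=0.2, seed=0):
    """Split document indices 80-20, stratified on the full label combination.

    `labels` is a list where each element is the tuple of emotions
    assigned to one document.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, combo in enumerate(labels):
        groups[tuple(sorted(combo))].append(idx)
    train, val = [], []
    for combo, idxs in groups.items():
        rng.shuffle(idxs)
        cut = int(round(len(idxs) * val_ratio))
        val.extend(idxs[:cut])
        train.extend(idxs[cut:])
    return sorted(train), sorted(val)

# toy example: 10 anger+disgust documents and 10 neutral documents
docs = [("anger", "disgust")] * 10 + [("neutral",)] * 10
train, val = stratified_split(docs)
```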

   As we deal with a multi-label classification challenge, we analyse the co-occurrence of
emotions (see Figure 1). As can be observed, anger and disgust are the two emotions that most
often appear together in the same tweet. It can also be noticed that the neutral class never
appears combined with other emotions.
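   A co-occurrence matrix such as the one in Figure 1 can be computed from the binary label matrix. A minimal numpy sketch, assuming the labels are already one-hot encoded rows (the toy matrix and variable names are ours):

```python
import numpy as np

# rows = documents, columns = the seven labels
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]
Y = np.array([
    [1, 1, 0, 0, 0, 0, 0],   # anger + disgust
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],   # neutral only
    [0, 0, 0, 0, 0, 1, 1],   # sadness + surprise
])

counts = Y.T @ Y                       # raw co-occurrence counts
# normalise each row by the label frequency, as in Figure 1
cooc = counts / np.maximum(counts.diagonal()[:, None], 1)
```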
   The next step in our pipeline is feature extraction. Four different feature sets are involved
in our participation. The first feature set is a subset of language-independent linguistic features
extracted with UMUTextStats (LF) [12, 13, 14, 15]. The second feature set consists of non-contextual
sentence embeddings from fastText (SE) [16]. The third and fourth feature sets correspond to
multilingual contextual embeddings from BERT (BF) [17] and RoBERTa (RF) [18].
   To obtain the contextual sentence embeddings for BF and RF we perform hyperparameter tuning.
A total of 20 Transformer models (10 for BF, 10 for RF) were trained using the EmoThreat
training split, and the best model was selected using our custom validation split. From
the best model, we extracted the [CLS] token [19]. The hyperparameters involved in this process
are 1) the weight decay, 2) the batch size, 3) the warm-up speed, 4) the number of epochs, and
5) the learning rate. The search over these hyperparameters is performed using Tree of
Parzen Estimators (TPE) [20].
   Once all feature sets had been extracted, we evaluated two strategies for combining their
strengths. The first strategy, called Knowledge Integration (KI), consists of training a
multi-input deep learning model that combines all feature sets at once. The second strategy
involves ensemble learning (EL), which combines the predictions of models each focused on
one specific feature set. For this, we obtain a model for each feature set using hyperparameter
tuning (described below) and then evaluate two ways of combining them. The first is majority
voting, which takes the mode of the predictions. The second is probability averaging, which
averages the probabilities predicted by each individual model to generate the final prediction.
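   The two ensemble combinations described above (voting over binarised predictions, and probability averaging) can be sketched over the per-model outputs; a minimal numpy illustration in which the toy probabilities are made up:

```python
import numpy as np

# predicted probabilities of three per-feature-set models
# for one document over the seven labels
probs = np.array([
    [0.9, 0.6, 0.1, 0.2, 0.0, 0.3, 0.1],   # e.g. the LF model
    [0.8, 0.4, 0.2, 0.1, 0.0, 0.6, 0.2],   # e.g. the SE model
    [0.7, 0.7, 0.3, 0.1, 0.1, 0.4, 0.1],   # e.g. the RF model
])

# voting: binarise each model, keep labels predicted by most models
votes = (probs >= 0.5).astype(int)
majority = (votes.sum(axis=0) > probs.shape[0] / 2).astype(int)

# probability averaging: average first, binarise once
averaged = (probs.mean(axis=0) >= 0.5).astype(int)
```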
   Both for the training of the KI model and for the individual models used in ensemble
learning, we perform a hyperparameter tuning stage. The hyperparameters involved are the shape
of the network (that is, the number of neurons and the number of hidden layers), the dropout
mechanism, the learning rate and several activation functions. The results of the
hyperparameter tuning stage can be found in Table 2.
   In all cases, the best result is achieved with shallow neural networks (that is, neural networks
with only one or two hidden layers, and the same number of neurons in all layers). Besides,
except for RF, all experiments achieved better results with a small dropout rate of .1. The learning
rate varies, being 0.001 for SE, RF and KI. Regarding the activation function, all experiments
achieved better results with non-linear activation functions except LF.
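   The best KI configuration (a “brick”-shaped network with two hidden layers of 48 neurons and sigmoid activations) can be sketched as a forward pass. A minimal numpy illustration with random weights; the input feature dimensions are assumptions and dropout is omitted since it only acts at training time:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# assumed dimensions of the four feature sets
dims = {"LF": 50, "SE": 300, "BF": 768, "RF": 768}
features = {name: rng.normal(size=d) for name, d in dims.items()}

# knowledge integration: concatenate all feature sets into one input
x = np.concatenate([features[n] for n in ("LF", "SE", "BF", "RF")])

# two "brick" hidden layers of 48 neurons each, sigmoid activations
for _ in range(2):
    w = rng.normal(size=(x.shape[0], 48)) * 0.05
    x = sigmoid(x @ w)

# output layer: one sigmoid unit per emotion (multi-label output)
w_out = rng.normal(size=(48, 7)) * 0.05
y_hat = sigmoid(x @ w_out)
```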



              anger   disgust   fear    happiness   neutral   sadness   surprise
  anger       1       0.68      0.11    0.028       0         0.11      0.11
  disgust     0.63    1         0.094   0.015       0         0.14      0.12
  fear        0.081   0.075     1       0.058       0         0.14      0.031
  happiness   0.036   0.021     0.1     1           0         0.033     0.11
  neutral     0       0         0       0           1         0         0
  sadness     0.29    0.39      0.51    0.069       0         1         0.44
  surprise    0.21    0.25      0.079   0.16        0         0.31      1

Figure 1: Label co-occurrence
4. Results and analysis
First, we report in Table 3 the macro average results achieved by each feature set (for the
EL strategy) and by the KI strategy. It can be observed that the best results achieved separately
are obtained with RF. However, when RF is combined with the rest of the feature sets, the recall is
higher and the precision is lower. As expected, the results achieved by LF are limited, as
these features are based on stylometry and PoS. The limited recall of the embeddings based
on BERT compared with the embeddings based on RoBERTa draws our attention. As we deal
with classification tasks, we consider that this difference is not related to the tasks on which
these models were pre-trained (Next Sentence Prediction and Masked Language Modelling), but
to the tokenizer and the dataset used to learn the embeddings.
   Next, we report in Table 4 the results per emotion achieved with the KI strategy using
the custom validation split. It can be observed that the model reaches an almost perfect score
on documents without attached emotions. In non-neutral documents, all emotions
achieve similar scores. Sadness reaches the best F1-score, and happiness obtains very good
precision but limited recall.
   The results of the official leaderboard are reported in Table 5. The results are ranked by
macro F1-score. The other evaluated metrics are the multi-label accuracy, the Hamming
loss and the micro and weighted versions of the F1-score. We achieved the second best position
with our run based on KI.
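   These metrics can be reproduced with scikit-learn from the binary label matrices. A minimal sketch on toy predictions (the arrays are made up, not the shared-task data):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# toy multi-label ground truth and predictions (4 documents, 3 labels)
y_true = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]])

macro_f1 = f1_score(y_true, y_pred, average="macro")
micro_f1 = f1_score(y_true, y_pred, average="micro")
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
subset_acc = accuracy_score(y_true, y_pred)   # exact-match multi-label accuracy
h_loss = hamming_loss(y_true, y_pred)         # fraction of wrong label bits
```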
   The results of our three runs are depicted in Table 6. As can be observed, the results
achieved with ensemble learning are more limited in all metrics. The reason for this is that


Table 2
Best hyper-parameters for each feature set trained separately and combined using knowledge integration.
                Feature set   shape   # of layers   neurons   dropout      lr   activation
                LF            brick             1        48        .1   0.010   linear
                SE            brick             2       256        .1   0.001   relu
                BF            brick             2       256        .1   0.010   tanh
                RF            brick             1        16         -   0.001   relu
                KI            brick             2        48        .1   0.001   sigmoid


Table 3
Macro average precision, recall and f1-score of each feature set and the Knowledge Integration strategy
using custom validation split.
                                          precision   recall   f1-score
                                     LF      45.155   47.898     41.162
                                     SE      64.728   57.551     60.646
                                     BF      67.104   58.236     60.274
                                     RF      71.821   64.585     67.422
                                     KI      71.482   65.440     67.441
the majority of correct predictions are made by the RF feature set, and the contributions of
the rest of the feature sets diminish the performance of the model when applying ensemble
learning strategies.

4.1. Error Analysis
For the error analysis we take our best run and collect the wrong predictions on the test split.
Next, we sort the multi-label outputs by Euclidean distance in order to get the predictions with
the highest number of wrong labels. We found that the wrong classifications represent


Table 4
Classification report for the Knowledge Integration strategy using custom validation split.
                                                 precision       recall f1-score
                              anger                    58.378    69.677      63.529
                              disgust                  56.051    60.690      58.278
                              fear                     60.000    60.526      60.262
                              happiness                80.734    42.927      56.051
                              neutral                 100.000    99.003      99.499
                              sadness                  74.346    66.047      69.951
                              surprise                 70.866    59.211      64.516
                              micro avg                78.587    72.276      75.300
                              macro avg                71.482    65.440      67.441
                              weighted avg             78.915    72.276      74.807
                              samples avg              75.108    73.725      72.996


Table 5
Official leader-board
       Rank Team                   Accuracy Weighted F1 Micro F1 Macro F1 Hamming loss
           1   FOSUNlpTeam             63.6               75.9            75.9        68.7            8.80
           2   UMUTeam                 61.6               74.3            74.9        66.9            8.80
           3   hate-alert              61.2               70.9            72.4        61.5            9.20
           4   MUCS                    58.2               69.6            69.2        60.3           11.30
           5   ERTIM                   59.3               69.9            72.0        59.9            9.18
           7   SakshiEmo2022           38.5               61.1            47.7        46.6           34.00
           8   Aces                    42.6               38.1            45.8        24.0           16.90


Table 6
Results per run
                  Run Accuracy Weighted F1 Micro F1 Macro F1 Hamming loss
                  1        0.616              0.743        0.749          0.669              0.088
                  2        0.570              0.670        0.704          0.565              0.091
                  3        0.602              0.714        0.734          0.624              0.087
39.18% of the total test split: 129 documents have one wrong label, 529 have two, 92 have
three, and 14 have four.
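   The ranking step can be sketched with numpy by sorting documents by the Euclidean distance between the predicted and gold label vectors; for 0/1 vectors this orders documents by their number of wrong labels. The toy arrays below are illustrative:

```python
import numpy as np

# toy gold labels and binarised predictions (4 documents, 3 labels)
y_true = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]])

# Euclidean distance per document; for 0/1 vectors this is
# the square root of the number of mismatched labels
dist = np.linalg.norm(y_true - y_pred, axis=1)

# indices of the worst predictions first
worst_first = np.argsort(-dist)
```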
   Next, we present the most notable failures. It is worth noting that the texts presented here
are translated using Google Translate. The most distant classifications made by our system are
those in which it was not able to identify any emotion. That is the case of: 1)
You interpret me very well. You hate Maulana Tariq Jameel Sahib. Fear Allah. Those holy persons
should respect him., and 2) Even if you express the pain in a happy way, the pain will still hurt..
For the first sentence, we consider that the problem is that the model does not have enough
context to understand the sentence. For the second sentence, we consider that the text does not
express any emotion but is a proverb. A third case is 3) The pain and sadness of Imran Niazi on the
death of such a close friend of Imran Niazi is not seen, or even Imran Niazi is just the opposite.
This document was rated as anger, sadness, and surprise, but the annotators did not find any
emotion in it. Besides, we identified other documents related to the COVID-19 disease.
That is the case of 4) Those who ask for permission to open shops are not afraid of Corona.
Watch the program Live with Nasrullah Malik only New, and 5) Sami Ibrahim sir, what are you
most afraid of Corona till now? Imran Ahmed Khan Niazi still scares me the most. Besides, there
are other errors with short texts. That is the case of I hate this game. In this case, our system
correctly predicted the anger emotion, but misclassified disgust as sadness, which can be
considered a minor mistake.


5. Conclusions
We achieved the second position in a multi-label classification task in Urdu (66.9% of macro
F1-score), with a pipeline based on the combination of a subset of language-independent
linguistic features and Transformers. Our best result combines the features using a knowledge
integration strategy; the runs submitted with ensemble learning achieved limited results, losing
several positions in the official ranking. Although we are very satisfied with our participation,
as we have evaluated our tools on languages with non-Latin scripts, we could not participate in
the second subtask of the competition due to lack of time. The source code is available at: https:
//github.com/Smolky/umuteam-emothreat-2022
   As promising future research lines, we will include nested cross-validation to prevent
the hyperparameter tuning stages from being biased towards the custom validation split, and we
will apply data augmentation to increase the number of instances and reduce the effects of
class imbalance. We will also explore the reliability of using Transformers focused on Urdu
rather than multilingual ones. Besides, we will include features concerning figurative language
[21], as its identification may increase the generalisation of emotion analysis detectors.
Another research line is to apply emotion analysis to author profiling tasks. In this sense, we
are planning to extend the PoliticES 2022 shared task [22] to compile tweets from politicians
and journalists and to extract emotions per author profile. For this, we will use the
UMUCorpusClassifier tool [23].


Acknowledgments
This work is part of the research project LaTe4PSP (PID2019-107652RB-I00) funded by MCIN/
AEI/10.13039/501100011033. This work is also part of the research projects AIInFunds (PDC2021-
121112-I00) funded by MCIN/AEI/10.13039/501100011033, by the European Union NextGenera-
tionEU/PRTR, LT-SWM (TED2021-131167B-I00) funded by MCIN/AEI/10.13039/501100011033
and by the European Union NextGenerationEU/PRTR, and by “Programa para la Recualificación
del Sistema Universitario Español 2021-2023”. In addition, José Antonio García-Díaz is supported
by Banco Santander and the University of Murcia through the Doctorado Industrial programme.


References
 [1] S. Butt, M. Amjad, F. Balouchzahi, N. Ashraf, R. Sharma, G. Sidorov, A. Gelbukh, Overview
     of EmoThreat: Emotions and Threat Detection in Urdu at FIRE 2022, in: CEUR Workshop
     Proceedings, 2022.
 [2] N. Ashraf, L. Khan, S. Butt, H.-T. Chang, G. Sidorov, A. Gelbukh, Multi-label emotion
     classification of urdu tweets, PeerJ Computer Science 8 (2022) e896.
 [3] S. Butt, M. Amjad, F. Balouchzahi, N. Ashraf, R. Sharma, G. Sidorov, A. Gelbukh, EmoTh-
     reat@FIRE2022: Shared Track on Emotions and Threat Detection in Urdu, in: Forum for
     Information Retrieval Evaluation, FIRE 2022, Association for Computing Machinery, New
     York, NY, USA, 2022.
 [4] I. Ameer, N. Ashraf, G. Sidorov, H. Gómez Adorno, Multi-label emotion classification using
     content-based features in twitter, Computación y Sistemas 24 (2020) 1159–1164.
 [5] F. M. Plaza-del Arco, S. M. Jiménez-Zafra, A. Montejo-Ráez, M. D. Molina-González, L. A.
     Ureña-López, M. T. Martín-Valdivia, Overview of the emoevales task on emotion detection
     for spanish at iberlef 2021, Procesamiento del Lenguaje Natural 67 (2021) 155–161.
 [6] J. A. García-Díaz, R. C. Palacios, R. Valencia-García, Umuteam at emoevales 2021: Emotion
     analysis for spanish based on explainable linguistic features and transformers, in: Pro-
     ceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) co-located with the
     Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), XXXVII
     International Conference of the Spanish Society for Natural Language Processing., Málaga,
     Spain, September, 2021, volume 2943 of CEUR Workshop Proceedings, CEUR-WS.org, 2021,
     pp. 59–71. URL: http://ceur-ws.org/Vol-2943/emoeval_paper6.pdf.
 [7] J. García-Díaz, M. Á. R. García, R. Valencia-García, Umuteam@ tamilnlp-acl2022: Emotional
     analysis in tamil, in: Proceedings of the Second Workshop on Speech and Language
     Technologies for Dravidian Languages, 2022, pp. 39–44.
 [8] M. Amjad, A. Zhila, G. Sidorov, A. Labunets, S. Butt, H. I. Amjad, O. Vitman, A. Gelbukh,
     Urduthreat@ fire2021: Shared track on abusive threat identification in urdu, in: Forum for
     Information Retrieval Evaluation, 2021, pp. 9–11.
 [9] L. Khan, A. Amjad, N. Ashraf, H.-T. Chang, Multi-class sentiment analysis of urdu text
     using multilingual bert, Scientific Reports 12 (2022) 1–17.
[10] L. Khan, A. Amjad, N. Ashraf, H.-T. Chang, A. Gelbukh, Urdu sentiment analysis with
     deep learning methods, IEEE Access 9 (2021) 97803–97812.
[11] L. Khan, A. Amjad, K. M. Afaq, H.-T. Chang, Deep sentiment analysis using cnn-lstm
     architecture of english and roman urdu text shared in social media, Applied Sciences 12
     (2022) 2694.
[12] J. A. García-Díaz, P. J. Vivancos-Vicente, Á. Almela, R. Valencia-García, Umutextstats: A
     linguistic feature extraction tool for spanish, in: Proceedings of the Language Resources
     and Evaluation Conference, European Language Resources Association, Marseille, France,
     2022, pp. 6035–6044. URL: https://aclanthology.org/2022.lrec-1.649.
[13] J. A. García-Díaz, R. Colomo-Palacios, R. Valencia-García, Psychographic traits identifica-
     tion based on political ideology: An author analysis study on spanish politicians’ tweets
     posted in 2020, Future Generation Computer Systems 130 (2022) 59–74.
[14] J. A. García-Díaz, S. M. Jiménez-Zafra, M. A. García-Cumbreras, R. Valencia-García, Evalu-
     ating feature combination strategies for hate-speech detection in spanish using linguistic
     features and transformers, Complex & Intelligent Systems (2022) 1–22.
[15] J. A. García-Díaz, R. Valencia-García, Compilation and evaluation of the spanish saticorpus
     2021 for satire identification using linguistic features and transformers, Complex &
     Intelligent Systems 8 (2022) 1723–1736.
[16] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning word vectors
     for 157 languages, CoRR abs/1802.06893 (2018). URL: http://arxiv.org/abs/1802.06893.
     arXiv:1802.06893.
[17] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional
     transformers for language understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.
     org/abs/1810.04805. arXiv:1810.04805.
[18] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave,
     M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation
     learning at scale, CoRR abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116.
     arXiv:1911.02116.
[19] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-
     networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural
     Language Processing, Association for Computational Linguistics, 2019, pp. 3982–3992.
     URL: https://arxiv.org/abs/1908.10084.
[20] J. Bergstra, D. Yamins, D. Cox, Making a science of model search: Hyperparameter opti-
     mization in hundreds of dimensions for vision architectures, in: International conference
     on machine learning, PMLR, 2013, pp. 115–123.
[21] M. del Pilar Salas-Zárate, G. Alor-Hernández, J. L. Sánchez-Cervantes, M. A. Paredes-
     Valverde, J. L. García-Alcaraz, R. Valencia-García, Review of english literature on figurative
     language applied to social networks, Knowledge Information Systems 62 (2020) 2105–2137.
     URL: https://doi.org/10.1007/s10115-019-01425-3. doi:10.1007/s10115-019-01425-3.
[22] J. A. García-Díaz, S. M. J. Zafra, M. T. M. Valdivia, F. García-Sánchez, L. A. U. López,
     R. Valencia-García, Overview of politices 2022: Spanish author profiling for political
     ideology, Proces. del Leng. Natural 69 (2022) 265–272. URL: http://journal.sepln.org/sepln/
     ojs/ojs/index.php/pln/article/view/6446.
[23] J. A. García-Díaz, Á. Almela, G. Alcaraz-Mármol, R. Valencia-García, Umucorpusclassifier:
     Compilation and evaluation of linguistic corpus for natural language processing tasks,
     Proces. del Leng. Natural 65 (2020) 139–142. URL: http://journal.sepln.org/sepln/ojs/ojs/
     index.php/pln/article/view/6292.