<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Vector Embeddings and Feature Vectors to Humor Identification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>María Carmen Aguirre-Delgado</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Angel Eduardo Cadena-Bautista</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Posgrado en Ciencia e Ingeniería de la Computación, Universidad Nacional Autónoma de México (UNAM)</institution>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Expressing prejudice is the most common strategy to harm minority groups. Prejudice is defined as "the formation of a negative concept or judgment in advance about members of a race, religion, or any other significant social group, despite facts contradicting it." This paper proposes two different approaches for classifying tweets that aim to generate humor by expressing prejudice, detecting the target group at which the prejudice is directed, and predicting the prejudice score of the tweet, which ranges from 1 to 5. The first approach examines whether lexical and morphological characteristics can discriminate humorous tweets from non-humorous ones; the second uses word embeddings to tackle the same tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Prejudice</kwd>
        <kwd>NLP</kwd>
        <kwd>features</kwd>
        <kwd>Embeddings</kwd>
        <kwd>Humor</kwd>
        <kwd>feature vectors</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        from highly offensive ones. We focus on the tasks proposed by Labadie et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which have three main objectives:
• Subtask 1: Identify whether the tweet text uses humor to express prejudice.
• Subtask 2A: Identify which of the following groups the prejudice is directed at:
– Women and feminists
– LGBTIQ community
– Immigrants and racially discriminated individuals
– Overweight individuals
• Subtask 2B: Determine the level of prejudice in the text on a scale from 1 to 5.
      </p>
      <sec>
        <title>1.1. Tasks</title>
        <p>• Subtask 1: HUrtful HUmour Detection. The first subtask consists of determining whether a
prejudiced tweet intends to be humorous. Participants have to distinguish between
tweets that express prejudice using humor and tweets that express prejudice without
using humor. Systems are evaluated using the F1 measure for the positive class.</p>
        <p>• Subtask 2A: Prejudiced Target Detection. Taking into account the analyzed minority groups, namely
women and feminists, the LGBTIQ community, immigrants and racially discriminated individuals,
and overweight individuals, participants are asked to identify the target groups in each
tweet as a multi-label classification task. The evaluation metric used for this subtask is
macro-F1.</p>
        <p>• Subtask 2B: Prejudice Degree Prediction. The third subtask consists of predicting the degree of
prejudice on a continuous scale from 1 to 5 for the messages targeting the minority groups.
The predictions submitted are evaluated using the Root Mean Square Error (RMSE).</p>
      </sec>
      <sec id="sec-1-1">
        <title>1.2. Evaluation Measures</title>
        <p>1.2.1. F1
To define the F1 measure, we need to use two widely used measures for evaluating the
performance of information retrieval systems, classification, and other tasks. Precision is defined as
the proportion of correct predictions among the total predictions of a class. In other words, it is
the proportion of true positives out of all positive predictions.</p>
        <p>Recall, also known as sensitivity or true positive rate, is the proportion of retrieved items out of
the total relevant items. In other words, it is the proportion of true positives for a class out of
all positive instances.</p>
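        <disp-formula>
          <label>(1)</label>
          <tex-math>\mathrm{precision} = \frac{TP}{FP + TP}</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(2)</label>
          <tex-math>\mathrm{recall} = \frac{TP}{FN + TP}</tex-math>
        </disp-formula>
        <p>Here TP, FP, and FN denote the numbers of true positives, false positives, and false negatives.
Thus, the F1 measure is the harmonic mean that combines precision and recall values.</p>
        <disp-formula>
          <label>(3)</label>
          <tex-math>F_1 = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}</tex-math>
        </disp-formula>
        <p>The F1 measure ranges between 0 and 1, where a value closer to 1 indicates better classification
performance.
1.2.2. Macro-F1
The macro-averaged F1 score (or macro-F1 score) is calculated by taking the arithmetic mean
(i.e., the unweighted mean) of all class-wise F1 scores. This method treats all classes equally,
regardless of their support values.</p>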
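        <p>As a minimal sketch of the definitions above (not the official evaluation script), the class-wise F1 and the macro-F1 can be computed in plain Python. The function names and the toy labels are illustrative assumptions, as is the convention of returning 0 when a denominator is empty.</p>
        <preformat>
```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: a ratio with an empty denominator counts as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the class-wise F1 scores."""
    scores = [precision_recall_f1(y_true, y_pred, positive=c)[2] for c in labels]
    return sum(scores) / len(scores)


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(precision_recall_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred, labels=[0, 1]))
```
        </preformat>
        <p>On these toy labels both classes obtain an F1 of 2/3, so the macro-F1 is also 2/3.</p>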
      </sec>
      <sec id="sec-1-2">
        <title>1.3. Root Mean Square Error (RMSE)</title>
        <p>To calculate RMSE, the residual (the difference between the true value and the prediction) of each data
point is computed, each residual is squared, the squared residuals are averaged, and the square root of
that average is taken.</p>
        <disp-formula>
          <label>(4)</label>
          <tex-math>\mathrm{RMSE}(y, \hat{y}) = \sqrt{\frac{\sum_{i=0}^{N-1} (y_i - \hat{y}_i)^2}{N}}</tex-math>
        </disp-formula>
        <p>Here N is the number of data points, y_i is the true value, and ŷ_i is the prediction.
The best possible score is 0, and a lower value indicates better performance.</p>
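        <p>The RMSE computation can be sketched in a few lines of Python. This is an illustrative implementation with made-up values, not the organizers' scoring code; the function name is an assumption.</p>
        <preformat>
```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy prejudice scores on the 1-5 scale, made up for illustration.
print(rmse([1.0, 3.0, 5.0], [1.0, 2.0, 3.0]))
```
        </preformat>
        <p>The residuals here are 0, 1, and 2, so the mean of the squares is 5/3 and the RMSE is its square root.</p>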
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Description of the dataset</title>
        <p>
          The organizers [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] provided a dataset divided into two parts: the training set
and the evaluation set.
        </p>
        <p>The training set consists of 2671 entries with 8 columns. Figure 2 shows an example
of the data:
• Index: Unique identifier for the tweet.
• Tweet: Text of the tweet in question.
• Humor: Binary, 1 if the tweet is considered humorous, 0 if not.
• Prejudice_woman: Binary, 1 if the tweet exhibits prejudice towards women, 0 if not.
• Prejudice_lgbtiq: Binary, 1 if the tweet exhibits prejudice towards the LGBTIQ community,
0 if not.
• Prejudice_inmigrant_race: Binary, 1 if the tweet exhibits prejudice towards immigrants
or racialized individuals, 0 if not.
• Gordophobia: Binary, 1 if the tweet exhibits prejudice towards overweight people, 0 if
not.
• Mean_prejudice: Continuous value between 1 and 5, representing the degree of prejudice
in the tweet.</p>
        <p>For the test set, the organizing committee provided 778 tweets with the following
columns:
• index: Unique identifier for the tweet.
• tweet: Text of the tweet in question.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data preprocessing</title>
        <p>We applied basic preprocessing techniques for feature extraction and TF-IDF weighting on the
tweets, following these steps:
1. Removal of pre-existing marks on the dataset such as "URL", "HASHTAG", and "MENTION".
2. Removal of line breaks, special characters, and double spaces.
3. Removal of stop words.
4. Tokenization of words and sentences.</p>
        <p>For the use of word embeddings, the same procedure was followed, except for the removal of
stop words. It was observed that removing stop words resulted in lower performance of the
classifiers.</p>
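        <p>The preprocessing steps can be sketched as follows. This is an approximation under stated assumptions: the stop-word list is a tiny hypothetical one (the list actually used is not specified), tokenization is reduced to whitespace splitting, and stop words are filtered after tokenization for simplicity.</p>
        <preformat>
```python
import re

# Hypothetical, tiny stop-word list; the list actually used is not specified.
STOP_WORDS = {"de", "la", "el", "en", "y", "que", "a", "los", "las", "un", "una"}
PLACEHOLDERS = re.compile(r"\b(?:URL|HASHTAG|MENTION)\b")
NON_WORD = re.compile(r"[^\w\s]")


def preprocess(tweet, remove_stop_words=True):
    """Return the list of tokens that remain after cleaning a tweet."""
    text = PLACEHOLDERS.sub(" ", tweet)       # step 1: dataset marks
    text = NON_WORD.sub(" ", text)            # step 2: special characters
    text = re.sub(r"\s+", " ", text).strip()  # step 2: line breaks, double spaces
    tokens = text.split()                     # step 4: whitespace tokenization
    if remove_stop_words:                     # step 3: skipped for embeddings
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


print(preprocess("MENTION ¡Qué   risa!\nMira esto URL HASHTAG"))
```
        </preformat>
        <p>Calling the function with remove_stop_words=False corresponds to the variant used for word embeddings, where stop words are kept.</p>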
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Feature engineering</title>
        <p>We aimed to identify features that would allow us to distinguish between humorous and
non-humorous tweets. To achieve this, we performed textual, grammatical, and morphological
feature extraction on the tweets to capture humor-related characteristics.</p>
        <p>A total of 19 features were extracted from the text, as follows:
• Textual Features
– Number of characters per document
– Number of digits
– Number of words per document
– Number of characters per word
– Number of uppercase characters per document
– Number of special characters (-, :, !, ¡, ¿, ?)
– Number of emoticons (e.g., :), ;/, &lt;3)
– Number of emojis
– Flesch-Kincaid Grade Level
• Morphological Features
– Number of verbs
– Number of adjectives
– Number of nouns
– Number of pronouns
• TF-IDF Weighting
– Bag of Words
– Word bigrams
– Word trigrams
– POS tag bigrams
• TF Weighting
– Bag of Words
– Word bigrams
– POS tag bigrams
• Embeddings
– fastText
– Word2Vec
• Lexicons of offensive words
– Hurtlex [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] + SHARE [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
        </p>
        <p>Feature vectors were computed for each tweet, and the difference between the humor and
non-humor categories was examined.</p>
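        <p>Several of the textual features listed above are simple counts. The sketch below shows one possible way to compute six of them; the function name, the dictionary keys, and the whitespace-based word splitting are assumptions made for illustration, and the emoticon, emoji, readability, and morphological features are left out.</p>
        <preformat>
```python
def textual_features(tweet):
    """Compute a few count-based textual features for one tweet."""
    words = tweet.split()
    special = set("-:!¡¿?")
    n_words = len(words)
    return {
        "n_chars": len(tweet),
        "n_digits": sum(ch.isdigit() for ch in tweet),
        "n_words": n_words,
        "chars_per_word": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "n_upper": sum(ch.isupper() for ch in tweet),
        "n_special": sum(ch in special for ch in tweet),
    }


# Made-up example tweet.
features = textual_features("¡NO puede ser! 2 veces")
print(features)
```
        </preformat>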
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Word Embeddings</title>
        <p>
          We defined a feature vector space for training and evaluation composed of unsupervised word
embedding vectors. A set of word embedding vectors represents the ideal semantic space of
words in a continuous vector space of real values, where the relationships between word
vectors reflect linguistic relationships between words. Word embeddings provide a dense
representation of the meaning of a word, with each word linked to a continuous vector of
real values of a fixed dimension. Pre-trained word embeddings are available from models
such as Word2Vec [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], GloVe [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], and fastText [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>In this study, we used the average of pre-trained Spanish word embeddings. Specifically,
we used a set of 300-dimensional pre-trained word embedding vectors from fastText, trained
on an unannotated Spanish corpus. These vectors were obtained using the skip-gram model and
were trained on 15 different sources, including Spanish Wikis, ParaCrawl, EUBookshop,
MultiUN, and OpenSubtitles, among others. By contrast, the official fastText embeddings were
trained solely on Common Crawl and Wikipedia.</p>
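        <p>Averaging word embeddings turns a tweet of any length into one fixed-size vector. The sketch below illustrates the idea with a toy 3-dimensional lookup table standing in for the 300-dimensional vectors; the function name and the handling of unknown words (they are skipped, and an all-zero vector is returned if no word is known) are assumptions, not details taken from the experiments.</p>
        <preformat>
```python
def average_embedding(tokens, vectors, dim):
    """Mean of the vectors of the tokens found in `vectors`."""
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        # Assumed fallback when no token has a vector.
        return [0.0] * dim
    return [sum(component) / len(found) for component in zip(*found)]


# Toy lookup table with made-up values.
toy_vectors = {
    "chiste": [1.0, 0.0, 2.0],
    "malo": [3.0, 2.0, 0.0],
}
print(average_embedding(["un", "chiste", "malo"], toy_vectors, dim=3))
```
        </preformat>
        <p>The token "un" has no entry in the toy table, so the result is the mean of the two remaining vectors.</p>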
        <p>
          We also experimented with pre-trained Spanish word embeddings from
Word2Vec, which are 300-dimensional embeddings generated using the skip-gram model. The
training corpus used for these embeddings is an unannotated Spanish language corpus of nearly
1.5 billion words, compiled from various corpora and web resources. These embeddings were
evaluated by translating the word analogy test set from the original Word2Vec [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] into Spanish.
Both are available in the following repository:
https://github.com/dccuchile/spanish-word-embeddings.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Classification Models</title>
        <sec id="sec-2-5-1">
          <title>2.5.1. Model configuration using features</title>
          <p>For subtasks 1 and 2A using only features, three sklearn models were used with the following
parameters:
• Logistic Regression: solver = "lbfgs", tol = 0.001, C = 0.01, class_weight = "balanced"
• Support Vector Machines: C = 0.01, kernel = "linear", class_weight = "balanced"
• Random Forests: max_depth = 10, random_state = 0, class_weight = "balanced"
Subtask 2B is a regression task, so the following models were used:
• Ridge: alphas = [1e-3, 1e-2, 1e-1, 1], cv = 10
• Lasso: cv = 2
• SVM: C = 1.0, epsilon = 0.2, gamma = "scale", kernel = "linear"
• Decision Tree Regressor (DTR): random_state = 0</p>
        </sec>
        <sec id="sec-2-5-2">
          <title>2.5.2. Model configuration for models using word embeddings</title>
          <p>
            For the word-embedding approach, the following configurations were used.
For subtask 1, the following models were tested using RandomizedSearchCV with 5
folds for parameter exploration. The parameter options below were passed to each model, and the
best ones were selected using cross-validation. Models for which no parameters are indicated were
initialized with default parameters:
• Logistic Regression (LR): C: expon(scale=100), penalty: ['l1', 'l2', 'elasticnet', None]
• Support Vector Machines (SVM): class_weight='balanced', kernel: [linear, poly, rbf,
sigmoid, precomputed], C: expon(scale=100)
• XGBoost: n_estimators: stats.randint(150, 500), learning_rate: stats.uniform(0.01, 0.07),
subsample: stats.uniform(0.3, 0.7), max_depth: [3, 4, 5, 6, 7, 8, 9, 10], colsample_bytree:
stats.uniform(0.5, 0.45), min_child_weight: [1, 2, 3, 4]
• Random Forest: max_depth = 10, random_state = 0, class_weight = "balanced"
• Gaussian Naive Bayes (GaussianNB)
• Bernoulli Naive Bayes (BernoulliNB)
          </p>
          <p>For subtask 2A, the following models were used with 5-fold cross-validation. Models for which
no parameters are indicated were initialized with default parameters:
• Logistic Regression (LR)
• Support Vector Machines (SVM): C = 1.0, epsilon = 0.2, gamma = "scale", kernel = "linear"
• XGBoost: objective = 'binary:logistic'
• Random Forest: max_depth = 10, random_state = 0, class_weight = "balanced"
For subtask 2B, the following models were used:
• Ridge: alphas = [1e-3, 1e-2, 1e-1, 1], cv = 5
• Lasso: cv = 5
• XGBoost Regressor: n_estimators: 500, max_depth: 4, min_samples_split: 5, learning_rate:
0.01, loss: squared_error
• LGBM Regressor: task: train, boosting: gbdt, objective: regression, num_leaves: 10,
learning_rate: 0.05, metric: {l2, l1}
• Support Vector Machines (SVM): C = 1.0, epsilon = 0.2, gamma = 'scale', kernel = 'linear'</p>
        </sec>
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Feature Engineering</title>
        <p>After extracting the feature vectors, the F1 score was calculated for each of them using three
different classifiers. This calculation was performed using stratified 10-fold cross-validation.
For subtask 1, we hypothesized that if a feature demonstrated outstanding individual
performance in classification, then including that feature would improve the classification
process. Consequently, we combined the features that showed the best individual
performance and calculated the F1 score for each of the resulting combinations.
In a second experiment, an exhaustive search was conducted to identify the
combination that offered the best classification performance in terms of the F1 score.
For subtasks 2A and 2B, we selected the combination of features that yielded
the best results in subtask 1 and used that combination in those subtasks.</p>
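        <p>One way to read the first experiment is as a forward procedure: rank the feature vectors by individual score, then evaluate the top-1, top-2, ..., top-k combinations and keep the best. The sketch below follows that reading. It is an interpretation rather than the exact procedure used; the feature names and scores are made up, and toy_evaluate is a hypothetical stand-in for a cross-validated F1 computation.</p>
        <preformat>
```python
def forward_combinations(individual_scores, evaluate):
    """Score every top-k prefix of the features ranked by individual score."""
    ranked = sorted(individual_scores, key=individual_scores.get, reverse=True)
    results = []
    for k in range(1, len(ranked) + 1):
        combo = tuple(ranked[:k])
        results.append((combo, evaluate(combo)))
    best = max(results, key=lambda item: item[1])
    return best, results


# Made-up individual scores for three hypothetical feature vectors.
individual = {"feat_a": 0.50, "feat_b": 0.45, "feat_c": 0.20}


def toy_evaluate(combo):
    # Stand-in for cross-validated F1; a real run would train a classifier here.
    table = {
        ("feat_a",): 0.50,
        ("feat_a", "feat_b"): 0.60,
        ("feat_a", "feat_b", "feat_c"): 0.55,
    }
    return table[combo]


best, all_results = forward_combinations(individual, toy_evaluate)
print(best)
```
        </preformat>
        <p>In this toy run, adding the weakest feature lowers the score, so the best prefix is the first two features. The exhaustive search of the second experiment would instead score every subset, which grows exponentially with the number of feature vectors.</p>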
        <p>
          Regarding the embeddings, the following combinations were tested:
• Subtask 1: Pretrained Word2Vec and fastText embeddings were tested individually, along
with TF-IDF vectorization and character n-grams computed with sklearn's CountVectorizer.
• Subtask 2A: TF-IDF vectorization, TF-IDF + fastText vectorization, TF-IDF + fastText +
character Bag of Words, TF-IDF + fastText + character Bag of Words + count of aggressive
words using the SHARE lexicon [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] were tested.
        </p>
        <p>• Subtask 2B: Same configuration as subtask 2A.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
      <p>We can observe that the best results for individual feature vectors across three classifiers on the
training set (evaluated through CV) are obtained using the Bag of Words approach with term
frequency weighting, as shown in Table 1.</p>
      <p>Features were then combined. The combination was built forward, adding feature vectors in
order of their individual performance. The best combination differs from one classifier to another.
For logistic regression, the combination that performs best on both the
training and evaluation sets consists of term frequency weighting and TF-IDF weighting for words.
For SVM, all available features are used. For Random Forest, the top
12 individual features performed best on the training set, while on the evaluation set the best
combination was term frequency weighting and TF-IDF for words. The results are in Table 2.</p>
      <p>However, it is important to note that the best combination on the evaluation set does not
necessarily coincide with the best combination found in the training phase: a combination
that performs well on the training set may not have the best
performance on the evaluation set.</p>
      <p>It is crucial to conduct a thorough analysis of the results on both sets to determine
the optimal combination of features that maximizes the overall model performance. This
highlights the importance of comprehensive evaluation and validation to ensure reliable and
consistent results.</p>
      <p>Individual feature vectors (LR): BoW, TFIdFBoW, Bigram_POS, Bigram_TFIdF_POS, FLSKGL, PalsDoc, NOUNCnt, ADJCnt, LenDoc, CharsPal, MayusDoc, VERBCnt, SpecCharCnt, PRONCnt, Bigram_TFIdF_BoW, LexRich, Bigram_BoW, DigitsCnt, EmojisCnt</p>
      <p>For the embedding-based approach to subtask 1, we compared averaged fastText and Word2Vec
embeddings across various classifiers. fastText performs slightly
better than Word2Vec, which is why it was used in later configurations.
These results are in Table 3. Table 4 compares word-level TF-IDF
weighting with character-level TF-IDF.</p>
      <p>For subtasks 2A and 2B, the results using word embeddings are shown in Table 5 and
Table 8. The best results for subtask 2A are obtained using the combination of
TF-IDF weighting, bag of words, and fastText embeddings, and are further improved by adding the
aggressive word count. For subtask 2B, almost all regressors overfit,
as their performance worsens on the test set. The best results are nevertheless obtained with
the combination of bag of words, TF-IDF weighting, fastText embeddings, and aggressive word
counts. LGBM Regressor and SVM overfit less than the rest.
For the feature-based approach to subtasks 2A and 2B, we used the matrix of the best feature
combination from subtask 1, which consisted of the 12 best-performing features for the Random
Forest algorithm. The results for subtask 2A using this feature combination are shown in Table 6,
and those for subtask 2B in Table 7.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <sec id="sec-4-1">
        <title>4.1. Feature Vectors</title>
        <p>The best performance for subtask 1 is achieved with an F1-score of 0.73 for a Random Forest
classifier, using a combination of Bag of Words (BoW) and TF-IDF. In subtask 2A, the best
macro F1-score is 0.69 for SVM, with the combination that performs best on the training set.
For subtask 2B, our best score was 0.71 using Lasso regression.</p>
        <p>Combining feature vectors according to their individual performance, as proposed in our
methodology, did not produce the expected result. In addition, features extracted from
the text that we assumed would provide more information to the classifier did not yield the
expected improvement. These features helped with classification for subtask 1
but much less for subtasks 2A and 2B. A better approach could be to use the
lexicons to identify the target group of aggressive words, which would help classification in
subtask 2A and could improve subtask 2B as well. The evaluation metric depends on the order in
which the feature vectors are combined, and thus the best combination for one subtask does
not necessarily work well for the others.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Embeddings and Feature Vectors</title>
        <p>The best embedding for addressing the problem is fastText, although Word2Vec performs
quite similarly.</p>
        <p>In subtask 2A, including the aggressive word count improves performance
compared to using just embeddings and vectorization: a macro-F1 score of 0.81 is
achieved using SVM with TF-IDF, CBOW, fastText, and aggressive word count.
When using embeddings and vectorizations, subtask 2B shows overfitting in most of
the regressors, except for LGBM Regressor and SVM, which both achieve an RMSE of 0.73. The
best results were obtained using Ridge regression and XGBoost regressor, with 0.73 and 0.75.
One thing that all successful configurations have in common is the combined use of
embeddings, lexicons, and weights, which improves the performance of the classifiers.
Due to the structure and versatility of lexicons, their use could be improved to
obtain other metrics, as they proved to be useful when used for counting aggressive words.</p>
        <p>Future work
• Explore deep learning techniques, such as CNN and Transformers.
• Utilize embeddings from pretrained transformer models.
• Improve the use of lexicons and test them for subtasks 1 and 2A.
• Use heuristics to find the best combination of feature vectors.
• Try data augmentation techniques, as the dataset is imbalanced.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <article-title>Automatic detection of irony and humour in twitter</article-title>
          ,
          <source>in: International Conference on Innovative Computing and Cloud Computing</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L. I.</given-names>
            <surname>Merlo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ortega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>When humour hurts: linguistic features to foster explainability</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          (
          <year>2023</year>
          )
          <fpage>85</fpage>
          -
          <lpage>98</lpage>
          . doi:10.26342/2023-70-7.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Labadie-Tamayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Everybody hurts, sometimes. overview of hurtful humour at iberlef 2023: Detection of humour spreading prejudice in twitter</article-title>
          ,
          <source>in: Procesamiento del Lenguaje Natural (SEPLN)</source>
          , volume
          <volume>71</volume>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bassignana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <article-title>Hurtlex: A multilingual lexicon of words to hurt</article-title>
          , volume
          <volume>2253</volume>
          ,
          <year>2018</year>
          . doi:10.4000/books.aaccademia.3085.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Plaza-Del-Arco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B. P.</given-names>
            <surname>Portillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>López-Úbeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <article-title>Share: A lexicon of harmful expressions by spanish speakers</article-title>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient estimation of word representations in vector space</article-title>
          (
          <year>2013</year>
          ). URL: http://arxiv.org/abs/1301.3781.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>GloVe: Global vectors for word representation</article-title>
          ,
          <source>in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Doha, Qatar,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . URL: https://aclanthology.org/D14-1162. doi:10.3115/v1/D14-1162.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Enriching word vectors with subword information</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          (
          <year>2017</year>
          )
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          . URL: https://aclanthology.org/Q17-1010. doi:10.1162/tacl_a_00051.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cardellino</surname>
          </string-name>
          ,
          <source>Spanish Billion Words Corpus and Embeddings</source>
          ,
          <year>2019</year>
          . URL: https://crscardellino.github.io/SBWCE/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>