<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MultiAzterTest@Exist-IberLEF 2021: Linguistically Motivated Sexism Identification</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Ixa Group, HiTZ center / University of the Basque Country, UPV/EHU</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Identifying sexism in social networks is the focus of the EXIST-IberLEF 2021 shared task. By participating in this task, the aim of the MultiAzterTest team is to see if linguistically motivated features can help in the detection of sexism. For this purpose, we present three approaches: i) an approach based on language models, ii) an approach based on linguistic and stylistic features together with machine learning classifiers and iii) an approach combining the previous two. The language model approach obtains the best results on the test data. However, the approaches that use linguistic and stylistic features offer more interpretability.</p>
      </abstract>
      <kwd-group>
        <kwd>Sexism detection</kwd>
        <kwd>Exist-IberLEF</kwd>
        <kwd>Language Models</kwd>
        <kwd>Linguistic features</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Sexism is defined by the Oxford English Dictionary as "prejudice, stereotyping or
discrimination, typically against women, on the basis of sex". Sexism, moreover,
can be classified as indirect, sexual and physical [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] and categorized in the Exist-IberLEF shared task [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] as ideological and inequality, stereotyping
and dominance, objectification, sexual violence, and misogyny and non-sexual
violence.
      </p>
      <p>
        The Natural Language Processing (NLP) community has focused on
detecting hate speech [
        <xref ref-type="bibr" rid="ref13 ref22">13, 22</xref>
        ], and abusive and offensive language [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] among
others, but also on related outcomes such as misogyny [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] or racism
[
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Sexism has also been addressed: Rodríguez-Sánchez et al. experiment
with user-, network- and text-based features, machine learning classifiers (logistic
regression, support vector machine and random forest), deep recurrent neural
networks (BiLSTM) and transformer-based language models (BERT) [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] to
detect it.
      </p>
      <p>
        In this paper, we test MultiAzterTest-Social (MATS) in the task of detecting
sexism in the context of the EXIST 2021 Shared Task [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], a shared task at IberLEF 2021 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. MATS is a version of the MultiAzterTest tool [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which is
the trilingual version of AzterTest [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. MultiAzterTest and AzterTest are
open-source NLP-based tools and web services for text stylometrics and readability
assessment. In addition to the linguistic and stylistic features, MATS includes
features to analyse social media texts inspired by Fersini et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and other
improvements. By participating in this shared task, we want to see whether a tool that
outperforms state-of-the-art results in readability assessment can be applied
to other classification tasks, where texts are shorter and include more subjective
information and colloquial, informal speech. Linguistic-based features have
been used to detect fake news [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and text features have also been taken into
account for sexism detection [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], but to our knowledge this is the first time
that more than 150 features are taken into account for this task. The aim of
using a linguistically motivated tool is to provide explanations for the predictions
and to be able to analyse the linguistic characteristics of sexism.
      </p>
      <p>This paper is structured as follows: in Section 2 we introduce
MultiAzterTest-Social, in Section 3 we describe our approaches and the experimental set-up, in
Section 4 we present the results, and we conclude and outline future work in
Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>MultiAzterTest-Social</title>
      <p>
        MultiAzterTest-Social is an improvement of MultiAzterTest [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. MultiAzterTest
analyses more than 125 linguistic and stylistic features in Basque (125 features),
English (163 features) and Spanish (141 features). In the following, we briefly explain
how MultiAzterTest works:
- Preprocessing: This step carries out all the analyses needed to process raw texts.
This includes multilingual parsing (in our case
Stanza [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]), syllable splitting, and stopword removal.
- Linguistic and stylistic profiling: Based on the previous text analysis,
this step calculates the linguistic and stylistic features. These features are
grouped into the following types: descriptive and raw features, lexical
diversity, classical readability formulae, word frequencies, vocabulary knowledge,
morphological information, syntax, semantic information, semantic overlap
(semantic similarity), referential cohesion (overlaps) and logical cohesion
(connectives). There are five types of indicators: absolute numbers, mean,
standard deviation, incidence and ratios.
- Classification: Based on the linguistic and stylistic features, a machine
learning classifier is applied. This classifier varies depending on the task.
In the case of readability assessment, for example, support vector machines
seem to be the most adequate.
      </p>
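      <p>As a minimal illustration of the five indicator types listed above (absolute number, mean, standard deviation, incidence per 1000 words and ratio), the following sketch computes them for one feature counted per sentence. The function name and input layout are our own, not MultiAzterTest's actual API:</p>
      <preformat>
```python
from statistics import mean, stdev

def indicator_profile(counts_per_sentence, total_words):
    """Derive the five indicator types used by the tool's features:
    absolute number, mean, standard deviation, incidence per 1000 words,
    and a ratio (here: occurrences per sentence)."""
    absolute = sum(counts_per_sentence)
    return {
        "absolute": absolute,
        "mean": mean(counts_per_sentence),
        "std": stdev(counts_per_sentence) if len(counts_per_sentence) > 1 else 0.0,
        "incidence_per_1000": 1000 * absolute / total_words,
        "ratio_per_sentence": absolute / len(counts_per_sentence),
    }

# e.g. a feature (say, connectives) occurring 2, 0 and 1 times
# in three sentences of a 30-word text:
profile = indicator_profile([2, 0, 1], total_words=30)
```
      </preformat>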
      <p>
        In order to analyse social media texts, we have added more features to
MultiAzterTest and, this way, adapted it into MultiAzterTest-Social. Some of
the new features for social media are inspired by the features presented by Fersini
et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to profile misogynists. The advanced morpho-syntactic and named-entity features
are based on other readability assessment works, e.g. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], we have created the
sentiment analysis and abusive term features, and we have added more
descriptive features (descriptive+). In the following, we explain the new features:
- Descriptive+: We analyse the number of words and sentences per tweet;
the number of numerical expressions, their incidence per 1000 words and the
ratio of numbers per tweet and per sentence; the number and incidence of
each punctuation mark (colon, exclamation mark...); and the number and
incidence of special characters.
- Advanced morpho-syntactic: We calculate the number and incidence
per 1000 words of the types of determiners (definite, indefinite), adjectives
(comparative, superlative), pronouns (person and number), causal and
intentional verbs and particles, and adverbial and prepositional phrases, as well as the
ratios of causal/intentional particles to causal/intentional verbs.
- Named entities: Stanza's base version detects 4 named entity types
(Person, Location, Organisation and Miscellaneous). We calculate the mean of
all the entities per sentence and their incidence per 1000 words; the ratio of
entities per noun; and, for each entity type, its share of all the entities, its number
per sentence and its incidence per 1000 words.
- Social media: These features include the number and ratio of emojis per
tweet and sentence; the number and incidence of hashtags, mentions and stretched
words, together with the ratio of each of them per sentence and tweet; and the percentage
of capital letters per sentence.
- Sentiment analysis: We calculate the average positive, negative, neutral
and compound score per sentence based on the sentiment intensity analyser from
VADER [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], the number of positive, negative, and neutral emojis according
to the Emoji sentiment lexicon [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and the average sentiment score per sentence.
- Abusive terms: We include features for profane words, abusive words and
hurt words based on the following resources: i) Luis von Ahn's Research
Group's Offensive/Profane Word List [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], ii) the Lexicon of Abusive Words
[
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], and iii) HurtLex, the multilingual lexicon of words to hurt [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Although
HurtLex classifies the words into different categories, we take all of them
together. We calculate the number, the incidence and the ratio per sentence
of the profane, abusive and hurt words.
      </p>
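      <p>A minimal sketch of how such lexicon-based counts can be turned into the number, incidence and ratio features described above. The tiny lexicon below is a placeholder for the three listed resources, and the function is illustrative, not the tool's actual code:</p>
      <preformat>
```python
def lexicon_features(tokens_per_sentence, lexicon):
    """Count lexicon hits (e.g. profane/abusive/hurt words) and derive
    the number, incidence per 1000 words and ratio per sentence."""
    total_words = sum(len(sent) for sent in tokens_per_sentence)
    hits = sum(1 for sent in tokens_per_sentence
               for tok in sent if tok.lower() in lexicon)
    return {
        "count": hits,
        "incidence_per_1000": 1000 * hits / total_words,
        "ratio_per_sentence": hits / len(tokens_per_sentence),
    }

# Toy example with a placeholder one-word lexicon:
feats = lexicon_features([["you", "idiot"], ["what", "a", "fine", "day"]],
                         {"idiot"})
```
      </preformat>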
      <p>In Table 1 we show the number of new features MATS analyses. Taking into
account the features MultiAzterTest already calculates, in this work we have used 280
features for English and 244 for Spanish.</p>
    </sec>
    <sec id="sec-3">
      <title>Approaches and Experimental Set-up</title>
      <p>
        In this section we present the experiments carried out for Task 1: Sexism
Identification. The dataset we have used was provided by the organisers
[
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. The results are calculated using accuracy, as the distribution between the sexist
and non-sexist categories is balanced.
      </p>
      <p>In our experiments, we have tested three approaches: i) a language model
(LM), ii) the features of MultiAzterTest-Social together with a machine learning
classifier (henceforth, MATS-Sexism) and iii) a combination of the LM and the
MATS-Sexism approach.</p>
      <p>
        Language Model Approach
The LM approach uses the Bidirectional Encoder Representations from
Transformers (BERT) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], specifically the bert-base-uncased model, pre-trained on the
BooksCorpus [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] and English Wikipedia for English, and BETO
(bert-base-spanish-wwm-uncased) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for Spanish. Both models are provided by
HuggingFace [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. We have decided to use this approach because BERT achieves state-of-the-art
results on many NLP tasks.
      </p>
      <p>This is our experimental setting: we have truncated all texts that had more
than 200 tokens and we have added two tokens to mark the beginning and the
end of the sequence to each input text, [CLS] and [SEP] respectively. We have
padded texts shorter than 200 tokens with zeroes. We have not performed any
text augmentation or pre-processing besides standard byte-pair encoding. We
have used the PyTorch framework to create our model.</p>
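      <p>The truncation, [CLS]/[SEP] marking and zero-padding described above can be sketched as follows. In practice a tokenizer library performs these steps; the id values below are only BERT's conventional ones, used here for illustration:</p>
      <preformat>
```python
def encode(token_ids, max_len=200, cls_id=101, sep_id=102, pad_id=0):
    """Truncate to max_len tokens, wrap with [CLS]/[SEP] and pad with zeros."""
    body = token_ids[:max_len]            # truncate texts longer than max_len
    seq = [cls_id] + body + [sep_id]      # mark beginning and end of sequence
    mask = [1] * len(seq)                 # attention covers real tokens only
    pad = max_len + 2 - len(seq)          # room left after [CLS]/[SEP]
    return seq + [pad_id] * pad, mask + [0] * pad

ids, mask = encode([5, 6, 7], max_len=5)
# ids  -> [101, 5, 6, 7, 102, 0, 0]
# mask -> [1, 1, 1, 1, 1, 0, 0]
```
      </preformat>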
      <p>On top of BERT, we have probed two sequential models: i) a dropout
layer to fight overfitting, with the dropout probability set to 0.1; on top of
the dropout layer, we have added a linear layer and a sigmoid activation function,
where the input dimension of the linear layer was 768 and the output 2 (equal to the
number of classes); and ii) a model with a linear layer, a ReLU activation function and
a second linear layer, where the input dimension of the first linear layer was 768 and
the output 50, and the input dimension of the second linear layer was 50 and the
output 2 (equal to the number of classes).</p>
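      <p>The two probed heads can be sketched in PyTorch roughly as follows; this is our reading of the description above, not the authors' released code:</p>
      <preformat>
```python
import torch
import torch.nn as nn

# Head i): dropout (p=0.1) -> linear 768->2 -> sigmoid, as described above.
head_dropout = nn.Sequential(nn.Dropout(p=0.1), nn.Linear(768, 2), nn.Sigmoid())

# Head ii): linear 768->50 -> ReLU -> linear 50->2.
head_mlp = nn.Sequential(nn.Linear(768, 50), nn.ReLU(), nn.Linear(50, 2))

pooled = torch.randn(4, 768)   # stand-in for BERT's 768-d outputs (batch of 4)
out_i = head_dropout(pooled)   # shape (4, 2)
out_ii = head_mlp(pooled)      # shape (4, 2), fed to the cross-entropy loss
```
      </preformat>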
      <p>We have used the cross-entropy loss function for each of the outputs.</p>
      <p>
        We have trained the model in the Google Colaboratory environment. We have
split the training data into 80% for training and 20% for validation. The training
batch size was 32 and the model was trained for up to 10 epochs using the
early stopping technique. We have obtained the best result on the validation data
after running 4 epochs, setting the tweet length to 200 and the learning rate to
5e-5, with the linear-ReLU-linear sequential model and the Adam optimizer [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
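      <p>The early stopping used above can be sketched as a simple validation-accuracy tracker; the patience value is an illustrative choice, as the paper does not state one:</p>
      <preformat>
```python
class EarlyStopper:
    """Minimal early stopping on validation accuracy: stop when the score
    has not improved for `patience` consecutive epochs."""
    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0
        self.best_epoch = 0

    def step(self, epoch, val_acc):
        if val_acc > self.best:
            self.best, self.best_epoch, self.bad_epochs = val_acc, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```
      </preformat>
      <p>For instance, with validation accuracies peaking at epoch 4 and then declining, training would stop after two non-improving epochs while keeping epoch 4 as the best checkpoint.</p>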
      <p>
        Approach Based on Linguistic Features and Machine Learning
The second approach, the MATS-Sexism approach, consists of the outputs of the
tool plus a classical machine learning classifier. In order to find the
most adequate classifier, we have tested the Sequential Minimal Optimization
(SMO) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], Random Forest (RF) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Simple Logistics (SL) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] classifiers.
We have also carried out feature selection with the ten most predictive features
according to the WEKA [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]-based InfoGain attribute evaluator (Table 3) and, in
the case of SMO, we have also reduced the number of features to 125 and 75.
All these preliminary experiments have been done with 10-fold cross-validation.
      </p>
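      <p>Ranking by information gain, the criterion behind WEKA's InfoGain attribute evaluator, can be sketched for discrete features as follows (WEKA additionally discretises numeric attributes first; this standalone version is ours):</p>
      <preformat>
```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Information gain of a discrete feature:
    H(labels) - H(labels | feature)."""
    n = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional
```
      </preformat>
      <p>A feature that perfectly separates a balanced binary dataset scores 1.0 bit; one independent of the labels scores 0.</p>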
      <p>
        In Table 2 we present the results of the MATS-Sexism approach with
different features and classifiers on the training data. As in readability
assessment, SMO is the best classifier [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and, therefore, SMO will be the
classifier of MATS-Sexism. We also see that, contrary to what happens in readability
assessment, feature selection and feature reduction are not competitive.
      </p>
      <p>
        Before continuing with the approaches, let us analyse the most predictive
features presented in Table 3 from a linguistic and stylistic point of view. Four
of the most predictive features for English are descriptive (word, lemma and
syllable length); there are also 2 semantic similarity features, one of the social media
features (the percentage of capital letters), a sentiment analysis feature (the VADER
compound score), the Flesch readability formula [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and word frequencies (minimum
word frequency).
      </p>
      <p>In the case of Spanish, it is remarkable that 4 features are related to rare
words. This can imply i) that infrequent words have been used or ii) that the
spelling of the words was not correct and they have not been correctly
analysed. The importance of the hashtags is also noticeable (4 features). The use
of first-person pronouns and the unclassified miscellaneous named entities
also plays a role. Finally, 6 out of the 10 features were not in MultiAzterTest
and come from the social media update. This shows the validity of the newly added
features.</p>
      <p>The third approach is a combination of the results of the LM and MATS-Sexism.
To combine the results, we have two options: i) label a tweet as sexist if one of the tools
tags it as sexist or ii) label it as sexist only if both tools consider it
sexist. We have decided to implement the second option (only if LM
and MATS-Sexism agree) in order to give more precision to our predictions
(although the official evaluation metric is accuracy). We think that in subjective
tasks striving for precision can avoid doing harm.</p>
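      <p>The agreement rule can be sketched as follows (the function and label names are ours):</p>
      <preformat>
```python
def combine(lm_label, mats_label):
    """Option ii: a tweet is labelled sexist only when both the LM and
    MATS-Sexism predict sexist, trading recall for precision."""
    return "sexist" if lm_label == "sexist" == mats_label else "non-sexist"

# The OR variant (option i) would instead flag a tweet when either system does.
```
      </preformat>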
    </sec>
    <sec id="sec-4">
      <title>Results in Test Data</title>
      <p>In this section we present the results on the test data as provided by the
organisers. In total, 72 systems were evaluated in Task 1. In Table 4 we present the results
of our three approaches together with the baseline results (TF-IDF+SVM), also
provided by the organisers.</p>
      <p>The LM approach obtains the best results in all the settings: both languages together,
and English and Spanish on their own. The combination stays in the middle and
the MATS-Sexism approach is the worst. Only the LM approach is above the
baseline. It is remarkable that all the approaches perform similarly in all the
settings: if we round the numbers, the accuracy of the LM is 0.77 both for
Spanish and English together and for each language separately. The MATS-Sexism
approach has an accuracy of 0.59-0.60. The combination has more variation,
from 0.65 to 0.67. In general, we can say that there is a difference of 17 points
between the LM and MATS-Sexism, a difference of 10 points between the LM and the
combination, and of 5-8 points between the combination and MATS-Sexism.</p>
      <p>Looking at the results, we see that approaches based on distributional
information, such as language models or TF-IDF, are very effective when working
with short texts, and features that take into account syntactic and discursive
information may not be so helpful in these classification tasks. Indeed, they worsen
the accuracy.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>In this paper we have presented the results of the MultiAzterTest team at the
first task of the Exist-IberLEF 2021 shared task. The aim of these experiments
was to see if linguistically motivated features could identify sexism. Looking at
our results, we see that distributional approaches are very efficient and linguistic
features are not so important when classifying short texts.</p>
      <p>However, in the future, the outputs of the linguistically motivated approach
can be used to interpret the characteristics of sexism. It would also be
interesting to test these approaches on longer texts.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We acknowledge the following projects: DeepText (KK-2020/00088),
DeepReading RTI2018-096846-B-C21 (MCIU/AEI/FEDER, UE), BigKnowledge for Text
Mining, BBVA and IXA taldea, A motako ikertalde finkatua (IT1343-19).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2021</year>
          . CEUR Workshop Proceedings (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>von Ahn</surname>
          </string-name>
          , L.:
          <article-title>Offensive/profane word list</article-title>
          . https://www.cs.cmu.edu/~biglou/resources/, accessed:
          <fpage>2021</fpage>
          -05-14
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bassignana</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Hurtlex: A multilingual lexicon of words to hurt</article-title>
          .
          <source>In: 5th Italian Conference on Computational Linguistics</source>
          , CLiC-it
          <year>2018</year>
          . vol.
          <volume>2253</volume>
          , pp.
          <volume>1</volume>
          –
          <issue>6</issue>
          .
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bengoetxea</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Dios</surname>
          </string-name>
          , I.:
          <article-title>MultiAzterTest: a Multilingual Analyzer on Multiple Levels of Language for Readability Assessment. Manuscript from author (</article-title>
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bengoetxea</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Dios</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aguirregoitia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>AzterTest: Open Source Linguistic and Stylistic Analysis Tool. Procesamiento del Lenguaje Natural</source>
          <volume>64</volume>
          ,
          <issue>61</issue>
          –
          <fpage>68</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Random forests</article-title>
          .
          <source>Machine Learning 45(1)</source>
          ,
          <volume>5</volume>
          –
          <fpage>32</fpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Caselli</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitrovic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kartoziya</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Granitzer</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>I feel offended, don't be abusive! implicit/explicit messages in offensive and abusive language</article-title>
          .
          <source>In: Proceedings of The 12th Language Resources and Evaluation Conference</source>
          . pp.
          <volume>6193</volume>
          –
          <issue>6202</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Cañete, J.,
          <string-name>
            <surname>Chaperon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fuentes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ho</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
          </string-name>
          , J.:
          <article-title>Spanish PreTrained BERT Model and Evaluation Data</article-title>
          .
          <source>In: PML4DC at ICLR</source>
          <year>2020</year>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Choudhary</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arora</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Linguistic feature based learning model for fake news detection and classification</article-title>
          .
          <source>Expert Systems with Applications</source>
          p.
          <volume>114171</volume>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <volume>4171</volume>
          –
          <issue>4186</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Fersini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nozza</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boifava</surname>
          </string-name>
          , G.:
          <article-title>Profiling Italian misogynist: An empirical study</article-title>
          .
          <source>In: Proceedings of the Workshop on Resources</source>
          and
          <article-title>Techniques for User and Author Profiling in Abusive Language</article-title>
          . pp.
          <volume>9</volume>
          –
          <issue>13</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Flesch</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>A new readability yardstick</article-title>
          .
          <source>Journal of applied psychology 32(3)</source>
          ,
          <volume>221</volume>
          (
          <year>1948</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Fortuna</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wanner</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Toxic, hateful, offensive or abusive? what are we really classifying? an empirical analysis of hate speech datasets</article-title>
          .
          <source>In: Proceedings of the 12th Language Resources and Evaluation Conference</source>
          . pp.
          <volume>6786</volume>
          –
          <issue>6794</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Gonzalez-Dios</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aranzabe</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          , Díaz de Ilarraza,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Salaberri</surname>
          </string-name>
          , H.:
          <article-title>Simple or complex? Assessing the readability of Basque texts</article-title>
          .
          <source>In: Proceedings of COLING</source>
          <year>2014</year>
          ,
          <source>the 25th International Conference on Computational Linguistics: Technical Papers</source>
          . pp.
          <volume>334</volume>
          –
          <fpage>344</fpage>
          . DCU and
          <string-name>
            <surname>ACL</surname>
          </string-name>
          , Dublin, Ireland (Aug
          <year>2014</year>
          ), https://www.aclweb.org/anthology/C14-1033
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfahringer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reutemann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          :
          <article-title>The WEKA data mining software: an update</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <volume>10</volume>
          –
          <fpage>18</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Hutto</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gilbert</surname>
          </string-name>
          , E.:
          <article-title>Vader: A parsimonious rule-based model for sentiment analysis of social media text</article-title>
          .
          <source>In: Proceedings of the International AAAI Conference on Web and Social Media</source>
          . vol.
          <volume>8</volume>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Landwehr</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Logistic model trees</article-title>
          .
          <source>Machine Learning</source>
          <volume>59</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>161</fpage>
          –
          <lpage>205</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Kralj Novak</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smailovic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sluban</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mozetic</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Sentiment of emojis</article-title>
          .
          <source>PloS one</source>
          <volume>10</volume>
          (
          <issue>12</issue>
          ),
          <elocation-id>e0144296</elocation-id>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Pamungkas</surname>
            ,
            <given-names>E.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Misogyny Detection in Twitter: a Multilingual and Cross-domain study</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>57</volume>
          (
          <issue>6</issue>
          ),
          <elocation-id>102360</elocation-id>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Platt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Fast training of support vector machines using sequential minimal optimization</article-title>
          . In:
          <string-name>
            <surname>Schoelkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Kernel Methods - Support Vector Learning</source>
          . MIT Press (
          <year>1998</year>
          ), http://research.microsoft.com/~jplatt/smo.html
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Poletto</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanguinetti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Resources and benchmark corpora for hate speech detection: a systematic review</article-title>
          .
          <source>Language Resources and Evaluation</source>
          pp.
          <fpage>1</fpage>
          –
          <lpage>47</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bolton</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Stanza: A python natural language processing toolkit for many human languages</article-title>
          .
          <source>In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          . pp.
          <fpage>101</fpage>
          –
          <lpage>108</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Rodríguez-Sanchez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carrillo-de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data</article-title>
          .
          <source>IEEE Access</source>
          <volume>8</volume>
          ,
          <fpage>219563</fpage>
          –
          <lpage>219576</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Rodríguez-Sanchez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Albornoz</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Comet</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donoso</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Overview of EXIST 2021: sexism identification in social networks</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>67</volume>
          (
          <issue>0</issue>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Sharifirad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacovi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Learning and understanding different categories of sexism using convolutional neural network's filters</article-title>
          .
          <source>In: Proceedings of the 2019 Workshop on Widening NLP</source>
          . pp.
          <fpage>21</fpage>
          –
          <lpage>23</lpage>
          . Association for Computational Linguistics, Florence, Italy (Aug
          <year>2019</year>
          ), https://www.aclweb.org/anthology/W19-3609
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Waseem</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter</article-title>
          .
          <source>In: Proceedings of the first workshop on NLP and computational social science</source>
          . pp.
          <fpage>138</fpage>
          –
          <lpage>142</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruppenhofer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greenberg</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Inducing a lexicon of abusive words – a feature-based approach</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers). pp.
          <fpage>1046</fpage>
          –
          <lpage>1056</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debut</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaumond</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delangue</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cistac</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rault</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Funtowicz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Huggingface's Transformers: State-of-the-art natural language processing</article-title>
          .
          <source>arXiv preprint arXiv:1910.03771</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiros</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Urtasun</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torralba</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fidler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Aligning books and movies: Towards story-like visual explanations by watching movies and reading books</article-title>
          .
          <source>In: Proceedings of the IEEE international conference on computer vision</source>
          . pp.
          <fpage>19</fpage>
          –
          <lpage>27</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>