<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Italian Symposium on Advanced Database Systems, June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Gender Discriminatory Language Identification with a Hybrid Algorithm Based on Syntactic Rules and Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerio Bellandi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Siccardi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, Università Degli Studi di Milano</institution>
          ,
          <addr-line>Via Celoria 18 Milano (MI)</addr-line>
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>9</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>In recent years, gender discrimination in textual documents has emerged as an open problem and is under active analysis. The difficulty of identifying sentences in which this discrimination is present is linked to the context and the formalisms adopted. This work describes an exploratory activity in the context of regulations and official documents of Italian public administrations. A hybrid algorithm based on syntactic rules and machine learning is therefore proposed, capable of identifying a specific subset of possible gender discrimination.</p>
      </abstract>
      <kwd-group>
        <kwd>Gender Discrimination</kwd>
        <kwd>Syntactic Rules</kwd>
        <kwd>Entities Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Discriminatory attitudes against minorities or related to gender have been reported in many areas;
often they are conveyed by language, either as open “hate speech” or in more subtle forms,
for instance by associating the discriminated group with specific social roles or professions. Several
social networks, such as Facebook and Twitter, define and ban hate speech (see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). Nevertheless,
[9], analyzing more than 2 million tweets over a 7-month period, found that women (in the first
place), immigrants, gay and lesbian persons, Muslims, Jews and disabled persons were targeted
by more than 100 thousand hateful tweets. Another study ([7]) reports that around 10% of
social media users report being victimized by online hate speech. On the other hand, several
institutions have approved guidelines to promote the use of inclusive language in their
official documents, that is, a language that does not carry any explicit or implicit difference
between genders; for instance, the European Parliament (see [8]) and the University
of Milan (see [10]). This work aims at helping the detection of non-inclusive language usage to
facilitate its correction.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Natural Language Processing techniques to mitigate gender bias have been reviewed by [15].
They start from the observation that NLP systems containing bias in training data, resources,
pretrained models (e.g. word embeddings), and algorithms can produce gender-biased predictions
and sometimes even amplify biases present in the training sets. An example, drawn from
Machine Translation, is that translating “He is a nurse. She is a doctor.” to Hungarian and back
to English results in “She is a nurse. He is a doctor.” The main contributions of the paper
concern NLP itself more than user-composed documents. In the same spirit, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] quantify the
degree to which gender bias differs with the corpora used for training. They look especially at
the impact of starting with a pre-trained model and fine-tuning with additional data. A different
strand of study uses sentiment analysis to try to detect the gender of writers in blogs or social media,
for instance to establish a detailed health policy specialized by patient segments ([12]), or aims
at detecting the gender and age of writers in order to improve sentiment analysis ([16]). [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] tested
the use of tools to simplify the task of information selection from collections of documents used
by qualitative researchers to analyze discrimination. They compare several methods and
find that relevant words can be efficiently found, but results heavily depend on the quality of
pre-processing. A tool to check German texts for gender-discriminatory formulations has been
described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It is based on rules that detect nouns used to denote male persons and then
filter out correct cases, for instance when the noun refers to a specific male person (as a proper
noun is found nearby). In incorrect cases the user is prompted with suitable messages and
hints. The way this tool works is basically the same as our rule-based model. More recently,
the Microsoft Word text editor started offering a tool to check for non-inclusive language that,
when activated, prompts some hints during document input. However useful, it is presently
not configurable and does not cover a large number of cases. Our method relies on the Named
Entity Recognition capabilities of NLP software to extend the set of entities to check; a complete
review of the topic is, however, out of the scope of the present work, and we will limit ourselves
to describing some specific points. [11] examined 200 English and 200 Spanish tweets containing
hate speech against women and immigrants using a model based on spaCy ([14]). Words were
divided into 27 categories according to the type of negative content; an accuracy of 84% for
English and 62% for Spanish in identifying hate speech is reported. BERT has been used in [13]
to analyze 16,000 tweets, of which 1,972 were tagged as racist and 3,383 as sexist. A goal of the
study was to reduce false positives, detecting hate speech without undermining freedom of
expression. The maximum specificity achieved was 83.03%. Another study, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], compared four
algorithms, including spaCy, BERT and two monolingual algorithms (Flair and camemBERT), to
find entities in a set of 500 French legal cases, where a pool of experts annotated the cases with
60,000 entity quotations. The monolingual tools reached the best precision and recall.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>The proposed method consists of two interleaved pipelines: one is the “production” tool for
the detection of discriminatory language in official documents; the other is used to expand
the tables used by the system, so that more and more cases can be detected. Referring to fig.
1, the first pipeline, labelled 1, is fed with documents. A detection software, using rules and
tables of words used as “seed entities”, produces a tagged version of the documents that highlights
potentially discriminatory points. Each word included in the tables and not fitting one of the
required rules is tagged as an entity (in the NER sense of NLP terminology), with a suitable
error tag. A user checks the tags, fixes the document and resubmits it, until a satisfactory result
is obtained. Examples of tables are: entities to check, like (man: woman), (men: women); lists of
male proper names, and so on. Examples of rules are:
1. Base rules, evaluating to True or False: R1 = (word is in table of entities); R2 = (one of the
neighbors of word = table of entities[word])
2. Compound rules, evaluating to a Tag or None: if R1 and R2 then assign Tagxxx to word
The second pipeline is in turn divided into two branches. The branch labelled 2 builds a
model and the branch labelled 3 uses it to find new entities. In branch 2, the documents are
processed by a modified version of the detection software, which shares the same rules and tables.
However, instead of creating tagged versions of the documents, it creates a set of annotations.
These are single sentences, taken from the documents, containing an occurrence of an entity, tagged
with the proper error, in the format required by the training program of the chosen NLP tool.
Moreover, whenever a “wrong” occurrence is found, all the “correct” versions are added as
separate annotations with the proper tags. The annotations are then used to train a NER model.
We used spaCy, but in principle any NLP system with trainable NER capabilities can be employed.
In branch 3, the model is fed with the documents and produces a new set of tagged ones. The tagged
words will not coincide with those found in pipeline 1; we often found more errors compared
to the rule-based version. However, new entities, not in the seed set, are generally
found. After a user’s review, the approved entities are added to the tables used in pipeline 1.
The role of the user is important, because the system often finds entities that are not of interest
or are even completely wrong. This closes the loop, extending the capabilities of the “production”
rule system. Pipeline 2 can be run with the extended tables until a satisfactory set of terms is
obtained.</p>
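      <p>As an illustration, the base and compound rules above can be sketched in a few lines. This is a minimal sketch, not the authors' actual code: the seed table, the tag label and the neighborhood window size are all hypothetical, and the sketch reads the compound rule as "tag a male-only form when no paired female form appears nearby".</p>

```python
# Minimal sketch of the rule-based detector (illustrative only).
# ENTITY_TABLE is a hypothetical seed table: male form -> paired female form.
ENTITY_TABLE = {"uomo": "donna", "uomini": "donne"}

def tag_tokens(tokens, window=3):
    """Apply R1 (word is in the entity table), then check the neighborhood:
    if the paired female form is absent nearby, emit an error tag."""
    tags = []
    for i, word in enumerate(tokens):
        lower = word.lower()
        if lower in ENTITY_TABLE:  # base rule R1
            neighbors = tokens[max(0, i - window): i + window + 1]
            paired = any(t.lower() == ENTITY_TABLE[lower] for t in neighbors)
            if not paired:  # compound rule: R1 holds, no paired form found
                tags.append((i, word, "MALE_ONLY_FORM"))
    return tags

print(tag_tokens("i diritti dell uomo sono tutelati".split()))
# -> [(3, 'uomo', 'MALE_ONLY_FORM')]
```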
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>A selection of official documents of the University of Milan, chosen among general, departmental,
security/privacy and staff regulations, has been used to search for two basic types of “problems”:
i) sentences containing only the male form of a noun that has a different female form, e.g. “uomo
– donna” (man – woman); ii) sentences containing nouns that have the same male and female form,
without any other grammatical element to stress reference to both genders, e.g. “il docente”
instead of “il/la docente” (the teacher; no analogous English article form). Whenever one of the
above was found, the following annotations were created:
• the original sentence,
• the sentence with both forms of the noun, for case i) above,
• the sentence with the unique form of the noun and both articles, for case ii) above,
• the sentence with the male form of the noun and the male article, followed by a randomly
chosen male proper name,</p>
      <p>• the sentence with only the female form of the noun or the female article.</p>
      <p>We used these annotations to train the model to recognize the possible cases, assigning a label
to each one.</p>
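      <p>A hedged sketch of how such annotations might be assembled, in the character-offset format commonly used for spaCy NER training data; the helper name and labels are hypothetical, and only the case i) variants are shown:</p>

```python
# Illustrative sketch: from a sentence with a male-only noun, build the "wrong"
# annotation plus one "correct" paired variant, with character-offset entities.
def make_annotations(sentence, noun_m, noun_f, label="GENDER_M_ONLY"):
    annotations = []
    # the original sentence, with the male-only noun tagged as an error
    start = sentence.index(noun_m)
    annotations.append(
        (sentence, {"entities": [(start, start + len(noun_m), label)]}))
    # the "correct" variant with both forms, e.g. "uomo - donna"
    fixed = sentence.replace(noun_m, f"{noun_m} - {noun_f}", 1)
    pos = fixed.index(noun_m)
    end = pos + len(noun_m) + 3 + len(noun_f)  # spans "noun_m - noun_f"
    annotations.append((fixed, {"entities": [(pos, end, "GENDER_OK")]}))
    return annotations

for text, ents in make_annotations("il consiglio convoca l'uomo", "uomo", "donna"):
    print(text, ents)
```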
      <sec id="sec-4-1">
        <title>4.1. Training with a rich seed entity set</title>
        <p>We used a set containing 23 entities typically found in University documents, like teacher,
student, researcher and so on, and trained two different models, the first using 4683 annotations
and the second using 8272. After training, both models were used to analyse the same
set of documents. In order to evaluate the accuracy, as a first approximation we considered
the rule-based detector 100% correct for the seed entities and implemented a semi-automatic
procedure to check detection errors (false positives). We found that the rule-based model found
a total of 1846 errors; the first model found 2337, including 634 false positives, so that accuracy is
72.9%; it missed 143 errors found by the rule-based model, that is, 7.7%. The second model
found 2316 errors, including 503 false positives, with accuracy 78.3%, missing 33 errors (1.8%)
found by the rule-based model. The first model was able to detect 23 new correct entities (e.g.
“referent”, “warranter/sponsor”), the second one just 16. In other terms, a larger training set
reduces errors, but also the number of new entities found. It must be noted that the rule-based
model was actually not 100% correct, a circumstance that may have had a negative impact on
spaCy’s training. A manual check performed on ≈ 15% of the documents showed ≈ 4.47% false
positives. For example, the term “componente” (component) may indicate a person in a sentence
like “un componente dello staff...” (a member of the staff) or an abstract entity in the sentence
“la componente studentesca della popolazione” (the student component of the population).</p>
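        <p>The accuracy figures above can be reproduced as the share of detections that are not false positives:</p>

```python
# Accuracy as reported above: correct detections / total detections.
def accuracy(detections, false_positives):
    return round(100 * (detections - false_positives) / detections, 1)

print(accuracy(2337, 634))          # first model  -> 72.9
print(accuracy(2316, 503))          # second model -> 78.3
print(round(100 * 143 / 1846, 1))   # errors missed by the first model -> 7.7
```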
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Training with an incremental seed entity set</title>
        <p>Experiments were performed to check the ability to build a rich set of entities starting from just
a few. The workflow is shown in fig. 2: we started with a table consisting of a few seed
entities and used the rule-based model to create annotations from a document. These were used
to train a model, which was then run on the same document. If new entities were found, they
were manually checked and, if correct, added to the entity table; then new annotations were
created and the model was trained again. If no new entities were found, a new document was
used without retraining the model. In the picture, continuous arrows denote the control flow,
dashed ones the data used at each step.</p>
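        <p>Schematically, the incremental loop just described can be sketched as follows. This is a hedged outline rather than the authors' implementation: create_annotations, train_model, find_entities and user_review are placeholders for the real components, and the sketch retrains once per document even when nothing new is found.</p>

```python
# Schematic of the incremental workflow (fig. 2): grow the entity table by
# alternating rule-based annotation, model training and manual review.
def grow_entity_table(documents, seed_entities, create_annotations,
                      train_model, find_entities, user_review):
    entities = set(seed_entities)
    for doc in documents:
        while True:
            annotations = create_annotations(doc, entities)
            model = train_model(annotations)
            candidates = find_entities(model, doc) - entities
            approved = user_review(candidates)  # manual correctness check
            if not approved:
                break  # no new entities: move on to the next document
            entities |= approved  # extend the table and retrain on this doc
    return entities
```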
        <p>Results are summarized in table 1.</p>
        <p>Two experiments were run, the first starting with 7 seed entities (the first 3 columns of
table 1), the second with 3 (the last 3 columns). Each row shows the number of new entities
found, if any, the number of corresponding annotations and the time in minutes needed to
retrain the model. Experiments were run on a “small” system, that is, a commercial PC with an
Intel i7 processor (6 cores, 2.7 GHz) and 16 GB RAM. We can summarize that:
• the first experiment stopped finding new entities after the 22nd document;
• 10 entities found by the first model and 6 found by the second were elements of the
original 23-entity seed set chosen by the user for the experiments in section 4.1;</p>
        <p>• some “popular” entities sharply increase the number of annotations;
• the training time increases roughly linearly with the number of annotations, even if it
does not depend only on them (see fig. 3).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and future work</title>
      <p>We showed that combining a rule-based model with a trainable one is a promising way to
obtain an accurate and extensible tool to detect non-inclusive language. We applied the method
to official documents of an institution as a first test area, but we think that it can be applied
to wider types of documents and of discriminatory language. In the future, we plan to consider,
for instance, textbooks and blog or newspaper articles. These require managing some more
subtle types of discriminatory language, related for instance to negative sentiments and stereotypes.
Therefore, we plan several technical enhancements: i) the rule engine and
rule set will be expanded to handle more complex cases and to avoid the small percentage of errors
we found in the present work; ii) we will compare the performance of several Natural Language
Processing tools, instead of using just one; and iii) we will include methods from the Sentiment
Analysis area.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The authors wish to thank their students Francesca Aru, Giovanni Lafiandra and Giulia Pagani
for their helpful work, especially during the experimental phase. This work was partly supported
by Università degli Studi di Milano under the program “Piano di sostegno alla ricerca”.</p>
      <p>[6] Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K., BERT: Pre-training of Deep
Bidirectional Transformers for Language Understanding, Proceedings of the 2019 Conference
of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Volume 1 (Long and Short Papers),
https://aclanthology.org/N19-1423, doi:10.18653/v1/N19-1423, 2019
[7] Döring, N. and Mohseni, M. R., Gendered hate speech in YouTube and YouNow comments:
Results of two content analyses, SCM Studies in Communication and Media, vol. 9, n. 1, 2020
[8] Gender neutral language in the European Parliament, https://www.europarl.europa.eu/
cmsdata/151780/GNL_Guidelines_EN.pdf. Last accessed 28 Feb 2022
[9] Lingiardi, V., Carone, N., Semeraro, G., Musto, C., D’Amico, M. and Brena, S.,
Mapping Twitter hate speech towards social and sexual minorities: a lexicon-based
approach to semantic content analysis, Behaviour &amp; Information Technology, vol. 39, n. 7, 2020
[10] Linee guida per l’adozione della parità di genere nei testi amministrativi e
nella comunicazione istituzionale dell’Università degli Studi di Milano (in
Italian), https://www.unimi.it/sites/default/files/regolamenti/Lineeguidalinguaggiodigenere_
2020_UniversitádegliStudidiMilano.pdf. Last accessed 28 Feb 2022
[11] Lai, M., Stranisci, M. A., Bosco, C., Damiano, R. and Patti, V., HaMor at the
Profiling Hate Speech Spreaders on Twitter, Working Notes of CLEF 2021 - Conference and
Labs of the Evaluation Forum, CEUR Workshop Proceedings, vol. 2936, pp. 2047–2055, 2021
[12] Park, S. and Woo, J., Gender Classification Using Sentiment Analysis and Deep Learning in
a Health Web Forum, Appl. Sci., vol. 9, 1249, 2019, https://doi.org/10.3390/app9061249
[13] Rajput, G., Punn, N. S., Sonbhadra, S. K. and Agarwal, S.,</p>
      <p>Hate speech detection using static BERT embeddings, arXiv:2106.15537, 2021
[14] spaCy Homepage, https://spacy.io/. Last accessed 21 Feb 2022
[15] Sun, T., Gaut, A., Tang, S., Huang, Y., ElSherief, M., Zhao, J.,
Mirza, D., Belding, E., Chang, K.-W. and Wang, W. Y., Mitigating Gender Bias in
Natural Language Processing: Literature Review, arXiv:1906.08976, 2019
[16] Volkova, S., Wilson, T. and Yarowsky, D., Exploring Demographic
Language Variations to Improve Multilingual Sentiment Analysis in Social Media,
EMNLP, 2013, http://aclweb.org/anthology/D/D13/D13-1187.pdf</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Alatrista-Salas, H., Hidalgo-Leon, P. and Nunez-del-Prado, M., Documents Retrieval for Qualitative Research: Gender Discrimination Analysis, 2018 IEEE Latin American Conference on Computational Intelligence (LA-CCI), 2018, pp. 1-6, doi:10.1109/LA-CCI.2018.8625211</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Babaeianjelodar, M., Lorenz, S., Gordon, J., Matthews, J. and Freitag, E., Quantifying Gender Bias in Different Corpora, Companion Proceedings of the Web Conference 2020, Association for Computing Machinery, New York, NY, USA, 2020, doi:https://doi.org/10.1145/3366424.3383559</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Benesty, M., NER algo benchmark: spaCy, Flair, m-BERT and camemBERT on anonymizing French commercial legal cases, https://towardsdatascience.com/benchmark-ner-algorithm-d4ab01b2d4c3. Last accessed 22 Feb 2022</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Bortone, R. and Cerquozzi, F., L’hate speech al tempo di Internet, Aggiornamenti sociali, vol. 818, 2017</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Carl, M., Garnier, S., Haller, J., Altmayer, A. and Miemietz, B., Controlling gender equality with shallow NLP techniques, Proceedings of the 20th International Conference on Computational Linguistics (COLING '04), Association for Computational Linguistics, USA, pp. 820-es, 2004, doi:https://doi.org/10.3115/1220355.1220473</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>