Towards a Gendered Innovation in AI

Silvana Badaloni (a) and Francesca A. Lisi (b)

(a) University of Padua, Via 8 Febbraio 2, Padua, 35122, Italy
(b) University of Bari “Aldo Moro”, Via E. Orabona 4, Bari, 70125, Italy

Abstract
In this paper we address the problem of including the gender dimension in the content of Computer Science, notably in Artificial Intelligence (AI). We first analyze the fairness of Machine Learning (ML) algorithms from a gender point of view. Because of their nature as bottom-up, data-driven algorithms, they can capture, subsume and reinforce the most common biases diffused in society about gender and ethnicity, as many ML applications show. Then, to understand how to develop a new gendered (Computer) Science and promote a gendered innovation in AI, we present a formal reflection on the scientific method used to produce innovation and a critical analysis of the logical rules underlying it.

Keywords
Gender issues, Bias, Fairness, Trustworthy AI.

AIxIA 2020 Discussion Papers Workshop
EMAIL: Silvana.Badaloni@unipd.it (A. 1); Francesca.Lisi@uniba.it (A. 2)
ORCID: 0000-0002-5287-0468 (A. 1); 0000-0001-5414-5844 (A. 2)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

The gender, diversity and inclusion dimension of science and technology has become a highly visible and debated theme worldwide, impacting society at every level. In some fields of knowledge, however, these issues still have little impact. While the term ‘AI for good’ is increasingly used in the scientific and technological context, there is much less discussion about ‘AI for social good’, which aims at identifying the relationship between AI and our societal goals, in particular the goal of gender equality [1].

In the field of AI, many case studies show that Machine Learning (ML) algorithms exhibit an “unfairness” from the gender point of view. The hypothesis is that these algorithms are not gender neutral because of their bottom-up, data-driven nature. They can capture and subsume the most common biases diffused in society and even reinforce them, where for gender bias we adopt the definition given by EIGE [2], i.e., prejudiced actions or thoughts based on the gender-based perception that women are not equal to men in rights and dignity. With the aim of developing a trustworthy AI able to learn fair AI models even in spite of biased data, as we will illustrate later, we intend to address the problem of framing the landscape of gender equality and AI, trying to understand how AI can overcome gender bias and showing how an interdisciplinary analysis can help in re-calibrating the biased instruments. This problem is even more important now that AI is often confused with the tools, algorithms and technologies developed within its framework [3].

A recent UNESCO report on this subject [4] recognizes the absolute centrality of this topic and provides recommendations on how to address gender equality considerations in AI principles. UNESCO’s Dialogue on Gender Equality and AI identifies issues, challenges, and good practices to help:
• Overcome the built-in gender biases found in AI devices, data sets and algorithms;
• Improve the global representation of women in technical roles and in boardrooms in the technology sector;
• Create robust and gender-inclusive AI principles, guidelines and codes of ethics within the industry.

The paper is structured as follows. Section 2 addresses the problem of gender bias in ML applications and discusses some notable cases. Section 3 proposes an approach for developing a Gendered Innovation in AI, and Section 4 concludes the paper.

2. Gender bias in Machine Learning applications

In this section we question the fairness of ML algorithms from a gender point of view. The question is the following: are ML tools, algorithms and technologies gender neutral? We observe that, when following the data-driven paradigm underlying ML, it is necessary to check whether the data used to train the algorithms include the biases about gender and other possible grounds of discrimination - for example ethnicity - that are diffused in society. They usually do: for any ML system, the output is determined by the training data, in some cases consisting of literally millions of examples. So these kinds of algorithms - in particular, Neural Networks and Deep Learning - being conceived as learning systems, can absorb the gender biases diffused in society, as shown in many examples reported in [5]. The problem arises mainly because little attention is paid to how data are collected, processed and organized. Indeed, the biases are substantially data-driven biases [6]. We cite Yoshua Bengio of the University of Montreal, who said: ‘AI can amplify discrimination and biases, such as gender or racial discrimination, because those are present in the data the technology is trained on, reflecting people’s behavior.’ In other words: should we let data speak for themselves?

Recently, many studies have shown how these ML techniques have led to applications affected by biases in different fields, from machine translation [7] to the assessment of geodiversity issues in open data sets [8], from predictors of crime recidivism [9] to predictors in medicine [10]. It should be mentioned that fairness could be pursued by following two different approaches:
1. Data debiasing (such as in, e.g., [11])
2. Model debiasing (such as in, e.g., [12])
A state-of-the-art survey of works on bias and fairness goes beyond the scope of this paper. The interested reader might refer to, e.g., [13] for an overview of problems and solutions in ML concerning these crucial aspects. For the sake of brevity, and just for illustrative purposes, in the remainder of this section we will focus on three applications which we deem particularly representative: face recognition, word embedding, recruiting tools.

2.1. Face recognition

Face recognition systems are increasingly used. However, they often turn out to be inadequate at properly recognizing people of different genders and races. Interesting results about facial recognition technology obtained by Joy Buolamwini, a researcher at the M.I.T. Media Lab, have shown how some of the biases present in the real world can seep into facial recognition computer systems [14, 15]. The author directly experienced that the face of a black person may not be recognized unless she wears a white mask. The performance of three leading face recognition systems - by Microsoft, IBM and Megvii of China - was studied by measuring how well they could guess the gender of individuals with different skin tones. The average predictive accuracy percentages obtained were the following:
- Lighter-skinned male: 99%
- Lighter-skinned female: 93%
- Darker-skinned male: 88%
- Darker-skinned female: 65%
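To make this kind of disaggregated evaluation concrete, the following minimal sketch in Python computes gender classification accuracy separately for each combination of gender and skin-tone group; the records are entirely hypothetical and do not reproduce the data of [15].

from collections import defaultdict

# Hypothetical records: (true_gender, skin_tone, predicted_gender).
# The values are invented for illustration only.
records = [
    ("male",   "lighter", "male"),
    ("female", "lighter", "female"),
    ("male",   "darker",  "male"),
    ("female", "darker",  "male"),     # one misclassification
    ("female", "darker",  "female"),
]

hits = defaultdict(int)
totals = defaultdict(int)
for true_gender, skin_tone, predicted in records:
    group = (true_gender, skin_tone)
    totals[group] += 1
    hits[group] += int(predicted == true_gender)

# Reporting accuracy per demographic subgroup, rather than only overall,
# is what reveals the disparities discussed above.
for group in sorted(totals):
    print(group, hits[group] / totals[group])

An aggregated accuracy figure alone would hide exactly the gap between the best- and worst-served subgroups that the percentages above expose.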
The conclusion drawn by the author was: “A.I. software (we should say M.L. software) is only as smart as the data used to train it. If there are many more white men than black women in the system, it will be worse at identifying the black women.” The huge amount of data used to train the system has to be balanced with respect to the gender and racial composition of the population.

In a recent work [16] the problem of face recognition has been addressed by making the system learn demographic information prior to learning the attribute detection task. The system, called InclusiveFaceNet, detects face attributes by transferring race and gender representations learned from a held-out dataset of public race and gender identities. With this integration, the approach achieves satisfactory results.

In 2020 there have been policy changes concerning facial recognition:
- IBM quit the facial recognition business and will no longer sell “general purpose” facial recognition technology: reforms and policy proposals have to address racial disparities, and the company opposes the use of the technology for mass surveillance, racial profiling and violations of human rights.
- Amazon halted police use of its facial recognition technology.

2.2. Word embedding

Word embedding tools show that the gender bias diffused in society can be absorbed by the system. In word embedding models, each word is represented as a high-dimensional vector; this allows semantic relations between words to be detected, since words with similar meanings occupy similar parts of the vector space. It has been shown that these tools capture common stereotypes about women and men [17]. In fact, when the model is asked “father : doctor :: mother : x”, the answer is x = nurse, and the query “man : computer programmer :: woman : x” gives x = homemaker. Word embedding tools can be as terribly sexist as society is, and this constitutes another important example in which a blind application of ML algorithms can lead to a strong reinforcement of existing social and gender biases. As already mentioned, this is substantially due to the mechanisms in which these AI methods are rooted - mainly bottom-up and data-driven methods. It is important to be aware that the algorithms and tools used to solve different problems are not neutral and have to be analyzed thoroughly in this respect before application.

Aware of this lack of neutrality, researchers have exploited the biasing capability of word embeddings as a quantitative lens to study the evolution of stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States [18]. This work provides an approach for the temporal analysis of word embeddings and reveals an interesting new intersection between ML and social science.

As proposed in [17], it is possible to de-bias the embedding, since a vector space is a mathematical object and can be handled with mathematical tools. To this aim, it suffices to clean the embedding by searching for pairs such as “he : she” that belong to a list of gender-biased pairs whose bias needs to be removed. The result reported in that work is a vector space in which the gender bias is significantly reduced. “One perspective on bias in word embeddings is that it merely reflects bias in society, and therefore one should attempt to debias society rather than word embeddings,” say Bolukbasi and co. “However, by reducing the bias in today’s computer systems (or at least not amplifying the bias), which is increasingly reliant on word embeddings, in a small way debiased word embeddings can hopefully contribute to reducing gender bias in society.” That seems a worthy goal. As the Boston team concludes: “At the very least, machine learning should not be used to inadvertently amplify these biases.” The problem is that it is not always possible to clean the embedding by means of a list of possibly biased gender pairs to be compared with the complete list of W/M pairs.
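As a minimal, self-contained sketch of the two operations discussed in this section - analogy queries and the removal of a gender direction along the lines of [17] - the following Python code operates on a tiny toy embedding. The vocabulary and the randomly generated vectors are invented for illustration, so the printed analogy is not meaningful here; with real pretrained embeddings, queries like those quoted above return the stereotyped answers.

import numpy as np

# Toy embedding: a few 4-dimensional vectors, invented for illustration.
# Real experiments such as [17] use large pretrained embeddings (e.g. word2vec).
rng = np.random.default_rng(0)
vocab = ["man", "woman", "he", "she", "doctor", "nurse",
         "programmer", "homemaker", "father", "mother"]
E = {w: rng.normal(size=4) for w in vocab}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c):
    """Answer 'a : b :: c : x' by maximizing cosine(E[b] - E[a] + E[c], E[x])."""
    target = E[b] - E[a] + E[c]
    candidates = [w for w in E if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, E[w]))

print(analogy("man", "programmer", "woman"))

# Debiasing idea from [17], sketched: estimate a gender direction from a
# definitional pair and remove its component from gender-neutral words.
g = E["he"] - E["she"]
g = g / np.linalg.norm(g)
for w in ["doctor", "nurse", "programmer", "homemaker"]:
    E[w] = E[w] - (E[w] @ g) * g   # project out the gender direction

In [17] the gender direction is estimated more robustly from several definitional pairs via principal component analysis, and an additional equalization step is applied to pairs such as he/she; the sketch above only conveys the core projection idea.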
2.3. Recruitment

Recently, ML specialists at Amazon uncovered a big problem: “their new recruiting engine did not like women” [19]. The company’s experimental hiring tool used ML algorithms to give job candidates scores ranging from one to five stars - much like shoppers rate products on Amazon. The system was trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period. Most came from men, a reflection of male dominance across the tech industry. As a result, the automatic recruitment tool relentlessly preferred male candidates. The system was eventually scrapped and, as reported in [19], “Amazon’s recruiters looked at the recommendations generated by the tool when searching for new hires, but never relied solely on those rankings”, they said.

3. An approach to Gendered Innovations in Science

Gendered Innovations harness the creative power of sex and gender analysis for innovation and discovery. The most prominent researcher in this field, Londa Schiebinger, reports many case studies in different disciplines, ranging from pregnant crash test dummies to machine translation, from heart disease in women to osteoporosis in men, from assistive technology for the elderly to urban transport planning [20]. Overall, the collection provides a broad roadmap for sex and gender analysis aimed at promoting reproducible, innovative and responsible research [21]. Considering gender may: (i) add a valuable dimension to research, and (ii) take research in new directions. For instance, research on heart disease offers one of the most developed examples of gendered innovations: it considers the fact that ischemic heart disease is the leading cause of death for women in the US and European populations.

3.1. Gender dimension in Science

Let us see how a new gendered Science can be developed, together with new interpretations of facts, in contrast with a universal male point of view presented as neutral. In general, it is important to understand how we can re-design scientific theories, how we can propose new hypotheses that take into account the gender dimension, how we can formulate new scientific questions with the awareness that another science is possible, and how we can develop a critical view of the method in re-shaping science. According to [22], “there is a need to go beyond stereotypical feminization of products – so called “pinking” – as female preferences can be drivers for substantial innovation”: the ‘pinking’ approach alone is not sufficient to produce a new gendered innovation.

Another point to take into consideration is the difference between women and men in their approach to the use of technology. While women tend to be more interested in the ease of use of technological devices and in their social benefits, many men focus on the performance of the technology, and technological devices can often become for them quite a ‘status symbol’.
Also, social needs and life models are different for women and men: this can largely influence technology and its products. Since women, with their own mentality, preferences and everyday needs, represent more than 50% of the human race, it is important, as reported in [22], that “if research institutions and industry want to create valuable and sustainable research results and technologies for people (the market), it is recommended to include women at all stages of the research and innovation process”.

In [23, 24] we have studied this problem in the field of human-machine interaction, showing that the gender dimension significantly influences the design of robots for assisting and interacting with people. In scenarios where robots can assume complex behaviors, it is very important to consider the gender factor in order to obtain better results in terms of the robot's robustness and efficiency in carrying out the various tasks.

3.2. From confirming to falsifying argument

With these premises, let us now consider a formal reflection on the scientific method and a critical analysis of the logical rules underlying the method used in Science [25]. A very common belief is that, in the first instance, experiments are conducted to test the hypotheses of a theory: if the expected observations of the experiments are verified, then the theory is fully demonstrated. Formally, if H denotes the assumptions of the theory and O the observations, the rule underlying this knowledge process would be the following:

H → O and O
-------------
H

From the premises that H implies O and that O is true, we deduce that H is true. The logical rule that represents this schema goes under the name of confirming argument: it seems to represent well the process of innovation in scientific research. But it is a wrong logical rule, a fallacy of the syllogism, i.e., an error of reasoning, called the fallacy of affirming the consequent [26]. It is easy to verify that, given the logical propositions p and q, the formula ((p → q) ∧ q) → p is not a logical tautology.

More in general, as suggested by Popper’s theory [27] and Kuhn’s thought [28], Science does not proceed by confirming arguments and does not advance through the progressive and continuous accumulation of truth and knowledge. Science proceeds through attempts at refuting the theories proposed. In other words, we advance if there are errors in the accepted theory. So, the correct logical rule associated with the production of innovation is the falsifying argument, represented by:

H → O and ¬O
-------------
¬H

From the premises H → O (H implies O) and ¬O (not O, i.e., O is false) it can be deduced ¬H (not H, i.e., H is false). In other words, when the consequences of a theory are not verified in the experimental context, the theory needs to be completely re-designed. This argument corresponds to the correct logical rule called Modus Tollens.

3.3. Gender in Computer Science

The falsifying argument rule can be the basis of a scientific theory that takes gender into account. Suppose that a certain theory H does not consider the gender dimension (e.g., medicine vs. gender medicine). We need to ask the following question: following the implication H → O, do we expect to find the observations O foreseen by the theory H to be true? Evidently not, because 50% of the users of the innovations are women and, as evidenced by a large literature, it is presumable that the needs of this part of the users are not incorporated in the theory driving the innovation. Hence these observations can be false (¬O), and so can the starting theories (¬H). The rule underlying the scientific method in the production of gendered innovations is precisely the falsifying argument.
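As a concrete check of the two inference schemas discussed in Section 3.2, the following small Python script - written here purely for illustration - enumerates all truth assignments of p and q and verifies that ((p → q) ∧ q) → p is not a tautology, whereas the Modus Tollens schema ((p → q) ∧ ¬q) → ¬p is.

from itertools import product

def implies(a, b):
    # Material implication: a → b is false only when a is true and b is false.
    return (not a) or b

def tautology(formula):
    # A formula is a tautology if it is true under every truth assignment.
    return all(formula(p, q) for p, q in product([True, False], repeat=2))

affirming_consequent = lambda p, q: implies(implies(p, q) and q, p)
modus_tollens        = lambda p, q: implies(implies(p, q) and (not q), not p)

print(tautology(affirming_consequent))  # False: the confirming argument is a fallacy
print(tautology(modus_tollens))         # True: the falsifying argument is valid

The counterexample found for the confirming argument is the assignment p = False, q = True: the premises hold while the conclusion fails.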
This leads us to say that, in order to produce a new gendered science in all fields, it is not sufficient to apply the ‘pinking’ method: it is necessary to radically change the assumptions. Only a complete redefinition of the method and of the research model, with new applications and new ways of observation, can re-design science from a gender perspective. Thus, in order to design AI-based computer systems able to interact socially when facing complex challenges, the gender dimension needs to be taken explicitly into account by re-formulating the questions that can produce responsible research and innovation.

4. Conclusions

The problem we addressed in this paper is surely very complex. However, it is crucial for the implementation of a Trustworthy AI that developers and users of AI-based tools do not pursue a blind application of data-driven AI methods [29, 30]. Bias is only one of the aspects that expose the vulnerability of ML algorithms, alongside adversarial attacks (both at training and test time). Indeed, the blind application of ML algorithms can lead to a strong reinforcement of existing social and gender biases. So, when we use ML tools, we should check whether the data used for training the underlying algorithms include the biases about gender and ethnicity diffused in society. In particular, in order to train systems on balanced data sets, it is very important to apply debiasing methods: for instance, a vector space can be cleaned from bias by compiling a list of gender-biased pairs and removing this warp. This has been done in many applications. More in general, new methods for debiasing data should be studied in order to develop Responsible Gendered Research Innovation.

Aware of these problems that affect the fairness of many algorithms, the next step should be to address the problem of how the gender dimension can be taken into account in the content of scientific production, both from a methodological point of view and from an applicative one [23, 24, 25]. We have shown, on the basis of a formal reflection on the scientific method and a critical analysis of the logical rules underlying the method used in Science, that a new Gendered Science can be developed by formulating new scientific questions with the awareness that another science is possible.

5. Acknowledgements

A special thanks to Lorenza Perini of the University of Padova, who contributed insights and expertise to this research in the past.

6. References

[1] R. Vinuesa, H. Azizpour, I. Leite, M. Balaam, V. Dignum, S. Domisch, A. Felländer, S. D. Langhans, M. Tegmark, F. Fuso Nerini, The role of Artificial Intelligence in achieving the Sustainable Development Goals. Nature Communications 11, 233 (2020). https://doi.org/10.1038/s41467-019-14108-y
[2] European Institute for Gender Equality, https://eige.europa.eu/thesaurus/overview
[3] C. Bodei, L. Pagli, L’informatica non è un paese per donne. Mondo Digitale, 2017.
[4] UNESCO, Artificial Intelligence and Gender Equality. Key findings of UNESCO’s Global Dialogue (2020). https://en.unesco.org/AI-and-GE-2020
[5] J. Zou, L. Schiebinger, AI can be sexist and racist – it’s time to make it fair. Nature 559 (2018): 324-326. https://www.nature.com/articles/d41586-018-05707-8
[6] K. Hammond, 5 unexpected sources of bias in artificial intelligence. TechCrunch, 2016.
[7] G. Stanovsky, N. A. Smith, L. Zettlemoyer, Evaluating Gender Bias in Machine Translation. In: A. Korhonen, D. R. Traum, L. Màrquez (Eds.), Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers. Association for Computational Linguistics, 2019, ISBN 978-1-950737-48-2: 1679-1684. https://www.aclweb.org/anthology/P19-1164/
[8] S. Shankar, Y. Halpern, E. Breck, J. Atwood, J. Wilson, D. Sculley, No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World. NIPS 2017 Workshop on Machine Learning for the Developing World. https://arxiv.org/abs/1711.08536
[9] COMPAS, http://www.equivant.com/solutions/inmate-classification
[10] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (2017): 115-118. https://www.nature.com/articles/nature21056
[11] D. Nozza, C. Volpetti, E. Fersini, Unintended Bias in Misogyny Detection. In: P. M. Barnaghi, G. Gottlob, Y. Manolopoulos, T. Tzouramanis, A. Vakali (Eds.), 2019 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2019, Thessaloniki, Greece, October 14-17, 2019. ACM, 2019, ISBN 978-1-4503-6934-3: 149-155.
[12] Y. Qian, U. Muaz, B. Zhang, J. Won Hyun, Reducing Gender Bias in Word-Level Language Models with a Gender-Equalizing Loss Function. In: F. E. Alva-Manchego, E. Choi, D. Khashabi (Eds.), Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 2: Student Research Workshop. Association for Computational Linguistics, 2019, ISBN 978-1-950737-47-5: 223-228.
[13] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, A. Galstyan, A Survey on Bias and Fairness in Machine Learning. CoRR abs/1908.09635 (2019).
[14] S. Lohr, Facial Recognition is Accurate, if You’re a White Guy. The New York Times (2018). https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html
[15] J. Buolamwini, T. Gebru, Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In: Proc. of the 1st Conference on Fairness, Accountability and Transparency, Proceedings of Machine Learning Research 81: 77-91 (2018). http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
[16] H. J. Ryu, H. Adam, M. Mitchell, InclusiveFaceNet: Improving Face Attribute Detection with Race and Gender Diversity. ICML 2018 Workshop on Fairness, Accountability, and Transparency in Machine Learning, Stockholm, Sweden.
[17] T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, A. T. Kalai, Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, pp. 4349-4357, 2016.
[18] N. Garg, L. Schiebinger, D. Jurafsky, J. Zou, Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences of the United States of America, April 17, 2018, 115 (16). https://doi.org/10.1073/pnas.1720347115
[19] J. Dastin, Amazon scraps secret AI recruiting tool that showed bias against women. Reuters, 2018.
[20] L. Schiebinger et al. (Eds.), Gendered Innovations in Science, Health & Medicine, Engineering, and Environment. https://genderedinnovations.stanford.edu
[21] C. Tannenbaum, R. P. Ellis, F. Eyssel, et al., Sex and gender analysis improves science and engineering. Nature 575, 137–146 (2019). https://doi.org/10.1038/s41586-019-1657-6
[22] Sanchez de Madariaga, http://www.genderste.eu/i_research01.html, 2013.
[23] S. Badaloni, L. Perini, The influence of the gender dimension in human-robot interaction. In: S. M. Anzalone, A. Farinelli, A. Finzi, F. Mastrogiovanni (Eds.), Proceedings of the 4th Italian Workshop on Artificial Intelligence and Robotics, a workshop of the XVI International Conference of the Italian Association for Artificial Intelligence (AI*IA 2017), Bari, Italy, November 14-15, 2017. CEUR Workshop Proceedings 2054, CEUR-WS.org, 2018. http://ceur-ws.org/Vol-2054/paper9.pdf
[24] G. Beraldo, S. Di Battista, S. Badaloni, E. Menegatti, M. Pivetti, Sex differences in expectations and perception of a social robot. 2018 IEEE Workshop on Advanced Robotics and its Social Impacts, ARSO 2018, Genova, Italy, September 27-29, 2018. IEEE, 2018, ISBN 978-1-5386-8037-7. https://ieeexplore.ieee.org/document/8625826
[25] S. Badaloni, L. Perini, Are algorithms gender neutral? 10th Conf. on Gender Equality in Higher Education, Dublin, 2018. https://genderequalityconference2018.com/
[26] G. Federspil, Logica clinica. I principi del metodo in medicina. McGraw-Hill Publishing Group Italia, Milano, 2004.
[27] K. R. Popper, The Logic of Scientific Discovery. Routledge Classics, 1959.
[28] T. Kuhn, The Structure of Scientific Revolutions (1st ed.). University of Chicago Press, 1962.
[29] High-Level Expert Group on Artificial Intelligence, Ethics Guidelines for Trustworthy AI. European Commission, Brussels (2019). https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
[30] F. A. Lisi, Recent results and activities in Trustworthy Artificial Intelligence. In: Book of abstracts of “The ‘Good’ Algorithm? – Artificial Intelligence, Ethics, Law, Health”, Vatican City, Feb. 26-28, 2020, p. 27. http://www.academyforlife.va/content/pav/it/events/workshop-intelligenza-artificiale.html