<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Is the Common Approach used to Identify Social Biases in Artificial Intelligence also Biased?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ana Bucchi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriel M. Fonseca</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro de Investigación en Odontología Legal y Forense (CIO), Facultad de Odontología, Universidad de La Frontera 4811230</institution>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Here, we ask whether the most common approaches used to identify demographic biases in artificial intelligence (AI) are themselves biased. We conducted a scoping review of papers indexed in Scopus and WoS on biases in a particular AI application (face recognition). Fourteen original articles met our inclusion criteria. Of these, the vast majority (13) used an a priori approach to identify bias, i.e., they started from a known background in which certain social groups were subject to low accuracy by the algorithms. Only one study found bias a posteriori, i.e., it examined the results without underlying assumptions about the discriminated groups. Remarkably, this single article identified that it was workers who suffered the negative effects of face recognition, a social segment not analyzed by any study using an aprioristic approach. Of the aprioristic studies, 79% examined skin color and ethnicity, 50% analyzed gender, and two (14%) studied age. Only two articles analyzed bias on the ground, while most relied on controlled experiments. We argue that the almost exclusive use of the common approach (aprioristic and experimental designs) to identify systematic errors is itself a methodological bias. It precludes knowledge of other discriminated social groups, or even of biases towards humanity as a whole that have never been identified (deep-rooted biases), since awareness of them depends on the historical context. To better describe AI models, we believe that eXplainable Artificial Intelligence (xAI) tools should work together with a posteriori bias-identification strategies and with measurement of their direct effects on citizens' lives.</p>
      </abstract>
      <kwd-group>
        <kwd>Computer Vision</kwd>
        <kwd>methodological bias</kwd>
        <kwd>awareness</kwd>
        <kwd>a posteriori</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        It is well known that AI systems embody human bias towards certain demographics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Indeed, preventing injustice and discrimination can be facilitated by using xAI tools [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. xAI
produces more explainable models and enables humans to understand, trust, and effectively
manage the new generation of artificial intelligence [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, because the identification and
mitigation of biases in AI can ultimately only be performed by humans, it is clear that the
systematic errors whose recognition is facilitated by xAI are those that are conscious or easily
accessible and recognizable by a human [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In this study, we focused on whether there are
patterns in the way we approach AI biases that prevent the recognition of discriminated social
groups, which should be considered by users of xAI tools. To this end, we conducted a scoping
review to learn how social biases are identified and analyzed in a specific AI application (face
recognition).
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Material and Method</title>
      <p>
        A scoping review was conducted following the Preferred Reporting Items for Systematic reviews
and Meta-Analyses extension for Scoping Reviews (PRISMA ScR) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The electronic search was performed using the concepts (("bias*") AND ("fac* recognition" OR
"fac* verification" OR "fac* identification") AND ("artificial intelligence" OR "machine learning"
OR "deep learning")) in two indexed databases (Scopus and WoS). Only articles published in
English and with online accessible full texts were included. To be included, articles had to
specifically address biases that occur when AI reflects discrimination towards certain social
groups [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. They also had to focus on automatic face recognition systems, which involve detecting
a face in a photo or video and identifying or verifying who that person is [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Reviews and letters
to the editor were excluded.
      </p>
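      <p>As a minimal, illustrative sketch (not part of the original search records), the boolean query described above could be assembled as follows; the helper name build_query and the TITLE-ABS-KEY field tag are assumptions about a Scopus-style advanced-search syntax, not the exact strings submitted to Scopus or WoS.</p>
      <preformat>
# Minimal Python sketch of the search string described above.
# The TITLE-ABS-KEY field tag and this helper are illustrative assumptions,
# not the exact queries submitted to Scopus or WoS.

def build_query() -> str:
    bias = ['"bias*"']
    face = ['"fac* recognition"', '"fac* verification"', '"fac* identification"']
    ai = ['"artificial intelligence"', '"machine learning"', '"deep learning"']

    def group(terms):
        # Join synonymous terms with OR and wrap them in parentheses.
        return "(" + " OR ".join(terms) + ")"

    # Combine the three concept groups with AND, as in the review protocol.
    return "TITLE-ABS-KEY(" + " AND ".join(group(t) for t in (bias, face, ai)) + ")"

print(build_query())
      </preformat>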
      <p>One reviewer (A.B.) conducted the search and each article was analyzed according to the
following variables:</p>
      <p>1. Types of knowledge: whether biases were identified a priori or a posteriori. The former
refers to studies that, as an antecedent to the research, selected one or more social categories (e.g.,
gender or skin color) and evaluated the accuracy of face recognition systems according to them. In
contrast, by a posteriori study we refer to studies that started without assumptions about which
social group is discriminated against, or whether there was discrimination at all, and instead
evaluated the results of the AI for patterns in the errors.</p>
      <p>2. Research design: whether biases were identified in the field or in a controlled experiment.
The former studies examined recognition systems in practice, while the latter focused on
analyzing databases, algorithms, and predictions in controlled experiments.</p>
      <p>The search was conducted on April 13, 2023, and the identified articles were screened,
evaluated, and included between April 14, 2023, and May 15, 2023. The analyzed variables were
recorded using an Excel spreadsheet (Microsoft Excel).</p>
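      <p>A minimal sketch of how the two coding variables could be tabulated is shown below; the record structure, field names, and example rows are illustrative assumptions, since the actual extraction was recorded in the Excel spreadsheet rather than in code.</p>
      <preformat>
# Minimal Python sketch of the per-article coding record and a simple tally.
# Field names and example rows are hypothetical; the review recorded these
# variables in a Microsoft Excel spreadsheet, not programmatically.
from collections import Counter
from dataclasses import dataclass

@dataclass
class ArticleCoding:
    ref: str         # citation key, e.g. "[8]"
    knowledge: str   # "a priori" or "a posteriori"
    design: str      # "field" or "experiment"

records = [
    ArticleCoding("[8]", "a posteriori", "field"),
    ArticleCoding("[9]", "a priori", "field"),
    # ... one record per included article (14 in total)
]

def summarise(rows):
    """Count how many articles fall into each category of the two variables."""
    return Counter(r.knowledge for r in rows), Counter(r.design for r in rows)

print(summarise(records))
      </preformat>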
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>The literature search yielded 178 articles. Following screening of titles and abstracts and after
establishing eligibility (i.e., whether they were related to the study objectives), 14 articles were
included in this qualitative synthesis. Figure 1 shows the flow of article selection from
identification to inclusion. Table 1 shows all the articles included according to the variables
assessed in this review.</p>
      <sec id="sec-3-1">
        <title>3.1. Types of Knowledge</title>
        <p>
          Of the 14 included articles, the vast majority (13) were aprioristic studies, while only one did not
start with underlying bias assumptions (a posteriori study) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Of the aprioristic studies, 79%
studied skin color and ethnicity, 50% analyzed gender, and two (14%) studied age (Table 1). The
a posteriori study analyzed Uber drivers' perceptions of the platform's facial verification system and
found that drivers dynamically innovated and created numerous strategies to work around verification
errors: they tilted their faces, moved closer to the light, removed hats and glasses, changed their
hairstyles, took their faces outside, placed them in front of headlights, and brought them into well-lit
restrooms at gas stations or under bright lamps in dark parking lots.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Research Designs</title>
        <p>
          Two articles showed the biases associated with facial recognition systems in the field [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]:
Watkins' study [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] analyzed the use of Uber's verification system in New York City (USA) and
Toronto (Canada), conducting semi-structured interviews to learn workers' perceptions of the
system, while Johnson et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] analyzed 1136 arrest cases in the USA that involved facial
recognition and their relationship with whether inmates were Black or White. The other articles used
experimental designs to test whether the databases, algorithms or predictions produced different
accuracy according to certain social categories (all determined a priori).
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Conclusions</title>
      <p>
        In this review, we found that there is a predominant way of approaching the problem of
identifying social biases in face recognition systems. This approach is both aprioristic and
experimental, and here we will call it the common approach. In contrast, studies that do not start
from assumptions about users' opinions or about the effects of face recognition (a posteriori) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
determine their effects in practical cases [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] constitute a minority of cases. As we have seen,
one of these studies identified an affected social segment not considered by the a priori and
experimental studies: it was the workers who "demand significant investments of money, time,
and resourcefulness" to "best repair facial verification technology computational failures and
errors, and in doing so make themselves machine-readable" [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        We postulate that this common approach represents a methodological bias that affects the
realistic recognition of the problem of social biases in this AI application. This implies that the
number of biases may be much higher than what is commonly recognized, which boils down to
discrimination by gender, age, skin color, or ethnicity. It should be remembered that the common
methodology starts from the basis of discrimination against groups recognized by society
(women, dark-skinned people, and the elderly) (Table 1); however, there is no reason to believe
that there are no other deep-rooted unconscious biases that have never been discussed by
society, since this depends on the historical context. This is especially important considering that the
social categories of gender, race, and age are recognized as "the big three," i.e., the three
particularly prominent social categories into which people automatically sort individuals,
although there are infinite ways in which humans can create group distinctions [21], [22]. People
are actually multidimensional (someone may be a white male, which would make him subject to
fewer errors in AI, but also an older worker, which would make him more prone to these errors), so the
universe of systematic errors may be difficult to describe and mitigate. One could argue that
Watkins' study [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] implies that AI errors can actually affect all people, as she found that common
characteristics such as hairstyle or glasses can affect the outcome of AI.
      </p>
      <p>
        We think that users of xAI tools should take this into account, especially since the automation
of xAI-derived explanations brings about human overreliance and causes humans to bypass their
own correct answers and validate incorrect answers from AI [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [23]. Furthermore, the cognitive
effort to understand certain AI explanations negatively affects the interpretation of
recommendations [24]. Thus, explaining why an AI arrives at a result does not ensure that the
user comprehends the result. However, it has been shown that users who engage analytically
significantly increase the effectiveness of explainable AI [25]. We believe that AI explanations
have tremendous potential to facilitate awareness of deep-rooted biases and that this is possible
as long as there are conscious users who start with as few assumptions as possible. Here, we
postulate that to understand the real dimension of social biases and their effects on AI
applications, explainable AI and individuals who are cognitively involved in searching for social
biases must work with an a posteriori approach and research on real cases. To the best of our
knowledge, we are the first to postulate that an a posteriori approach can help reveal deep-rooted
biases.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This study was funded by the ANID Fondecyt Postdoctoral Grant 3210026.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gwyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <source>Examining Gender Bias of Convolutional Neural Networks via Facial Recognition, Future Internet</source>
          <volume>14</volume>
          (
          <year>2022</year>
          ),
          <volume>375</volume>
          . https://doi.org/10.3390/fi14120375
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Georgopoulos</surname>
          </string-name>
          , Markos, James Oldfield,
          <article-title>Mihalis A. Nicolaou, Yannis Panagakis and Maja Pantic. “Mitigating Demographic Bias in Facial Datasets with Style-Based Multi-attribute Transfer”</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>129</volume>
          (
          <year>2021</year>
          ):
          <fpage>2288</fpage>
          -
          <lpage>2307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Baum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mantel</surname>
          </string-name>
          , E. Schmidt, T. Speith, From Responsibility to Reason-Giving
          <source>Explainable Artificial Intelligence, Philosophy and Technology</source>
          <volume>35</volume>
          (
          <year>2022</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          . https://doi.org/10.1007/s13347-022-00510-w
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Gunning</surname>
            <given-names>David</given-names>
          </string-name>
          , Mark Stefik, Jaesik Choi, Timothy Miller, Simone Stumpf and Guang-Zhong Yang.
          <article-title>“XAI-Explainable artificial intelligence”</article-title>
          .
          <source>Science Robotics</source>
          <volume>4</volume>
          (
          <year>2019</year>
          ):
          <fpage>4</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bertrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Belloum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Eagan</surname>
          </string-name>
          , W. Maxwell,
          <article-title>How cognitive biases affect XAI-Assisted decision-making: A systematic review</article-title>
          ,
          <source>in: Proceedings of the AAAI/ACM Conference on AI</source>
          ,
          <string-name>
            <surname>Ethics</surname>
          </string-name>
          , and Society, Association for Computing Machinery, New York, NY,
          <year>2022</year>
          , pp.
          <fpage>78</fpage>
          -
          <lpage>91</lpage>
          , vol.
          <volume>1</volume>
          . doi: https://doi.org/10.1145/3514094.3534164
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Tricco</surname>
            <given-names>Andrea C.</given-names>
          </string-name>
          , Erin Lillie, Wasifa Zarin, Kelly K. O'Brien, Heather Colquhoun, Danielle Levac, David Moher, Micah D.J. Peters, Tanya Horsley, Laura Weeks, Susanne Hempel, Elie A. Akl, Christine Chang, Jessie McGowan, Lesley Stewart, Lisa Hartling, Adrian Aldcroft, Michael G. Wilson, Chantelle Garritty, Simon Lewin, Christina M. Godfrey, Marilyn T. MacDonald, Etienne V. Langlois, Karla Soares-Weiser, Jo Moriarty, Tammy Clifford, Özge Tunçalp and Sharon E. Straus.
          <article-title>“PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation”</article-title>
          .
          <source>Annals of Internal Medicine</source>
          <volume>169</volume>
          (
          <year>2018</year>
          ):
          <fpage>467</fpage>
          -
          <lpage>473</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brownlee</surname>
          </string-name>
          ,
          <article-title>Deep Learning for Computer Vision Image Classification, Object Detection, and Face Recognition in Python</article-title>
          .
          <source>Edition v1.3. Machine Learning Mastery, online</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Watkins</surname>
          </string-name>
          , Face Work:
          <article-title>A Human-Centered Investigation into Facial Verification in Gig Work</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>7</volume>
          (
          <year>2023</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          . doi: https://doi.org/10.1145/3579485
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.L.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.N.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , D.
          <string-name>
            <surname>McCurdy</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          <string-name>
            <surname>Olajide</surname>
          </string-name>
          ,
          <article-title>Facial recognition systems in policing and racial disparities in arrests</article-title>
          ,
          <source>Government Information Quarterly</source>
          <volume>39</volume>
          (
          <year>2022</year>
          ),
          <volume>101753</volume>
          . doi: https://doi.org/10.1016/j.giq.
          <year>2022</year>
          .101753
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Albiero, Vitor, Kai Zhang, Michael C. King and Kevin W. Bowyer. “Gendered Differences in Face Recognition Accuracy Explained by Hairstyles, Makeup, and Facial Morphology”. IEEE Transactions on Information Forensics and Security 17 (2022): 127–137.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Franco, Danilo, Nicolò Navarin, Michele Donini, Davide Anguita and Luca Oneto. “Deep fair models for complex data: Graphs labeling and explainable face recognition”. Neurocomputing 470 (2022): 318–334.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] M. Georgopoulos, Y. Panagakis, M. Pantic, Investigating bias in deep face analysis: The KANFace dataset and empirical study, Image and Vision Computing 102 (2020), 103954. doi: https://doi.org/10.48550/arXiv.2005.07302</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] D. Celis, M. Rao, Learning facial recognition biases through VAE latent representations, in: Proceedings of the 1st International Workshop on Fairness, Accountability, and Transparency in MultiMedia, Association for Computing Machinery, New York, NY, 2019, pp. 26–32. doi: https://doi.org/10.1145/3347447.3356752</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] J. Coe, M. Atay, Evaluating impact of race in facial recognition across machine learning and deep learning algorithms, Computers 10 (2021), 113. doi: https://doi.org/10.3390/computers10090113</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] T.P. Pagano, R.B. Loureiro, F.V.N. Lisboa, G.O.R. Cruz, R.M. Peixoto, G.A.D. Guimaraes, E.L.S. Oliveira, I. Winkler, E.G.S. Nascimento, Context-Based Patterns in Machine Learning Bias and Fairness Metrics: A Sensitive Attributes-Based Approach, Big Data and Cognitive Computing 7 (2023), 27. doi: https://doi.org/10.3390/bdcc7010027</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Wang, Mei, Yaobin Zhang and Weihong Deng. “Meta Balanced Network for Fair Face Recognition”. IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2022): 8433–8448.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] I. Serna, A. Morales, J. Fierrez, N. Obradovich, Sensitive loss: Improving accuracy and fairness of face representations with discrimination-aware deep learning, Artificial Intelligence 305 (2022), 103682. doi: https://doi.org/10.1016/j.artint.2022.103682</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Jiang, Luo, Juyong Zhang and Bailin Deng. “Robust RGB-D Face Recognition Using Attribute-Aware Loss”. IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2020): 2552–2566.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] López-López, Eric, Xosé M. Pardo, Carlos V. Regueiro, Roberto Iglesias and Fernando E. Casado. “Dataset bias exposed in face verification”. IET Biometrics 8 (2019): 249–258.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Muhammad, Jawad, Yunlong Wang, Caiyong Wang, Kunbo Zhang and Zhenan Sun. “CASIA-Face-Africa: A Large-Scale African Face Image Database”. IEEE Transactions on Information Forensics and Security 16 (2021): 3634–3646.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] Taylor, Shelley E., Susan T. Fiske, Nancy L. Etcoff and Audrey J. Ruderman. “Categorical and contextual bases of person memory and stereotyping”. Journal of Personality and Social Psychology 36 (1978): 778–793.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] Stangor, Charles, Laure Lynch, Changming Duan and Beth Glass. “Categorization of Individuals on the Basis of Multiple Social Features”. Journal of Personality and Social Psychology 62 (1992): 207–218.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] M. Schemmer, N. Kühl, C. Benz, G. Satzger, On the influence of explainable AI on automation bias, arXiv preprint arXiv:2204.08859, 2022.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] L.V. Herm, Impact of explainable AI on cognitive load: Insights from an empirical study, arXiv preprint arXiv:2304.08861, 2023.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] Z. Buçinca, M.B. Malaya, K.Z. Gajos, To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making, Proceedings of the ACM on Human-Computer Interaction 5 (2021), CSCW1. doi: https://doi.org/10.48550/arXiv.2102.09692</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>