<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fairness Auditing, Explanation and Debiasing in Linguistic Data and Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marta Marchiori Manerba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Pisa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KDD Laboratory, ISTI, National Research Council</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This research proposal is framed within the interdisciplinary exploration of the socio-cultural implications that AI exerts on individuals and groups. The focus is on contexts where models can amplify discrimination through algorithmic biases, e.g., in recommendation and ranking systems or abusive language detection classifiers, and on debiasing their automated decisions so that they become beneficial and just for everyone. To address these issues, the main objective of the proposed research project is to develop a framework to perform fairness auditing and debiasing of both classifiers and datasets, starting with, but not limited to, abusive language detection, thus broadening the approach toward other NLP tasks. Ultimately, by questioning the effectiveness of adjusting and debiasing existing resources, the project aims at developing truly inclusive, fair, and explainable models by design.</p>
      </abstract>
      <kwd-group>
        <kwd>Responsible NLP</kwd>
        <kwd>Explainability</kwd>
        <kwd>Interpretability</kwd>
        <kwd>Fairness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        At every stage of a supervised learning process, biases can arise and be introduced in the
pipeline. Current models implemented with AI technologies have been shown to inherit
and perpetuate bias against specific demographic groups and protected attributes such as
sexual orientation or religion [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. These skews pose a severe risk and limitation to the
wellbeing of underrepresented minorities, ultimately amplifying pre-existing social stereotypes,
possible marginalization, and explicit harm [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ]. Given the sensitive contexts in which systems
are deployed, a robust value-oriented evaluation of models’ fairness is necessary to mitigate
unfairness and avoid discrimination.
      </p>
      <p>
        Besides fairness, another crucial aspect to consider lies in the opaqueness of models’ internal
behavior. If the dynamics leading a model to a particular automatic decision are neither clear nor
accountable, significant problems of trust in the reliability of outputs could emerge, especially in
sensitive real-world contexts where high-stakes choices are made. Inspecting non-discrimination
of decisions and assessing that the knowledge autonomously learned conforms to human values
also constitutes a real challenge. Indeed, the objective of eXplainable Artificial Intelligence
(XAI) is to propose strategies and methods to render AI systems and automatic decisions more
intelligible to humans. In recent years, working towards transparency and interpretability
of black box models has become a priority: multiple approaches and methods have been
proposed [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ].
      </p>
      <p>
        This research topic cannot be limited to constructing mathematical explanations or those
understandable only to data scientists, just as algorithmic fairness is not enough to effectively
counteract certain types of harms [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The phenomenon’s complexity is therefore not limited to
algorithms but is deeply rooted in historical, cultural, and social perceptions. It cannot
be solved by computational methods alone. Nevertheless, what guides my commitment
is the pressing need to demand transparency and the intent to develop truly inclusive
tools that can at least meet the needs of minorities’ experiences in diverse social spaces.
      </p>
      <p>To address these issues, the main objective of the proposed research project is to develop a
framework to perform fairness auditing and debiasing of both classifiers and datasets, starting
with, but not limited to, abusive language detection, thus broadening the approach toward other
NLP tasks. The intuition relies on leveraging explainability techniques to discover biases and
perform fairness auditing, e.g., generating counterfactuals and deploying interpretable proxies
for the black box models. Auditing output can consist of an explanation of the (un)fairness of
linguistic data and language models under analysis, i.e., specific reasons for which the resource
is considered unfair, ultimately exposing unjust behaviors more visibly and transparently. The
framework will propose several metrics and strategies to quantify and approach debiasing from
the identified discriminatory treatments, beginning with those attested within ML and NLP
communities. Ultimately, by contesting the effectiveness of adjusting and debiasing existing
resources, the project aims at developing truly inclusive, fair, and explainable models by design.
In Sections 3 and 4, we will describe in detail the proposed strategies.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        XAI and Fairness Approaches for ML and NLP. Overall, few approaches in the literature
are at the intersection of explainability and fairness. The use of XAI techniques to identify and
explain fairness issues is presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The authors, highlighting the gap in this direction
of research, outline generic recommendations for devising XAI tools, specifically proposing
guidelines for the development of a Fair Explainability toolkit, after outlining the steps in the
design, planning, deployment, and use of AI systems in which bias can potentially be introduced.
This toolkit should be able to: (1) investigate the source data, (2) highlight the impacts of the
choice and development of ML models, and (3) design explanations according to the identified
target audience. One branch emerging from this intersection is the assessment of the fairness of
the explanations by checking the fidelity score of the explanations calculated for each sensitive
group [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The investigation is carried out to evaluate if the explanations are more suitable
for a specific group w.r.t. others. However, this type of assessment is highly dependent on
the choice of explainer and how the hyperparameters are set: the unreliability of explainers, i.e.,
the fact that different explainers produce different explanations for the same instances, constitutes a significant
challenge. Another dimension concerns the study of how explanations impact users’ perceptions
of fairness and whether, indeed, they can increase human trust in the fairness and correctness of automatic
decision-making [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We refer to the review conducted in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], where authors collect works
that propose strategies to tackle the fairness of NLP models through explainability techniques.
Generally, the authors found that, although one of the main reasons for applying explainability to
NLP resides in bias detection, contributions at the intersection of these ethical AI principles
are very few and often limited in scope, e.g., w.r.t. the biases and tasks addressed. Additionally,
when considering the integration of fairness and XAI, it is essential to recognize the distinct
objectives of each. Fairness primarily emphasizes equitable outcomes, whereas XAI concentrates
on enhancing transparency and understanding of the underlying processes. As the authors point
out [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], there is a lack of metrics to address procedural fairness. Therefore, such approaches,
which apply XAI to fairness questions, are often reduced to checking that sensitive attributes
are not used in decisions, ultimately implementing the much more problematic approach of
fairness through unawareness, which attempts to achieve fairness by intentionally ignoring or
not considering sensitive attributes during decision-making processes. Indeed, fairness through
unawareness has been criticized for several reasons. First, it assumes that excluding sensitive
attributes automatically eliminates bias, disregarding the potential influence of other correlated
attributes that can still perpetuate discriminatory outcomes. Second, it overlooks that ignoring
sensitive attributes can hinder the identification and understanding of discriminatory patterns
and potential biases in the system. Consequently, fairness through unawareness can mask
underlying biases and hinder the ability to address and rectify unfairness effectively. Conversely,
fairness through awareness [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] acknowledges the existence of such attributes and takes into
explicit account the potential impacts on different groups. It aims to address biases and ensure
equitable outcomes by actively recognizing and mitigating disparities associated with these
attributes since they can be relevant and important factors in certain contexts.
      </p>
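      <p>The per-group fidelity check discussed above can be illustrated with a minimal sketch. It assumes that the predictions of the black box and of an interpretable surrogate are already available as lists of labels, together with a sensitive-group label per instance. The function names and the toy data are hypothetical and are not taken from the cited works.</p>
      <preformat>
```python
from collections import defaultdict


def fidelity_by_group(black_box_preds, surrogate_preds, groups):
    """Share of instances, per group, where the surrogate matches the black box."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    for bb, sur, group in zip(black_box_preds, surrogate_preds, groups):
        totals[group] += 1
        matches[group] += int(bb == sur)
    return {group: matches[group] / totals[group] for group in totals}


def fidelity_gap(fidelity):
    """Largest difference in fidelity between any two groups."""
    return max(fidelity.values()) - min(fidelity.values())


# Toy data, invented for illustration only.
black_box = [1, 0, 1, 1, 0, 1, 0, 0]
surrogate = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

fidelity = fidelity_by_group(black_box, surrogate, groups)
gap = fidelity_gap(fidelity)
print(fidelity, gap)
```
      </preformat>
      <p>A large gap would suggest that the explanations describe the model more faithfully for one group than for another. As noted above, the value depends on the explainer and its hyperparameters, so it should be read as an indication rather than a certificate.</p>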
      <p>
        Challenges. We follow the insights from the review conducted in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], where authors provide
an overview of the current state of XAI and its relationship to NLP. Currently, explainability
approaches generally work on low-dimensional tabular data and take a long time to run, so
they do not scale to other data types or to large-scale datasets. Most explanation methods for NLP
applications are local, remain at the analysis of the linguistic surface, and therefore expose
mostly non-causal relations. Regarding the limitations of leveraging XAI to improve the fairness
of NLP models [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], if the explanation methods are not robust, consistent, and thus reliable, using
explanations to delegate or certify shallow fairness is a risk. Moreover, it is crucial to introduce
the concept of the “uncertainty level” of the explanation to help the user understand how much
it is possible to rely on the explanation. Both explainability and fairness face the challenge
of lacking shared terminology and recognized standards, as there is still no full agreement
within the field. This lack of consensus arises from the diverse range of datasets and models
used, making it difficult to establish consistent frameworks for systematic comparisons and
benchmarking. Although it is a priority to raise new and complex questions within
human-centered ML, assessing the impacts on individuals and understanding what users count as
fair, human-in-the-loop evaluation has its costs. Nevertheless, the opportunity to conduct robust user
testing would be essential, collecting human evaluation and fairness judgments to improve
the suitability and quality of explanations w.r.t. specific contexts. Although these challenges
are extremely limiting to the pursuit of developing fair and explainable NLP techniques that
are also robust and reliable, I believe these limitations can be a starting point for my project,
through which I intend to overcome some of them by addressing them together in a systemic,
participatory, co-design and continuous correction perspective, to include missing, unheard
voices and sensitivities, “interrogating and reimagining the power relations between technologists
and such communities” [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Research Questions and Approach</title>
      <p>
        Leveraging XAI and interpretability strategies to uncover fairness issues [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], we intend to
address this problem by designing a framework that deals with detection and mitigation aspects.
Defining solutions to address unfairness requires considering various dimensions, such as
what constitutes a sensitive attribute to be protected or according to which criteria it is possible
to assess whether a decision is fair.
      </p>
      <list list-type="order">
        <list-item>
          <p>Can explainability techniques contribute to discovering the source and the
reason for unfair, biased behaviors in NLP pipelines?</p>
          <list list-type="alpha-lower">
            <list-item>
              <p>Which explainability techniques help most to uncover biases in NLP applications?</p>
            </list-item>
            <list-item>
              <p>What constitutes a meaningful explanation for NLP applications from the developers’
perspective, in order to expose potential harm?</p>
            </list-item>
            <list-item>
              <p>What are the essential features of an explanation addressed to final users, enabling
them both to understand the reasons behind automatic decisions and to appeal for
recourse?</p>
            </list-item>
          </list>
        </list-item>
        <list-item>
          <p>How can explainable and fair-by-design approaches be implemented?</p>
        </list-item>
      </list>
      <p>
        Contributions at the intersection of these fields are still in their early stages, as reported by [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], where
trends in XAI and fairness in NLP research are reviewed. Current solutions are restricted to a
few tasks, address narrow biases, and leverage mainly local explanation methods. Since fairness
and explainability are young disciplines and lack solid theoretical foundations, collaboratively
building at the intersection of these two AI ethics principles might be a promising strategy for
exposing bias. I want this work to position itself differently from the existing literature,
starting with the clear articulation (still under development) of the concepts we want to deal
with, i.e., bias and unfairness in NLP, to identify what to measure and mitigate effectively.
Using explainability to uncover fairness issues is instead motivated by the lack of transparency.
The inability to provide explanations for AI systems is also often blamed as a source of bias
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Explainability, in this sense, becomes an analytical tool to shed light on both the outputs
and internal dynamics of systems to identify and explain unjust automatic behaviors. This
cross-fertilization within NLP is so far only potential and under-investigated, as very few (and insufficient)
approaches have explored and devised solutions at the intersection of fairness and explainability,
as reported in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. As for using post-hoc explainability to generate explanations of fairness
for model behavior, the intuition might be to build methods that produce explanations that take
linguistic structure into consideration, exploiting it to account for implicit and less superficial
language dynamics, i.e., going beyond the counterfactual token fairness metric. Regarding
datasets, the goal could be to design effective guarantees that a model can be trained to issue
fair decisions even in the presence of biased data.
      </p>
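      <p>As a point of reference for the counterfactual token fairness metric mentioned above, the following minimal sketch swaps an identity term in a template sentence and measures how much a classifier score changes. The scoring function is a deliberately biased toy stand-in, not a real model, and the template, term lists, and function names are assumptions made purely for illustration.</p>
      <preformat>
```python
def toy_toxicity_score(text):
    """Hypothetical stand-in for a classifier, deliberately biased toward one term."""
    weights = {"stupid": 0.8, "gay": 0.5}
    tokens = text.lower().split()
    return min(1.0, sum(weights.get(tok, 0.0) for tok in tokens))


def counterfactual_token_gap(template, original_term, alternative_terms, score_fn):
    """Mean absolute score change when the identity term is swapped."""
    base = score_fn(template.format(term=original_term))
    diffs = [
        abs(score_fn(template.format(term=alt)) - base)
        for alt in alternative_terms
    ]
    return sum(diffs) / len(diffs)


gap = counterfactual_token_gap(
    "I am a {term} person",
    original_term="gay",
    alternative_terms=["straight", "tall"],
    score_fn=toy_toxicity_score,
)
print(gap)
```
      </preformat>
      <p>A gap of zero would mean the score is insensitive to the swapped term. Because the check operates on surface tokens only, it cannot capture the implicit, structure-dependent dynamics that the proposal aims to address.</p>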
    </sec>
    <sec id="sec-4">
      <title>4. Research Directions and Next Steps</title>
      <p>
        The following section presents potential research lines and solutions to the questions under
investigation. The concrete aim of this research is to address, on the one hand, the
assessment and “adjustment” of existing tools through a strong value-oriented evaluation [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
On the other hand, it addresses the urgent need to develop truly inclusive tools, fair and explainable by
design.
      </p>
      <p>Resources to counteract stereotypes and infer fairness explanations. We report a
workflow hypothesis of a Fairness Evaluation Loop (visualized in Fig. 1).
By studying the interplay between XAI, fairness, and ethics, it aims at unmasking, detecting,
and counteracting bias within NLP applications, ultimately fostering responsible ML. The
framework, pursuing fairness as a multi-objective task, will combine:</p>
      <list list-type="bullet">
        <list-item>
          <p>evaluation/detection: risk assessment approaches exploiting, among others, (1)
in-depth analysis of the performances/errors obtained over demographic groups;
(2) XAI techniques, e.g., the generation of counterfactuals and the deployment of
interpretable proxies; (3) other ML techniques, such as the detection of outlier
predictions;</p>
        </list-item>
        <list-item>
          <p>debiasing/mitigation of data, keeping the algorithm fixed but retraining it;</p>
        </list-item>
        <list-item>
          <p>debiasing/mitigation of the algorithm or fair algorithm development, with data for
which we cannot provide guarantees;</p>
        </list-item>
        <list-item>
          <p>continuous cycle of monitoring and correction.</p>
        </list-item>
      </list>
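      <p>The first detection step, the analysis of errors over demographic groups, can be sketched as follows for a binary abusive language classifier. This is a minimal illustration under stated assumptions: labels are 0 (non-abusive) and 1 (abusive), each text is associated with a single demographic group, and the data and function name are invented for the example.</p>
      <preformat>
```python
def false_positive_rate_by_group(y_true, y_pred, groups):
    """False positive rate (non-abusive texts flagged as abusive) per group."""
    false_pos = {}
    negatives = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:
            negatives[group] = negatives.get(group, 0) + 1
            false_pos[group] = false_pos.get(group, 0) + int(pred == 1)
    return {group: false_pos[group] / negatives[group] for group in negatives}


# Toy labels and predictions, invented for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]
groups = ["a"] * 5 + ["b"] * 5

fpr = false_positive_rate_by_group(y_true, y_pred, groups)
fpr_gap = max(fpr.values()) - min(fpr.values())
print(fpr, fpr_gap)
```
      </preformat>
      <p>The false positive rate is only one possible choice. Which error type matters most, and how large a gap is acceptable, depends on the task and on the harm being assessed.</p>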
      <p>
        The output will consist of different explanations according to different user types.
      </p>
      <p>
        Resources that are responsible by design, meaning fair and explainable. Despite the
importance of mitigating unfairness and explaining opaque systems, the challenges
and limitations of current approaches demonstrate how complex and multifaceted the
task is and how, occasionally, new problems are introduced instead of the original one being solved. A
promising research direction, beyond debiasing and explainability, could concern the
development of truly inclusive models, fair and explainable by design regardless of the
potential bias in the data [17, 18]. One contribution could be the collection and publication
of representative datasets containing instances of the misrepresented [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] phenomena.
It is crucial to assess and address especially the under-recognised ones, since biases are
manifestations of distinct stereotypes and not all have received the same attention from
the scientific community so far. Another promising line, justified by the need for users’
acceptance, could concern the design of participatory approaches at different involvement
levels and stages of the ML pipeline for detecting risks and harms. Certain biases require
the engagement and feedback of the affected groups to be effectively exposed and
addressed [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>To evaluate both lines of research, we will conduct several experiments in order to demonstrate
the novelty of our approach compared to other SOTA explainers and bias-assessment benchmark
procedures, and the effectiveness of our framework in unmasking biases in both research
and commercial systems, as well as in the main benchmarks and gold-standard datasets for several
NLP tasks of interest, starting with, but not limited to, abusive language detection. Since
state-of-the-art techniques for dealing with fairness operate mainly on tabular data, we will build
on existing techniques by extending the approaches toward NLP applications and models that
operate on textual data.</p>
      <p>
        As suggested in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], proposing a contribution within the NLP domain responsibly and
consciously means foremost acknowledging our own biases. This might mean starting by
recognising how most contributions currently reflect a dominant perspective and culture, thus
unconsciously incorporating stereotypes and marginalisation. Furthermore, it is crucial to
overcome techno-solutionism, being aware that any solely technological solution will be
partial, as not considering the broader socio-political issue that is the source of these biases
means simplifying and “fixing” only on the surface [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We must remember that “resolving
the bias” does not guarantee the ethical use of technology. A systemic approach is necessary,
combined with creating a narrative that avoids misrepresenting and mystifying these complex
socio-technical tools. Regardless, we firmly believe that NLP pipelines need a robust
value-sensitive evaluation in order to assess unintended biases and avoid, as far as possible, explicit
harm or the amplification of pre-existing social prejudices, ultimately trying to build systems
that contribute beneficially to society and all its citizens.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by the European Community Horizon 2020 programme
under the funding scheme ERC-2018-ADG G.A. 834756 XAI: Science and technology for the
eXplanation of AI decision making.</p>
      <p>[17] C. Rudin, Stop explaining black box machine learning models for high stakes decisions
and use interpretable models instead, Nature Machine Intelligence 1 (2019) 206–215.</p>
      <p>[18] C. Wang, B. Han, B. Patel, F. Mohideen, C. Rudin, In pursuit of interpretable, fair and
accurate machine learning for criminal recidivism prediction, CoRR abs/2005.04176 (2020).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Suresh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Guttag</surname>
          </string-name>
          ,
          <article-title>A framework for understanding unintended consequences of machine learning</article-title>
          ,
          <source>CoRR</source>
          abs/1901.10002 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mehrabi</surname>
          </string-name>
          , et al.,
          <article-title>A survey on bias and fairness in machine learning</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>115:1</fpage>
          -
          <lpage>115:35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Dixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sorensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Thain</surname>
          </string-name>
          , L. Vasserman,
          <article-title>Measuring and mitigating unintended bias in text classification</article-title>
          , in: AIES, ACM,
          <year>2018</year>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monreale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <article-title>A survey of methods for explaining black box models</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>51</volume>
          (
          <year>2019</year>
          )
          <fpage>93:1</fpage>
          -
          <lpage>93:42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Adadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berrada</surname>
          </string-name>
          ,
          <article-title>Peeking inside the black-box: A survey on explainable artificial intelligence (XAI)</article-title>
          ,
          <source>IEEE Access</source>
          <volume>6</volume>
          (
          <year>2018</year>
          )
          <fpage>52138</fpage>
          -
          <lpage>52160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Explanation in artificial intelligence: Insights from the social sciences</article-title>
          ,
          <source>Artif. Intell</source>
          .
          <volume>267</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ntoutsi</surname>
          </string-name>
          , et al.,
          <article-title>Bias in data-driven artificial intelligence systems - an introductory survey</article-title>
          ,
          <source>Wiley Interdiscip. Rev. Data Min. Knowl. Discov</source>
          .
          <volume>10</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Alikhademi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Drobina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gilbert</surname>
          </string-name>
          ,
          <article-title>Can explainable AI explain unfairness? A framework for evaluating explainable AI</article-title>
          ,
          <source>CoRR</source>
          abs/2106.07483 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2106.07483. arXiv:2106.07483.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Balagopalan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hamidieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hartvigsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rudzicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghassemi</surname>
          </string-name>
          ,
          <article-title>The road to explainability is paved with bias: Measuring the fairness of explanations</article-title>
          ,
          <source>arXiv preprint arXiv:2205.03295</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Orphanou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Otterbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kleanthous</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Batsuren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bogina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Tal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hartman</surname>
          </string-name>
          , T. Kuflik,
          <article-title>Mitigating bias in algorithmic systems-a fish-eye view</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Balkir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kiritchenko</surname>
          </string-name>
          , I. Nejadgholi,
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Fraser</surname>
          </string-name>
          ,
          <article-title>Challenges in applying explainability methods to improve the fairness of NLP models</article-title>
          ,
          <source>arXiv preprint arXiv:2206.03945</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pitassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Reingold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <article-title>Fairness through awareness</article-title>
          , in: S. Goldwasser (Ed.),
          <source>Innovations in Theoretical Computer Science 2012</source>
          , Cambridge, MA, USA, January 8-10, 2012, ACM,
          <year>2012</year>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>226</lpage>
          . URL: https://doi.org/10.1145/2090236.2090255. doi:10.1145/2090236.2090255.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Danilevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aharonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Katsis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kawas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <article-title>A survey of the state of explainable AI for natural language processing</article-title>
          , in: K. Wong, K. Knight, H. Wu (Eds.),
          <source>AACL/IJCNLP 2020</source>
          , Suzhou, China, December 4-7, 2020, Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>447</fpage>
          -
          <lpage>459</lpage>
          . URL: https://aclanthology.org/2020.aacl-main.46/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Blodgett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barocas</surname>
          </string-name>
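          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Daumé III</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Wallach</surname>
          </string-name>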
          ,
          <article-title>Language (technology) is power: A critical survey of "bias" in NLP</article-title>
          , in: D. Jurafsky, J. Chai, N. Schluter, J. R. Tetreault (Eds.),
          <source>ACL 2020</source>
          , Online, July 5-10, 2020, Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>5454</fpage>
          -
          <lpage>5476</lpage>
          . URL: https://doi.org/10.18653/v1/2020.acl-main.485. doi:10.18653/v1/2020.acl-main.485.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Weidinger</surname>
          </string-name>
          , et al.,
          <article-title>Taxonomy of risks posed by language models</article-title>
          , in:
          <source>2022 ACM Conference on Fairness, Accountability, and Transparency</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>229</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Dobbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Gilbert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kohli</surname>
          </string-name>
          ,
          <article-title>A broader view on bias in automated decision-making: Reflecting on epistemology and dynamics</article-title>
          ,
          <source>CoRR</source>
          abs/1807.00553 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>