<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Ethical AI Governance: Methods for Evaluating Trustworthy AI</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Louise McCormack</string-name>
          <email>louise.mccormack@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malika Bendechache</string-name>
          <email>malika.bendechache@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Compostela, Spain</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT Research Centre, University of Galway</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Artificial Intelligence (AI)</institution>
          ,
          <addr-line>Trustworthy AI (TAI), Evaluation Methods, AI Ethics, TAI Assessment</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Trustworthy Artificial Intelligence (TAI) integrates ethical principles that align with human values, considering their influence on AI behaviour and decision-making. TAI evaluation is primarily dependent on self-assessment and aims to ensure ethical standards and safety in AI development and usage. This paper reviews the current TAI evaluation methods in the literature and offers a classification, contributing to the understanding of self-assessment methods in this field.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Intelligence (AI)</kwd>
        <kwd>Trustworthy AI (TAI)</kwd>
        <kwd>Evaluation Methods</kwd>
        <kwd>AI Ethics</kwd>
        <kwd>TAI Assessment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Artificial intelligence (AI) is increasingly integrated into numerous sectors, making ethical
considerations and trustworthiness in AI systems more critical than ever. Behavioural science
is utilised to achieve objectives in areas such as climate change mitigation and educational
attainment[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a trend which also extends to Trustworthy AI (TAI). TAI is a crucial concept
within the field of ethical AI, which encompasses the ethical considerations essential in the
development and use of AI systems[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Leading TAI frameworks[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] incorporate behavioural
science principles to ensure AI systems align with human values, considering their impact on
behaviour and decision-making. Additionally, bidirectional human-AI alignment emphasises
aligning AI to human values and enabling humans to adjust to AI advancements cognitively
and behaviourally[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The European Commission Assessment List for Trustworthy AI (ALTAI)[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the European
Union (EU) AI Act[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] are essential TAI guidelines, emphasising a human-centred,
interdisciplinary approach. One recommended governance approach is establishing Standard-Setting
Organisations that ensure minimum standards for testing, documentation and public
reporting[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Despite the availability of various standards such as ISO/IEC 42001[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], evaluating and
auditing AI systems remains challenging.
      </p>
      <p>
        Several key surveys, such as those by Liu et al.[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and Chamola et al.[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], compile
summaries of existing technical methods and technology in TAI. However, these surveys do not
focus on methods to score the areas of TAI. Ojewale et al.[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] propose a process for AI auditing,
and although this work highlights the need for metrics and standards, it does not delve into the
methods for calculating such metrics.
      </p>
      <p>In this paper, we summarise and propose a classification and sub-classification for existing
methods and systems to govern, evaluate, and score AI systems for trustworthiness aligned
with the interdisciplinary human-centred approach taken by the EU. We also discuss challenges
and future work in this area.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Methodology</title>
      <sec id="sec-3-1">
        <title>2.1. Review Technique</title>
        <p>Our survey was conducted through a Google Scholar query to identify methods used in the
literature for TAI evaluation. In addition, we added articles, regulatory documentation, and ISO
standards in this area through snowballing.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Research Questions</title>
        <p>The following are the identified research questions for this review:
• Q1: What TAI evaluation methods and systems exist in the literature?
• Q2: What barriers to evaluating TAI are highlighted in the literature?</p>
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Research Search and Data Extraction Strategy</title>
        <p>A search string for Google Scholar was designed to capture papers discussing topics in machine
learning, trust and evaluation areas. Two researchers independently screened titles first and
abstracts second to find papers that included TAI evaluation methods, resulting in 380 papers
from the search string and an additional 12 papers through snowballing being reviewed. These
papers were narrowed further, bringing the number of papers contributing to the core findings
to 34. These papers were then summarised by both researchers and used to create a classification
for the TAI evaluation methods.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Methods for Evaluating Trustworthy AI</title>
      <p>In this section, we propose a classification for evaluating and scoring TAI. Of the papers
reviewed, we found several approaches to AI scoring methods that considered various areas
within TAI. Based on maturity and the type of solution proposed, we classed these papers
into four categories: conceptual evaluation methods, manual evaluation methods, automated
evaluation methods and semi-automated evaluation methods. In addition to this, we proposed
a sub-classification based on the topic being evaluated. These sub-classifications are fairness
&amp; compliance evaluation, transparency evaluation, risk &amp; accountability evaluation and trust
&amp; safety evaluation. As outlined in Figure 1, the most common approaches are conceptual
approaches, indicating the lack of maturity in this field. This figure also shows the breakdown
of evaluation approaches by topic, particularly the number of automated and semi-automated
evaluation methods already developed in fairness and compliance, one of the more researched
areas of trustworthy AI.</p>
      <sec id="sec-4-1">
        <title>3.1. Conceptual Evaluation Methods</title>
        <p>The existing research includes several high-level governance frameworks that consider multiple
dimensions of trustworthy AI throughout the AI lifecycle. Conceptual evaluation methods are
high-level methods that do not provide implementation details or are not tested and validated.
While conceptual frameworks in the literature can be holistic, they can also lack detail.</p>
        <sec id="sec-4-1-1">
          <title>3.1.1. Fairness &amp; Compliance Evaluation</title>
          <p>
            Several conceptual approaches sought to evaluate and improve fairness and compliance in
AI systems, introducing concepts like policy violation detection[
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], using AI to define
ethical behaviour[
            <xref ref-type="bibr" rid="ref15">15</xref>
            ][
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] and automating fairness auditing[
            <xref ref-type="bibr" rid="ref17">17</xref>
            ][
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. Researchers used a variety
of approaches in deciding what was fair, including incorporating existing established
ethical guidelines[
            <xref ref-type="bibr" rid="ref16">16</xref>
            ], extracting ethical guidelines from social media[
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], using a third-party
regulator[
            <xref ref-type="bibr" rid="ref18">18</xref>
            ],[
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] and extracting guidelines from policy documents[
            <xref ref-type="bibr" rid="ref14">14</xref>
            ].
          </p>
        </sec>
        <sec id="sec-4-1-2">
          <title>3.1.2. Transparency Evaluation</title>
          <p>
            Researchers proposed approaches that included evaluating transparency in areas such as
healthcare[
            <xref ref-type="bibr" rid="ref19">19</xref>
            ] and finance[
            <xref ref-type="bibr" rid="ref20">20</xref>
            ]. The proposed framework by Lee[
            <xref ref-type="bibr" rid="ref20">20</xref>
            ] involved scoring fairness and
interpretability, allowing humans to oversee and make conscious choices affecting both. The
approach is context-conscious fairness and considers the trade-off between accuracy and
interpretability and the trade-off between aggregate benefit and inequity. Trade-offs are benchmarked
to make transparent, context-based, informed choices when using Machine Learning (ML) for
decision-making. Jia et al.[
            <xref ref-type="bibr" rid="ref19">19</xref>
            ] proposed a framework to measure and improve technical
robustness, safety, and transparency. It involved quantifying performance and explainability (XAI) and establishing a
trade-off between these trust properties to guide ML algorithm selection for their healthcare use
case.
          </p>
        </sec>
        <sec id="sec-4-1-3">
          <title>3.1.3. Risk &amp; Accountability Evaluation</title>
          <p>
            Researchers also proposed conceptual governance frameworks that focused on risk management
and accountability. These included ethical AI risk evaluation frameworks that built on the
existing concepts such as operational design domain (ODD)[
            <xref ref-type="bibr" rid="ref21">21</xref>
            ][
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]. The importance of defined
safety boundaries was also highlighted[
            <xref ref-type="bibr" rid="ref23">23</xref>
            ][
            <xref ref-type="bibr" rid="ref22">22</xref>
            ].
          </p>
          <p>
            Lu et al.[
            <xref ref-type="bibr" rid="ref24">24</xref>
            ] published a Responsible Artificial Intelligence (RAI) Pattern Catalogue, which
was divided into multi-level governance patterns, trustworthy process patterns, and
RAI-by-design product patterns, considering stakeholders at the industry, organisation, and team levels.
This is important as researchers have shown engineers, legal experts, and users all require
different levels of transparency from AI systems[
            <xref ref-type="bibr" rid="ref25">25</xref>
            ].
          </p>
        </sec>
        <sec id="sec-4-1-4">
          <title>3.1.4. Trust &amp; Safety Evaluation</title>
          <p>
            Conceptual evaluation frameworks also addressed trust[
            <xref ref-type="bibr" rid="ref26">26</xref>
            ][
            <xref ref-type="bibr" rid="ref27">27</xref>
            ] and safety[
            <xref ref-type="bibr" rid="ref28">28</xref>
            ]. These
frameworks first focused on identifying evaluation criteria or trust risk areas, and then on methods to
address these risk areas to improve trust[
            <xref ref-type="bibr" rid="ref26">26</xref>
            ][
            <xref ref-type="bibr" rid="ref27">27</xref>
            ].
          </p>
          <p>
            Fisher et al.[
            <xref ref-type="bibr" rid="ref28">28</xref>
            ] discuss several use cases, focusing on safety-critical domains that require new
standards and verification, validation, and certification methods. They include a classification
of verification methods, including formal exhaustive static methods like model checking and
theorem proving, non-exhaustive dynamic semi-formal methods like runtime verification and
software testing, and non-exhaustive static methods like static analysis. The paper highlights
the difficulty in certifying autonomous systems due to their complexity and evolving nature.
Multiple stakeholder involvement creates complexity in establishing a consensus on acceptable
ethical standards or evaluation criteria that do not disclose sensitive information.
          </p>
          <p>
            Um et al.’s[
            <xref ref-type="bibr" rid="ref26">26</xref>
            ] layered trust framework includes a Trust Agent for data extraction, a Trust
Analysis layer for computing trust metrics, and a Trust Management layer, addressing risk,
fairness, security, design, traceability, data security, data privacy, and data pre-processing.
Broderick et al.[
            <xref ref-type="bibr" rid="ref27">27</xref>
            ] created a taxonomy of trust in AI, which includes a process diagram for
assessing the areas in which trust in ML can fail. They considered real-world use cases for
finance, healthcare, and politics, and subsequently provided ways to mitigate the risk and
increase trust at each stage. Their conceptual process seeks to assess and mitigate the level of
user trust, specifically the trust of an expert in their field at each stage.
          </p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Manual Evaluation Methods</title>
        <p>
          One method proposed for assessing TAI is a manual questionnaire. Beyond the questions from
the EU ALTAI[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and ISO/IEC standards[
          <xref ref-type="bibr" rid="ref29">29</xref>
          ][
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], six additional questionnaires were identified
to score AI systems for trustworthiness. Manual questionnaires align with regulation in this area,
considering multiple EU TAI principles. The disadvantage of the manual approach is that these
questionnaires are typically time-consuming. Businesses may also struggle to complete
the questions due to limited information about the external data their systems use[
          <xref ref-type="bibr" rid="ref30">30</xref>
          ].
        </p>
        <sec id="sec-4-2-1">
          <title>3.2.1. Fairness &amp; Compliance Evaluation</title>
          <p>
            Approaches to improve fairness and achieve compliance in machine learning were proposed
by researchers[
            <xref ref-type="bibr" rid="ref31">31</xref>
            ][
            <xref ref-type="bibr" rid="ref32">32</xref>
            ]. One approach was a practical questionnaire to help improve fairness
by detecting bias[
            <xref ref-type="bibr" rid="ref31">31</xref>
            ]. A second approach to audit and score fairness in ML considered twelve
metrics in this area[
            <xref ref-type="bibr" rid="ref32">32</xref>
            ]. The first six metrics focus on the stages of data collection, model
development, feature selection and model performance; the next three metrics relate to the human
relationship with the model's decisions or predictions. The final three metrics focus on assessing
fairness from a broader social impact perspective and include three meta-components: cultural context,
respect, and the research design process.
          </p>
        </sec>
        <sec id="sec-4-2-2">
          <title>3.2.2. Transparency Evaluation</title>
          <p>
            Questionnaires focused on assessing the transparency of several
TAI principles were also proposed by some researchers[
            <xref ref-type="bibr" rid="ref33">33</xref>
            ][
            <xref ref-type="bibr" rid="ref30">30</xref>
            ][
            <xref ref-type="bibr" rid="ref34">34</xref>
            ]. A notable questionnaire
in the area of transparency is that of Bommasani et al.[
            <xref ref-type="bibr" rid="ref34">34</xref>
            ], who proposed the Foundation Model
Transparency Index (FMTI), which included 100 indicators for transparency to be self-scored
using a three-tier questionnaire and included benchmarks for leading organisations such as
OpenAI, AWS and Meta. Other researchers created separate transparency criteria for different
tiers of stakeholders[
            <xref ref-type="bibr" rid="ref33">33</xref>
            ] and proposed using weighted questions using a 3-point scale for each
question[
            <xref ref-type="bibr" rid="ref30">30</xref>
            ]. Transparency was also a consideration by researchers who looked at other areas
such as user trust[
            <xref ref-type="bibr" rid="ref35">35</xref>
            ].
          </p>
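          <p>
            To make the weighted-questionnaire idea concrete, the sketch below computes a transparency score from weighted 3-point answers. The questions and weights are hypothetical placeholders for illustration, not the instruments from the cited works.
          </p>
          <preformat>
# Hypothetical sketch of a weighted questionnaire score; the questions and
# weights are illustrative placeholders, not those of the cited instruments.
answers = {"data_provenance": 2, "model_documentation": 1, "decision_logging": 0}
weights = {"data_provenance": 3.0, "model_documentation": 2.0, "decision_logging": 1.0}

# Each question is answered on a 3-point scale (0, 1, 2) and contributes
# its weight times the answer to the overall transparency score.
max_points = 2 * sum(weights.values())
score = sum(weights[q] * a for q, a in answers.items())
print(f"Transparency score: {score:.0f}/{max_points:.0f} ({100 * score / max_points:.0f}%)")
          </preformat>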
        </sec>
        <sec id="sec-4-2-3">
          <title>3.2.3. Risk &amp; Accountability Evaluation</title>
          <p>
            For security evaluation, researchers[
            <xref ref-type="bibr" rid="ref36">36</xref>
            ] scored existing questionnaire-based frameworks used
in industry, namely NIST[
            <xref ref-type="bibr" rid="ref37">37</xref>
            ], COBIT[
            <xref ref-type="bibr" rid="ref38">38</xref>
            ], ISO27001[
            <xref ref-type="bibr" rid="ref29">29</xref>
            ], and ISO42001[
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] for their potential usage
for AI’s that incorporate Large Language Models (LLMs). Additionally, researchers developed
a framework to evaluate the MITRE ATLAS[
            <xref ref-type="bibr" rid="ref39">39</xref>
            ] framework’s efectiveness in protecting ML
systems from poisoning attacks, scoring multiple TAI principles using a qualitative severity
rating scale[
            <xref ref-type="bibr" rid="ref40">40</xref>
            ].
          </p>
        </sec>
        <sec id="sec-4-2-4">
          <title>3.2.4. Trust &amp; Safety Evaluation</title>
          <p>
            Several questionnaire-based papers focused on trust and safety evaluation, typically asking
users about their trust in various AI systems[
            <xref ref-type="bibr" rid="ref41">41</xref>
            ][
            <xref ref-type="bibr" rid="ref42">42</xref>
            ][
            <xref ref-type="bibr" rid="ref35">35</xref>
            ].
          </p>
          <p>
            One approach was a simple unweighted user survey-based questionnaire, which scored
several aspects of TAI evaluation, including intent and limitations, data, explainability, safety
and robustness, auditability, and accountability[
            <xref ref-type="bibr" rid="ref41">41</xref>
            ]. Researchers also developed frameworks
that used surveys to quantify and improve user trust by improving the transparency of the
system[
            <xref ref-type="bibr" rid="ref42">42</xref>
            ][
            <xref ref-type="bibr" rid="ref35">35</xref>
            ]. Both papers found a correlation between increased transparency
and increased user trust in AI.
          </p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Automatic Evaluation Methods</title>
        <p>This section includes papers investigating automated scoring methods for TAI principles.
Automatic methods ensure consistency in evaluation; however, they rely on predefined metrics,
which do not exist for many aspects of trustworthy AI. The automated methods published to
date evaluate and score the technical aspects of trustworthy AI for which established
methods and metrics exist.</p>
        <sec id="sec-4-3-1">
          <title>3.3.1. Fairness &amp; Compliance Evaluation</title>
          <p>
            Several automated technical methods have been published to evaluate and score
fairness[
            <xref ref-type="bibr" rid="ref43">43</xref>
            ][
            <xref ref-type="bibr" rid="ref44">44</xref>
            ][
            <xref ref-type="bibr" rid="ref45">45</xref>
            ]. Notable methods include using data sampling techniques to measure
and understand root causes of bias[
            <xref ref-type="bibr" rid="ref44">44</xref>
            ] and a sentence-based evaluation that used sentence
likelihood diference (SLD) to calculate gender bias in LLMs[
            <xref ref-type="bibr" rid="ref45">45</xref>
            ]. Certification of fairness in
AI systems was also considered by researchers who proposed a standard operating procedure
(SOP) for fairness certification, with a Fairness Score and Bias Index, noting that different metrics
would be needed to score the pre-processing and in-processing stages and that the approach would
need to vary by use case[
            <xref ref-type="bibr" rid="ref46">46</xref>
            ]. Researchers found that specific algorithms scored better for
one set of individual features than others, indicating a link between fairness evaluation and
algorithm selection[
            <xref ref-type="bibr" rid="ref47">47</xref>
            ].
          </p>
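          <p>
            As an illustration of the sentence-likelihood idea, the sketch below scores a pair of sentences that differ only in a gendered pronoun using an off-the-shelf causal language model. It is a minimal approximation of the SLD approach under our own simplifying assumptions, not the cited authors' implementation.
          </p>
          <preformat>
# Minimal sketch of a sentence-likelihood-difference style gender-bias probe.
# Assumes a Hugging Face causal LM; the sentence pair below is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Approximate total log-likelihood the model assigns to a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # .loss is the mean negative log-likelihood per predicted token
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return -loss.item() * inputs["input_ids"].shape[1]

pair = ("The doctor said he would review the results.",
        "The doctor said she would review the results.")
sld = sentence_log_likelihood(pair[0]) - sentence_log_likelihood(pair[1])
print(f"Sentence likelihood difference: {sld:+.3f}")  # sign indicates bias direction
          </preformat>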
        </sec>
        <sec id="sec-4-3-2">
          <title>3.3.2. Trust &amp; Safety Evaluation</title>
          <p>
            The automated evaluation of trust and safety of AI systems was also considered by
researchers[
            <xref ref-type="bibr" rid="ref48">48</xref>
            ][
            <xref ref-type="bibr" rid="ref43">43</xref>
            ]. Researchers proposed an automated trust scoring process that used machine
learning to develop a trust value for their use case of file sharing in peer-to-peer networks,
automating a process to score the technical safety and likelihood of the file being dangerous[
            <xref ref-type="bibr" rid="ref48">48</xref>
            ].
Additionally, researchers developed a process that combined privacy and fairness evaluation,
scoring both and proposing a trade-off with accuracy for each[
            <xref ref-type="bibr" rid="ref43">43</xref>
            ].
          </p>
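          <p>
            A minimal, hypothetical sketch of this automated trust-scoring idea is shown below: a classifier maps observable file and peer features to a trust value between 0 and 1. The features and training data are invented for illustration and do not reproduce the cited system.
          </p>
          <preformat>
# Hypothetical sketch: learning a trust score for shared files from past outcomes.
# Features and labels are invented placeholders, not the cited system's inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

# features: [peer reputation, file age in days, signature valid (0/1)]
X = np.array([[0.9, 120, 1], [0.2, 1, 0], [0.7, 30, 1], [0.1, 2, 0]])
y = np.array([1, 0, 1, 0])  # 1 = file proved safe, 0 = file proved dangerous

clf = LogisticRegression().fit(X, y)
trust = clf.predict_proba([[0.5, 10, 1]])[0, 1]  # probability the file is safe
print(f"Trust score for new file: {trust:.2f}")
          </preformat>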
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>3.4. Semi-automated Evaluation Methods</title>
        <p>
          This section covers approaches to scoring that involve both automated and manual steps. These
methods are primarily in the area of fairness and compliance. They require a human at some
stage, balancing automation and human efficiency. Researchers have shown the need to tailor
evaluations by use case[
          <xref ref-type="bibr" rid="ref49">49</xref>
          ][
          <xref ref-type="bibr" rid="ref18">18</xref>
          ][
          <xref ref-type="bibr" rid="ref50">50</xref>
          ] and to incorporate considerations such as cultural
differences in fairness evaluation[
          <xref ref-type="bibr" rid="ref51">51</xref>
          ]. In the case of healthcare, researchers reported that context
was important in fairness evaluation for clinicians, noting a preference for a human-in-the-loop
approach rather than a fully automated system[
          <xref ref-type="bibr" rid="ref52">52</xref>
          ].
        </p>
        <sec id="sec-4-4-1">
          <title>3.4.1. Fairness &amp; Compliance Evaluation</title>
          <p>
            Researchers have proposed several semi-automated evaluation methods for fairness and
compliance in AI[
            <xref ref-type="bibr" rid="ref49">49</xref>
            ][
            <xref ref-type="bibr" rid="ref53">53</xref>
            ][
            <xref ref-type="bibr" rid="ref54">54</xref>
            ][
            <xref ref-type="bibr" rid="ref55">55</xref>
            ][
            <xref ref-type="bibr" rid="ref56">56</xref>
            ]. A number of these frameworks were automated methods of
fairness evaluation combined with a human element to set thresholds or decide trade-offs
between metrics. One approach included developing transparent processes that mapped trade-offs
between metrics[
            <xref ref-type="bibr" rid="ref49">49</xref>
            ], while a second involved injecting controls, wrapping existing operations
and extending workflow primitives[
            <xref ref-type="bibr" rid="ref53">53</xref>
            ]. A third method included allowing a human to define
the fairness requirement, specifying assumptions and assertions so that the tester can generate
inputs that satisfy these assumptions and violate assertions[
            <xref ref-type="bibr" rid="ref54">54</xref>
            ]. A semi-automated user-centred
approach to fairness evaluation called FairHIL (Fair Human-in-the-Loop) was developed that
offers a visual user interface providing a combination of visualisations, including outcome
features, feature intersections and causal graphs, to help users identify bias and unfairness[
            <xref ref-type="bibr" rid="ref55">55</xref>
            ].
Users can add labels and adjust the feature weighting to retrain the model until they achieve an
acceptable user fairness outcome. The tool focuses on accessibility and explainability for non-AI
experts. Researchers also evaluated the effects of cultural differences in users interacting with
the FairHIL tool[
            <xref ref-type="bibr" rid="ref56">56</xref>
            ].
          </p>
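          <p>
            The assumption-and-assertion style of fairness testing resembles property-based testing. The sketch below makes that concrete with the Hypothesis library; model_predict is a hypothetical stand-in for the system under test, and the assertion shown is one simple individual-fairness property, not the cited tool itself.
          </p>
          <preformat>
# Sketch of assumption/assertion-driven fairness testing via property-based
# testing; model_predict is a hypothetical stand-in for the model under test.
from hypothesis import given, assume, strategies as st

def model_predict(income: float, group: str) -> bool:
    """Hypothetical model under test; a fair model ignores the group."""
    return income > 50_000

@given(income=st.floats(min_value=10_000, max_value=200_000))
def test_individual_fairness(income):
    assume(income >= 10_000)  # assumption specified by the human tester
    # assertion: changing only the protected attribute must not flip the outcome
    assert model_predict(income, "A") == model_predict(income, "B")

test_individual_fairness()  # Hypothesis generates inputs satisfying the assumption
          </preformat>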
        </sec>
        <sec id="sec-4-4-2">
          <title>3.4.2. Risk &amp; Accountability Evaluation</title>
          <p>
            One paper proposed a semi-automated method for risk evaluation. This structured method
provides an open vocabulary for AI risks (VAIR)[
            <xref ref-type="bibr" rid="ref57">57</xref>
            ], facilitating the automation of AI risk
category identification, a required step for AI assessment in the EU AI Act.
          </p>
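          <p>
            The automation this enables can be pictured as a vocabulary-driven lookup, as in the sketch below. The terms and risk labels are invented stand-ins, not actual VAIR entries.
          </p>
          <preformat>
# Illustrative sketch of vocabulary-driven risk-category identification; the
# vocabulary below uses invented stand-ins, not the actual VAIR terms.
vair_like_vocab = {
    "biometric identification": "high-risk",
    "credit scoring": "high-risk",
    "spam filtering": "minimal risk",
}

def identify_risk_categories(system_description: str) -> list[str]:
    """Return the risk categories whose vocabulary terms match the description."""
    text = system_description.lower()
    return [cat for term, cat in vair_like_vocab.items() if term in text]

print(identify_risk_categories("An AI system for credit scoring of loan applicants"))
          </preformat>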
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Industry Tools for Evaluating TAI</title>
      <p>
        In addition to the aforementioned academic works in evaluating TAI, various industry tools
are in use today that aim to ensure AI systems adhere to ethical, legal, and performance
standards. The most commonly used tools are manual questionnaire-based tools such as
the ALTAI[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and ISO/IEC 42001[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which rely on self-assessments based on established
principles, aligning with the self-assessment requirements of the EU AI Act[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These tools rely
on human judgment and expert evaluations to identify risks and compliance issues, ensuring
a thorough, albeit time-consuming, evaluation process. These manual methods are often
supplemented by frameworks such as the NIST AI Risk Management Framework[
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], which
provides comprehensive guidelines for assessing safety, fairness, and transparency.
      </p>
      <p>
        Automated assessment tools are becoming increasingly prevalent in the industry due to their
efficiency and scalability. Tools like IBM's AI Fairness 360 and Microsoft Fairlearn are used
to evaluate AI models for bias, fairness, and transparency without human intervention[
        <xref ref-type="bibr" rid="ref58">58</xref>
        ].
However, these tools are not accompanied by scientific, peer-reviewed papers evaluating them
against the state-of-the-art works in this area[
        <xref ref-type="bibr" rid="ref59">59</xref>
        ]. Johnson et al.[
        <xref ref-type="bibr" rid="ref59">59</xref>
          ] published an open-source
toolkit called fairkit-learn, designed to support engineers in training fair machine
learning models; students using it found a better trade-off between fairness and accuracy than students
using the state-of-the-art tools scikit-learn and IBM AI Fairness 360[
        <xref ref-type="bibr" rid="ref59">59</xref>
        ].
      </p>
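      <p>
        To illustrate what such toolkits automate, the sketch below uses Fairlearn to disaggregate a model's accuracy by a sensitive attribute and to compute one common group-fairness score. The data are synthetic placeholders.
      </p>
      <preformat>
# Illustrative use of Fairlearn to score group fairness; data are synthetic.
import numpy as np
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)      # ground-truth labels
y_pred = rng.integers(0, 2, 200)      # model predictions
sex = rng.choice(["F", "M"], 200)     # sensitive attribute

# Accuracy disaggregated by the sensitive attribute.
frame = MetricFrame(metrics=accuracy_score, y_true=y_true, y_pred=y_pred,
                    sensitive_features=sex)
print(frame.by_group)

# One common scalar fairness score: difference in selection rates between groups.
dpd = demographic_parity_difference(y_true, y_pred, sensitive_features=sex)
print(f"Demographic parity difference: {dpd:.3f}")
      </preformat>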
      <p>
        These tools use sophisticated algorithms to identify and mitigate potential issues in AI
systems, providing a scalable solution for large-scale AI deployments. Automated and
semi-automated tools are particularly valuable, offering continuous monitoring and evaluation,
enabling companies to maintain high standards of trustworthiness as AI systems evolve.
Semi-automated tools such as Amazon SageMaker[
        <xref ref-type="bibr" rid="ref60">60</xref>
        ] combine automated algorithms with human
oversight, ensuring a balance between efficiency and expert insight. Amazon SageMaker has
features and tools that can be used to continuously monitor models in real time for data drift, concept drift, bias,
and feature attribution drift. These tools require human intervention at critical
stages to set parameters and make interpretive decisions, ensuring that ethical and fairness
considerations are adequately addressed.
      </p>
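      <p>
        The kind of drift check such monitoring automates can be illustrated with a population stability index (PSI). This is a generic sketch, not the SageMaker API; the data and threshold are illustrative.
      </p>
      <preformat>
# Generic sketch of a drift check that continuous-monitoring tools automate;
# this is not the SageMaker API, just a population stability index example.
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline feature distribution and live traffic."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log of zero
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

baseline = np.random.default_rng(1).normal(0.0, 1.0, 5_000)  # training-time feature
live = np.random.default_rng(2).normal(0.3, 1.0, 5_000)      # shifted live feature
psi = population_stability_index(baseline, live)
print(f"PSI = {psi:.3f}  (values above 0.2 commonly flag drift)")
      </preformat>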
      <p>
        Despite these advantages, recent research has highlighted several challenges practitioners face
when using these tools. Practitioners find it difficult to translate real-world fairness concerns
into quantifiable metrics that these toolkits can assess[
        <xref ref-type="bibr" rid="ref61">61</xref>
        ]. There is also a need for toolkits to
be able to integrate more seamlessly into existing ML pipelines and to provide more guidance
and resources for responsible usage[
        <xref ref-type="bibr" rid="ref61">61</xref>
        ]. Referring specifically to mitigating age bias in job
selection using Microsoft Fairlearn and AI Fairness 360, researchers also found that significant
human effort was required to make these toolkits work effectively to mitigate bias, making
them impractical for use in real-world applications[
        <xref ref-type="bibr" rid="ref58">58</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>5. Barriers to Trustworthy AI Evaluation</title>
      <p>The complexity involved in a complete evaluation of TAI presents several challenges. The
barriers to evaluating TAI found in the literature include the following:</p>
      <sec id="sec-6-1">
        <title>Diversity in Trustworthy AI Evaluation Methods</title>
        <p>Evaluation methods exist for all aspects of TAI. However, the more mature areas of TAI have
more advanced evaluation methods. For example, fairness evaluation, which has several established
methods, already has fully automated evaluation options. Areas like risk and safety have some
automatic and semi-automatic methods, showing potential for more automation of the technical
aspects of AI where metrics are available. Evaluation approaches that considered less researched
areas of TAI or holistic methods that considered multiple areas of TAI were primarily conceptual
or manual methods.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Lack of Standardisation or Metrics for Evaluation</title>
        <p>Within the various TAI principles, there is a lack of consistency across all evaluation methods regarding what was being assessed.
Even in similar industries using similar methods, the evaluation criteria or metrics used for
evaluation were inconsistent. Regardless of the method used, this lack of consistency around
evaluation criteria and metrics is a barrier to TAI evaluation and highlights a need to establish
use case-specific benchmarks and acceptable thresholds for TAI evaluation.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Use Case-Specific Evaluation Methods Required</title>
        <p>Clinicians found that context was essential when deciding acceptable evaluations for AI
fairness. AI systems are complex, and their design varies by use case. Due to this complexity,
the evaluation method will vary by use case. For example, evaluating a decision-making AI
system requires a different approach than other AI use cases, such as LLMs.</p>
      </sec>
      <sec id="sec-6-3b">
        <title>Human-in-the-Loop is Essential</title>
        <p>Although some automated methods exist to evaluate aspects of TAI, a semi-automated
evaluation method is preferable if it integrates a human-in-the-loop. Additionally, due to a lack
of maturity in many TAI principles, which have no metrics or automated methods for evaluation,
a manual questionnaire-based stage is required for a comprehensive TAI evaluation. Even with
more developed TAI principles such as fairness, a decision must be made manually about what
is fair for the given use case.</p>
      </sec>
      <sec id="sec-6-3c">
        <title>Discrepancies Between Stakeholders</title>
        <p>
          Researchers found that different stakeholders require different levels of transparency,
meaning different methods and criteria for evaluation may be required for various groups of
stakeholders. There are additional discrepancies between what stakeholders, such as AI and law
experts, consider fair and what a layperson considers fair. There have been some semi-automated
approaches to establishing ethical norms that can include multiple perspectives to combat this.
One proposed conceptual method[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] involved extracting ethics from social media, which humans would then review for
evaluation. Another approach was a semi-automated method[
          <xref ref-type="bibr" rid="ref55">55</xref>
          ] involving the development of a user interface with TAI metrics agreed upon by the AI
developer that enabled human stakeholders to evaluate, make adjustments and decide trade-offs
between TAI metrics.
        </p>
      </sec>
      <sec id="sec-6-4">
        <title>Auditing and Third-Party Accreditation is Required</title>
        <p>The research showed a need for governance in TAI evaluations involving some form of access
to the AI system. Several researchers who published conceptual governance frameworks proposed
the inclusion of a third-party accreditation body to do this. These bodies would aim to provide the needed
audits and governance for TAI evaluation. The research showed the potential to automate the
audit and certification process for some TAI principles based on agreed metrics and benchmarks.</p>
      </sec>
      <sec id="sec-6-5">
        <title>Fragmented Development and Accountability</title>
        <p>AI systems built using multiple organisations, including third-party data providers, face
significant evaluation barriers. AI producers may lack access to necessary information from
contributing organisations, which they require for comprehensive TAI evaluations. For example,
a producer whose AI is trained on data purchased from a third party might lack insight into
data consent and acquisition processes, hindering thorough evaluation.
In such instances, the AI producer struggles to assume accountability for development steps
outsourced to other entities, making it challenging to perform a complete TAI assessment.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Future Directions for Trustworthy AI Evaluation</title>
      <p>To successfully evaluate TAI, the literature calls for future AI systems to have ongoing
semi-automated evaluation capabilities. Successful prototypes include using transparent or
explainable models, with an interface allowing human decisions on thresholds, trade-offs and/or
definitions to be input into the model. This can be done by an expert in the field or a third-party
accreditation body. Universal evaluation criteria and thresholds do not apply from one use case
to the next, meaning that each TAI principle would need specific evaluation criteria for each use
case.</p>
      <p>There is a disconnect between the tools and research in this area. Tools used at the industry
level have typically not been peer-reviewed and, when evaluated by researchers, are insufficient
for comprehensive TAI evaluation versus the state of the art in the literature.</p>
      <p>The findings of this paper have significant implications for AI policy. The research underscores
the necessity for standardised evaluation frameworks to assess the trustworthiness of AI systems.
The current EU approach relies primarily on self-assessment and does not include methods
or evaluation criteria for TAI evaluation, which the literature shows a clear need for. TAI
standards developed by policymakers must be applied across use-case-specific AI applications
to ensure ethical and fair practices. To facilitate comprehensive TAI evaluations for AI systems,
governance frameworks in the literature propose third-party certification and standard methods
and evaluation criteria, including metrics agreed upon by regulatory bodies based on their
industry-specific needs and use cases. There is a disconnect between what policymakers, AI
experts, and a standard non-expert user consider fair, along with differences based on culture,
showing a need for more input from various laypeople to decide acceptable TAI evaluation
approaches for individual use cases.</p>
      <p>Authors and Affiliations All authors have reviewed and consented to the publication of the
manuscript as presented. This research received partial support from Science Foundation
Ireland under grant 13/RC/2106P2 (ADAPT) and is co-funded by the European Regional
Development Fund (ERDF). The data employed in this review are sourced from publicly available
materials, including published research articles, ISO standards, books, and openly accessible
databases and industry publications. All sources are duly cited and listed in the reference section
of this paper. No new data was generated or collected specifically for this review.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hallsworth</surname>
          </string-name>
          ,
          <article-title>A manifesto for applying behavioural science</article-title>
          ,
          <source>Nature Human Behaviour</source>
          <volume>7</volume>
          (
          <year>2023</year>
          )
          <fpage>310</fpage>
          -
          <lpage>322</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Uslu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. J.</given-names>
            <surname>Rittichier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Durresi</surname>
          </string-name>
          ,
          <article-title>Trustworthy artificial intelligence: a review</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>55</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Floridi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cowls</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beltrametti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chatila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chazerand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dignum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Luetge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Madelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Pagallo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rossi</surname>
          </string-name>
          , et al.,
          <article-title>Ai4people-an ethical framework for a good ai society: opportunities, risks, principles, and recommendations</article-title>
          ,
          <source>Minds and machines 28</source>
          (
          <year>2018</year>
          )
          <fpage>689</fpage>
          -
          <lpage>707</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiebes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sunyaev</surname>
          </string-name>
          , Trustworthy artificial intelligence,
          <source>Electronic Markets</source>
          <volume>31</volume>
          (
          <year>2021</year>
          )
          <fpage>447</fpage>
          -
          <lpage>464</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Uslu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Durresi</surname>
          </string-name>
          ,
          <article-title>Requirements for trustworthy artificial intelligence-a review</article-title>
          ,
          <source>in: Advances in Networked-Based Information Systems: The 23rd International Conference on Network-Based Information Systems (NBiS-2020) 23</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Knearem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Alkiek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Petridis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qiwei</surname>
          </string-name>
          , et al.,
          <article-title>Towards bidirectional human-ai alignment: A systematic review for clarifications, framework, and future directions</article-title>
          ,
          <source>arXiv preprint arXiv:2406.09264</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>AI HLEG</surname>
          </string-name>
          ,
          <article-title>Assessment list for trustworthy artificial intelligence (altai) for self-assessment</article-title>
          , https://digital-strategy.ec.europa.eu/en/library/ethics-guidelinestrustworthy-ai,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>European Union</surname>
          </string-name>
          ,
          <article-title>Final draft of the artificial intelligence act as of 2nd february 2024</article-title>
          , https://artificialintelligenceact.eu/ai-act-explorer/,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Laux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wachter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mittelstadt</surname>
          </string-name>
          ,
          <article-title>Three pathways for standardisation and ethical disclosure by default under the european union artificial intelligence act</article-title>
          .
          <source>ssrn electron j</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>International Organization for Standardization</surname>
          </string-name>
          , International Electrotechnical Commission,
          <source>ISO/IEC 42001:2023, Information technology - Artificial intelligence - Management system</source>
          , Standard,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Trustworthy ai: A computational perspective</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Chamola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hassija</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Sulthana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dhingra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sikdar</surname>
          </string-name>
          ,
          <article-title>A review of trustworthy and explainable artificial intelligence (xai)</article-title>
          ,
          <source>IEEE Access</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ojewale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Steed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vecchione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Birhane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. D.</given-names>
            <surname>Raji</surname>
          </string-name>
          ,
          <article-title>Towards ai accountability infrastructure: Gaps and opportunities in ai audit tooling</article-title>
          ,
          <source>arXiv preprint arXiv:2402.17861</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Vishwakarma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Varshney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Ramamurthy</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Wei</surname>
          </string-name>
          ,
          <article-title>An end-to-end machine learning pipeline that ensures fairness policies</article-title>
          ,
          <source>arXiv preprint arXiv:1710.06876</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Buenfil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Arnold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Abruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Korpela</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence ethics: Governance through social media</article-title>
          , in:
          <source>2019 IEEE International Symposium on Technologies for Homeland Security (HST)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Umbrello</surname>
          </string-name>
          , I. Van de Poel,
          <article-title>Mapping value sensitive design onto ai for social good principles</article-title>
          ,
          <source>AI and Ethics</source>
          <volume>1</volume>
          (
          <year>2021</year>
          )
          <fpage>283</fpage>
          -
          <lpage>296</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          , S. Kim, Y.-s. Lim,
          <article-title>Fairness audit of machine learning models with confidential computing</article-title>
          ,
          <source>in: Proceedings of the ACM Web Conference</source>
          <year>2022</year>
          ,
          <year>2022</year>
          , pp.
          <fpage>3488</fpage>
          -
          <lpage>3499</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>van de Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>van Bommel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fung Fen Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gommers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>van Genderen</surname>
          </string-name>
          ,
          <article-title>Algorithmic fairness audits in intensive care medicine: artificial intelligence for all?</article-title>
          ,
          <source>Critical Care</source>
          <volume>26</volume>
          (
          <year>2022</year>
          )
          <fpage>315</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McDermid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lawton</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Habli</surname>
          </string-name>
          ,
          <article-title>The role of explainability in assuring safety of machine learning in healthcare</article-title>
          ,
          <source>IEEE Transactions on Emerging Topics in Computing</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>1746</fpage>
          -
          <lpage>1760</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M. S. A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Context-conscious fairness in using machine learning to make decisions</article-title>
          ,
          <source>AI Matters</source>
          <volume>5</volume>
          (
          <year>2019</year>
          )
          <fpage>23</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Roski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Maier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Vigilante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Kane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Matheny</surname>
          </string-name>
          ,
          <article-title>Enhancing trust in ai through industry self-governance</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>28</volume>
          (
          <year>2021</year>
          )
          <fpage>1582</fpage>
          -
          <lpage>1590</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mattioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sohier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Delaborde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Amokrane-Ferka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Awadid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chihani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khalfaoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pedroza</surname>
          </string-name>
          ,
          <article-title>An overview of key trustworthiness attributes and kpis for trusted ml-based systems engineering</article-title>
          ,
          <source>AI and Ethics</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>G.</given-names>
            <surname>Stettinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Weissensteiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khastgir</surname>
          </string-name>
          ,
          <article-title>Trustworthiness assurance assessment for high-risk ai-based systems</article-title>
          ,
          <source>IEEE Access</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Whittle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zowghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jacquet</surname>
          </string-name>
          ,
          <article-title>Responsible ai pattern catalogue: A collection of best practices for ai governance and engineering</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Van Nuenen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ferrer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Such</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Coté</surname>
          </string-name>
          ,
          <article-title>Transparency for whom? assessing discriminatory artificial intelligence</article-title>
          ,
          <source>Computer</source>
          <volume>53</volume>
          (
          <year>2020</year>
          )
          <fpage>36</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.-W.</given-names>
            <surname>Um</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Trust management for artificial intelligence: A standardization perspective</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <fpage>6022</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>T.</given-names>
            <surname>Broderick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Meager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>Toward a taxonomy of trust for probabilistic machine learning</article-title>
          ,
          <source>Science Advances</source>
          <volume>9</volume>
          (
          <year>2023</year>
          )
          eabn3999
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fisher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mascardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Y.</given-names>
            <surname>Rozier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-H.</given-names>
            <surname>Schlingloff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Winikoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yorke-Smith</surname>
          </string-name>
          ,
          <article-title>Towards a framework for certification of reliable autonomous systems</article-title>
          ,
          <source>Autonomous Agents and Multi-Agent Systems</source>
          <volume>35</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>International Organization for Standardization</surname>
          </string-name>
          ,
          <article-title>ISO/IEC 27001: Information technology - Security techniques - Information security management systems - Requirements</article-title>
          , Standard,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jaramillo-Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Oala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Gröschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bierwirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Balachandran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Werneck-Leite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lippert</surname>
          </string-name>
          ,
          <article-title>Piloting a survey-based assessment of transparency and trustworthiness with three medical ai tools</article-title>
          ,
          <source>in: Healthcare</source>
          , volume
          <volume>10</volume>
          ,
          MDPI,
          <year>2022</year>
          , p.
          <fpage>1923</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>J.-G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Roh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Whang</surname>
          </string-name>
          ,
          <article-title>Machine learning robustness, fairness, and their convergence</article-title>
          ,
          <source>in: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery &amp; data mining</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>4046</fpage>
          -
          <lpage>4047</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>R. N.</given-names>
            <surname>Landers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Behrend</surname>
          </string-name>
          ,
          <article-title>Auditing the ai auditors: A framework for evaluating fairness and bias in high stakes ai predictive models</article-title>
          ,
          <source>American Psychologist</source>
          <volume>78</volume>
          (
          <year>2023</year>
          )
          <fpage>36</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Chaudhry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cukurova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Luckin</surname>
          </string-name>
          ,
          <article-title>A transparency index framework for ai in education</article-title>
          ,
          <source>in: International Conference on Artificial Intelligence in Education</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>195</fpage>
          -
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bommasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Klyman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Longpre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kapoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Maslej</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>The foundation model transparency index</article-title>
          ,
          <source>arXiv preprint arXiv:2310.12941</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Daly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Alkan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mattetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cornec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Knijnenburg</surname>
          </string-name>
          ,
          <article-title>Building trust in interactive machine learning via user contributed interpretable rules</article-title>
          ,
          <source>in: 27th International Conference on Intelligent User Interfaces</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>537</fpage>
          -
          <lpage>548</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>T. R.</given-names>
            <surname>McIntosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Susnjak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Watters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nowrozy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Halgamuge</surname>
          </string-name>
          ,
          <article-title>From COBIT to ISO 42001: Evaluating cybersecurity frameworks for opportunities, risks, and regulatory compliance in commercializing large language models</article-title>
          ,
          <source>Computers &amp; Security</source>
          (
          <year>2024</year>
          )
          <fpage>103964</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <surname>National Institute of Standards and Technology (NIST)</surname>
          </string-name>
          ,
          <article-title>Framework for Improving Critical Infrastructure Cybersecurity</article-title>
          ,
          <source>Tech Report, National Institute of Standards and Technology (NIST)</source>
          ,
          <year>2014</year>
          . URL: https://nvlpubs.nist.gov/nistpubs/CSWP/NIST.CSWP.04162018.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <surname>Information Systems Audit and Control Association (ISACA)</surname>
          </string-name>
          ,
          <article-title>COBIT 2019 Framework: Governance and Management Objectives</article-title>
          , ISACA,
          <year>2018</year>
          . URL: https://www.isaca.org/bookstore/cobit/.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <surname>MITRE Corporation</surname>
          </string-name>
          ,
          <article-title>MITRE ATLAS</article-title>
          , https://atlas.mitre.org/,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wymberry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jahankhani</surname>
          </string-name>
          ,
          <article-title>An approach to measure the effectiveness of the mitre atlas framework in safeguarding machine learning systems against data poisoning attack</article-title>
          ,
          <source>in: Cybersecurity and Artificial Intelligence: Transformational Strategies and Disruptive Innovation</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>R.</given-names>
            <surname>Dvorak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schibel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tribelhorn</surname>
          </string-name>
          ,
          <article-title>Towards evaluating ethical accountability and trustworthiness in ai systems</article-title>
          ,
          <source>Journal of Computing Sciences in Colleges</source>
          <volume>37</volume>
          (
          <year>2021</year>
          )
          <fpage>11</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>J.</given-names>
            <surname>Druce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Harradon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tittle</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence (xai) for increasing user trust in deep reinforcement learning driven autonomous systems</article-title>
          ,
          <source>arXiv preprint arXiv:2106.03775</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Khalili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abroshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sojoudi</surname>
          </string-name>
          ,
          <article-title>Improving fairness and privacy in selection problems</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>35</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>8092</fpage>
          -
          <lpage>8100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>J.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Developing a novel fair-loan-predictor through a multi-sensitive debiasing pipeline: Dualfair</article-title>
          ,
          <source>arXiv preprint arXiv:2110.08944</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Barza</surname>
          </string-name>
          ,
          <article-title>Towards a robust gender bias evaluation in NLP</article-title>
          ,
          <source>Ph.D. thesis</source>
          , American University of Beirut,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <article-title>Fairness score and process standardization: framework for fairness certification in artificial intelligence systems</article-title>
          ,
          <source>AI and Ethics</source>
          <volume>3</volume>
          (
          <year>2023</year>
          )
          <fpage>267</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <surname>M. Z. Alam</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          <string-name>
            <surname>Rahman</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          <string-name>
            <surname>Rahman</surname>
          </string-name>
          ,
          <article-title>A random forest based predictor for medical data classification using feature ranking</article-title>
          ,
          <source>Informatics in Medicine Unlocked</source>
          <volume>15</volume>
          (
          <year>2019</year>
          )
          <fpage>100180</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alhussain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kurdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Altoaimy</surname>
          </string-name>
          ,
          <article-title>A neural network-based trust management system for edge devices in peer-to-peer networks</article-title>
          ,
          <source>Computers, Materials &amp; Continua</source>
          <volume>59</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>M. S. A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Floridi</surname>
          </string-name>
          ,
          <article-title>Algorithmic fairness in mortgage lending: from absolute conditions to relational trade-offs</article-title>
          ,
          <source>Minds and Machines</source>
          <volume>31</volume>
          (
          <year>2021</year>
          )
          <fpage>165</fpage>
          -
          <lpage>191</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <article-title>Trade-offs between fairness, interpretability, and privacy in machine learning</article-title>
          ,
          <source>Master's thesis</source>
          , University of Waterloo,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>T. P.</given-names>
            <surname>Pagano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Loureiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. V.</given-names>
            <surname>Lisboa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Peixoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Guimarães</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. O.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          , et al.,
          <article-title>Bias and unfairness in machine learning models: a systematic review on datasets, tools, fairness metrics, and identification and mitigation methods</article-title>
          ,
          <source>Big Data and Cognitive Computing</source>
          <volume>7</volume>
          (
          <year>2023</year>
          )
          <fpage>15</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ryan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nadal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Doherty</surname>
          </string-name>
          ,
          <article-title>Integrating fairness in the software design process: An interview study with hci and ml experts</article-title>
          ,
          <source>IEEE Access</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>29296</fpage>
          -
          <lpage>29313</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>N.</given-names>
            <surname>Antunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Balby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Figueiredo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lourenco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Meira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <article-title>Fairness and transparency of machine learning for trustworthy cloud services</article-title>
          ,
          <source>in: 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>188</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wehrheim</surname>
          </string-name>
          ,
          <article-title>Automatic fairness testing of machine learning models</article-title>
          ,
          <source>in: Testing Software and Systems: 32nd IFIP WG 6.1 International Conference, ICTSS 2020, Naples, Italy, December 9-11, 2020, Proceedings 32</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>255</fpage>
          -
          <lpage>271</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nakao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Strappelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stumpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Naseer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Regoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Gamba</surname>
          </string-name>
          ,
          <article-title>Towards responsible ai: A design space exploration of human-centered artificial intelligence user interfaces to investigate fairness</article-title>
          ,
          <source>International Journal of Human-Computer Interaction</source>
          <volume>39</volume>
          (
          <year>2023</year>
          )
          <fpage>1762</fpage>
          -
          <lpage>1788</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>S.</given-names>
            <surname>Stumpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Taka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nakao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sonoda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yokota</surname>
          </string-name>
          ,
          <article-title>The need for user-centred assessment of ai fairness and correctness</article-title>
          ,
          <source>in: Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>523</fpage>
          -
          <lpage>527</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>D.</given-names>
            <surname>Golpayegani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Pandit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <article-title>To be high-risk, or not to be - semantic specifications and implications of the ai act's high-risk ai applications and harmonised standards</article-title>
          ,
          <source>in: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>905</fpage>
          -
          <lpage>915</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>C.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <article-title>Mitigating age biases in resume screening ai models</article-title>
          ,
          <source>in: The International FLAIRS Conference Proceedings</source>
          , volume
          <volume>36</volume>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          [59]
          <string-name>
            <given-names>B.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bartola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Angell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Witty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Giguere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Brun</surname>
          </string-name>
          ,
          <article-title>Fairkit, fairkit, on the wall, who's the fairest of them all? Supporting fairness-related decision-making</article-title>
          ,
          <source>EURO Journal on Decision Processes</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>100031</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          [60]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nigenda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Karnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Zafar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramesha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Donini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kenthapadi</surname>
          </string-name>
          ,
          <article-title>Amazon sagemaker model monitor: A system for real-time insights into deployed machine learning models</article-title>
          ,
          <source>in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3671</fpage>
          -
          <lpage>3681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          [61]
          <string-name>
            <given-names>W. H.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yildirim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eslami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Holstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Madaio</surname>
          </string-name>
          ,
          <article-title>Investigating practices and opportunities for cross-functional collaboration around ai fairness in industry practice</article-title>
          ,
          <source>in: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>705</fpage>
          -
          <lpage>716</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>