<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Auditing of AI systems through explainability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jorge Vindel-Alfageme</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Antonio Recio-Garcia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>María Belén Díaz-Agudo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Complutense de Madrid (UCM), C/ del Profesor José García Santesmases</institution>
          ,
          <addr-line>9, 28040, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The application of artificial intelligence (AI) has expanded widely, and healthcare is one of the fields that has benefited most from it. The complexity of each biological system must be taken into account when implementing AI, and AI models are susceptible to bias; in healthcare, such errors directly affect people’s lives. Explainable AI (XAI) methods allow us to delve deeper into AI models to understand them, making them a tool for bias analysis. Case-Based Reasoning (CBR), in turn, is a framework based on empirical evidence that derives the optimal solution from experience gained with real cases. Such systems can be especially valuable in healthcare, where each patient represents a case and the analysis of new patients is facilitated by fitting them within this framework. This PhD project focuses on the development of methodologies for auditing healthcare AI systems, with an emphasis on identifying and mitigating biases. Using XAI combined with CBR, the aim is to create a reusable framework that assesses the fairness of predictive models in medical diagnosis. The work includes the formalization of an ontology to structure risks and solutions, as well as the implementation of a technological platform that integrates validated use cases in healthcare.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial intelligence bias</kwd>
        <kwd>artificial intelligence in healthcare</kwd>
        <kwd>case-based reasoning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        It is often said that biases associated with artificial intelligence (AI) models are the result of the negative
legacy left by the data used to train the model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, there can be algorithmic biases linked to
the model’s training and the algorithm itself, and not just those due to the input data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These are the two
main sources of bias found in AI models, and they are present across the domains in which AI is
applied, including healthcare.
      </p>
      <p>To fully understand the specific biases that can cause AI models to suffer from unexpected error rates,
we can associate each type of bias with a stage of data processing. A data analysis workflow comprises
multiple processing steps. In the same general terms as above, negative legacy would result from altered
data recording or error-inducing preprocessing, while algorithmic bias would appear during model
training.</p>
      <p>
        Analysis can be considered to begin with the very recording of data using specialized tools. Data
recording gives the data a characteristic structure that determines their subsequent processing. During
recording, some sample classes may be undersampled, subgroups of samples may be underestimated
depending on the values of certain variables (for example, the variable "sex" may leave one group
underrepresented), or there may be an imbalance between the number of samples from different classes
and subgroups. These problems can induce a negative legacy in the model, since it is the data in their
raw form that determine biased results. To overcome these drawbacks, tools such as resampling or
synthetic sample generation can be used to help maintain a balance between the sample types in the
data set [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Evaluating the data set in its raw version can serve to detect possible
biases in the model and mitigate their effect through resampling techniques.
      </p>
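      <p>The resampling idea above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration (random oversampling with replacement), not code from the cited works:</p>

```python
import random

def oversample_minority(samples, labels, seed=42):
    """Randomly duplicate minority-class samples until every class
    matches the size of the largest class (naive resampling sketch)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(group) for group in by_class.values())
    out_x, out_y = [], []
    for y, group in by_class.items():
        # keep the originals, then draw the missing samples with replacement
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for x in group + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Hypothetical toy data: class "a" has 4 samples, class "b" only 1
balanced_x, balanced_y = oversample_minority([1, 2, 3, 4, 5], ["a", "a", "a", "a", "b"])
```

In practice, synthetic sample generation (e.g. SMOTE-style interpolation) would replace the plain duplication shown here, but the goal is the same: equalizing class representation before training.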
      <p>To assess the presence of biases associated with the AI model, classification performance metrics
can be used. These metrics are based on counting the samples classified by an AI model according to
the following types: true positives (samples of the condition class correctly classified as such), false
positives (samples of the control or reference class incorrectly classified as the condition class), true
negatives (samples of the control or reference class correctly classified as such), and false negatives
(samples of the condition class incorrectly classified as the reference class). These values are typically
represented by the acronyms TP, FP, TN, and FN, respectively. Different formulas defined from these
four values yield metrics that speak to the model’s effectiveness.</p>
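      <p>These four counts can be tallied directly from a classifier’s predictions. A minimal Python sketch, assuming binary labels where 1 marks the condition class:</p>

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, FP, TN, FN for a binary classifier's predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn

# Toy example: true labels vs. model predictions
tp, fp, tn, fn = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```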
      <p>Many metrics have been defined; common ones for model performance evaluation include accuracy
(the number of correctly classified samples divided by the full sample size), specificity (the probability
that a result is negative, conditioned on the sample being truly negative), and positive predictive value
(PPV, the probability that a sample with a positive test result actually has the condition being
assessed).</p>
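      <p>Expressed in code, these formulas follow directly from the four counts. A small illustrative sketch, using the toy counts from above as an assumption:</p>

```python
def accuracy(tp, fp, tn, fn):
    # fraction of all samples that were classified correctly
    return (tp + tn) / (tp + fp + tn + fn)

def specificity(tp, fp, tn, fn):
    # P(result negative | sample truly negative)
    return tn / (tn + fp)

def ppv(tp, fp, tn, fn):
    # positive predictive value: P(truly positive | result positive)
    return tp / (tp + fp)

# Hypothetical counts (tp=2, fp=1, tn=1, fn=1)
acc = accuracy(2, 1, 1, 1)      # (2+1)/5 = 0.6
spec = specificity(2, 1, 1, 1)  # 1/(1+1) = 0.5
p = ppv(2, 1, 1, 1)             # 2/(2+1) = 2/3
```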
      <p>The bias present in the model can be assessed by calculating disparities in error rates. Using the
formula for one of the previous metrics, ratios can be established to evaluate the disparity between
sample groups, based on the values of these error measures in each group. From the data sets, subgroups
of samples can be formed according to a criterion chosen by the user. One criterion for assessing bias is
grouping samples by the value of variables considered sensitive, such as sex (female or male), ethnicity
(Caucasian, African, Latino or Hispanic, Middle Eastern, East Asian, Pacific Islander, etc.), or age.
Evaluating the disparity ratios between the selected groups allows us to verify whether the ratio between
performance metrics falls below a lower threshold (e.g., 0.8) or exceeds an upper one (e.g., 1.25). In this
way, one can see whether there are subgroups of samples that the model tends to classify better or worse,
according to the above metrics, compared to a reference subgroup. There are criteria for defining the
reference subgroup; for example, the majority subgroup can be used as the reference group.</p>
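      <p>The disparity check described above can be sketched as follows; the subgroup names and metric values are illustrative assumptions, with 0.8 and 1.25 taken as the example thresholds:</p>

```python
def disparity_ratio(metric_by_group, reference):
    """Ratio of each subgroup's metric to the reference subgroup's metric."""
    ref = metric_by_group[reference]
    return {g: v / ref for g, v in metric_by_group.items()}

def flag_disparities(ratios, lower=0.8, upper=1.25):
    """Flag subgroups whose ratio falls outside the accepted band."""
    return {g: r for g, r in ratios.items() if r < lower or r > upper}

# Hypothetical PPV per subgroup of the sensitive variable "sex",
# with the majority subgroup ("male") as reference
ppv_by_sex = {"male": 0.90, "female": 0.63}
ratios = disparity_ratio(ppv_by_sex, reference="male")
flagged = flag_disparities(ratios)  # female ratio 0.7 falls below 0.8
```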
      <p>
        Bias assessment of AI models allows us to verify the presence of biases in models that are often
fed with data volumes where some sample subgroups are larger than others, or where the
underestimation of some subgroups leads to the extraction of patterns that are not
representative of reality [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ].
      </p>
      <p>
        To date, multiple studies have examined the presence of biases in AI, specifically when
applied to the field of healthcare [
        <xref ref-type="bibr" rid="ref10 ref3 ref7 ref8 ref9">3, 7, 8, 9, 10</xref>
        ]. These studies have attempted to identify biases such
as those described above using assessment methods such as disparity calculation, which has revealed
biases associated with the resulting AI classifiers. However, as this is a novel area, there is no clear
consensus among the different studies on how to classify the types of biases and, therefore, on the
appropriate methods to mitigate them. There is also broad concern about the lack of transparency
and standards in data recording and publication, which would make it easier to detect biases and
correct them by considering both the processing and the origin of the data [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>In general, there is no clear protocol for bias detection and mitigation in the data processing
workflows that ultimately produce AI classifiers with applications in healthcare. Such a protocol would
need to establish what type of input data is being handled, how to detect the potential types of bias in
each case (for which error evaluation metrics and the disparities associated with them are essential),
and how these biases can be mitigated once detected. Given the lack of consensus among researchers,
further work is still needed to clarify these problems, which have a clear impact on people’s lives, as
this is a healthcare issue.</p>
      <p>Given this problem, the first milestone of this doctoral project is to conduct a literature review of
the state of the art regarding bias detection and mitigation in AI models in healthcare. After gathering
bibliographic information on the screened cases that fit this topic, an ontology could be designed that
gathers information on how to process each data set, considering bias assessment, until the classification
model is generated. This would establish an organized and consistent protocol for any data set, enabling
the design of more accurate and effective AI models.</p>
      <p>Case-Based Reasoning (CBR) is a framework based on empirical evidence that allows an optimal
solution to be determined from the experience gained with the real cases it was defined with. The above
ontology would be represented by a CBR model, where the analyzed datasets define the framework of
the model, which guides the analysis and assessment of biases depending on the nature of the initial
dataset and on the empirical evidence collected while researching the state of the art. To date, there is
no record of an ontology applied in this area of healthcare knowledge capable of guiding experts in
assessing biases in AI.</p>
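      <p>As a toy illustration of the CBR retrieve step (not the ontology-backed model proposed here), a case base can be queried for the nearest stored case, whose recorded solution is then reused or adapted; the features, cases, and solutions below are hypothetical:</p>

```python
def retrieve(case_base, query, k=1):
    """Minimal CBR retrieve step: return the k stored cases whose feature
    vectors are closest (Euclidean distance) to the query case."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    ranked = sorted(case_base, key=lambda case: dist(case["features"], query))
    return ranked[:k]

# Hypothetical case base: dataset profiles mapped to bias-audit advice
case_base = [
    {"features": [0.9, 0.1], "solution": "check class imbalance"},
    {"features": [0.2, 0.8], "solution": "check label noise"},
]
best = retrieve(case_base, query=[0.85, 0.15], k=1)
```

A full CBR cycle would add the reuse, revise, and retain steps; retrieval alone already shows how a new medical dataset could be matched against previously audited ones.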
      <p>It is worth highlighting the value that CBR models can have in healthcare, where each patient
represents a case, and the analysis of new patients would be facilitated by fitting within this framework.
CBR models would provide a highly suitable system to ensure correct data processing, given the absence
of clear standards.</p>
      <p>Finally, this ontology could be tested with new datasets to verify whether the system is truly capable
of ensuring bias analysis and assessment. This would deliver a resource with a significant impact on
the medical field, ensuring the ethical and responsible use of AI tools in research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Research plan</title>
      <p>Considering the PhD plan with the corresponding funding, the objectives described in the introduction
would be carried out over a total of three years. These three years represent a time span that can be
divided into phases linked to milestones to be met throughout the PhD.</p>
      <p>The first phase would correspond to the first year of the PhD. During this phase, the literature related
to the generation of AI models from datasets in the healthcare field for which bias assessment has been
considered would be compiled and reviewed. Based on the compilation of all the cases gathered with
the literature, an ontology for data analysis would be designed to guide the expert-level bias assessment,
implemented according to a CBR model.</p>
      <p>During the second phase of the doctorate, which would correspond to years 2 and 3 of the project,
the CBR model is expected to be tested with new cases, which would represent new discoveries within
the CBR system. Finally, all the findings would be compiled in the final project report.</p>
      <sec id="sec-2-1">
        <title>2.1. Research objectives</title>
        <p>The research objectives are related to elucidating clear bias detection and mitigation mechanisms
for the biases that affect AI models in healthcare, given that no clear standards currently exist. This
can be achieved through empirical evidence reflected in an ontology used to construct a CBR model,
which can guide expert-level analysis of new medical datasets.</p>
        <p>Around this main objective are several other objectives that must be met. For example, it is
necessary to clearly establish the standard structures of the datasets that can be analysed, since, to date,
there is still no well-defined standard for datasets for different medical problems. Next, it is important
to understand the different stages of data processing and how, throughout them, the presence of biases
that will affect the final AI model can be analysed. Among the available evaluation methods, we
find the disparity ratios of performance metrics between user-defined sample subgroups. Another issue
surrounding these performance metrics is their most relevant use, considering the type of dataset being
manipulated; there is no literature clarifying the most appropriate use of each performance metric.
Finally, tasks such as transforming case evidence into a CBR model and ontology would constitute a
new paradigm in the field of artificial intelligence and the medical domain, as we have not encountered
a similar system to date. Thanks to the CBR architecture, a tool would be created capable of ensuring
better use of artificial intelligence in the medical field.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Approach / Methodology</title>
        <p>Considering the research plan and the objectives to be carried out, the first part of the project would
consist of preparing a bibliographic review of the state of the art in bias assessment for data set analysis
in the healthcare field. To do this, it would first be necessary to determine which article search engines
are most appropriate for this research, based on the topic. An advanced search would then be carried
out in the relevant databases to obtain the raw bibliographic corpus. The articles found would then be
manually screened to identify those valuable for the bibliographic review.</p>
        <p>During the writing of the bibliographic review, the data analysis would be adapted to each data set
based on the accumulated empirical evidence. This would allow for the establishment of calculations
and tools for assessing the bias that may affect each data set at each stage. With these tools, analyses
can be performed to understand their scope and thus better understand the possibilities that the final
ontology will cover. Finally, in this first stage, the ontology and fundamental structure of the CBR
model would be designed.</p>
        <p>During the second phase of the doctoral program, the ontology’s principles would be applied to
process new medical datasets to validate the ontology’s usefulness. Based on these new cases, the
ontology may be redesigned to truly represent all possible aspects of dataset evaluation and serve as a
means of providing the best possible support during expert analysis of medical datasets.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Progress summary</title>
      <p>
        Aequitas, a tool for measuring bias in classification models using the disparity
ratios of different metrics, has been developed [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Aequitas is specifically designed to facilitate the assessment of bias in the
form of disparity ratios, using the classification results of an AI model as input. By forming subgroups
of samples based on the values they adopt for certain attributes considered sensitive, the user can easily
calculate the performance metrics for each subgroup and establish disparity ratios between them, in
order to quantify and diagram the presence of bias, as well as to calculate the statistical significance
associated with these disparity ratios. It is an easy-to-use and versatile tool for AI classification models
and is therefore considered a valuable method for assessing bias in AI classification models used in the
healthcare domain.
      </p>
      <p>
        Aequitas is a versatile tool that has been studied to understand how it can be applied to medical
datasets. Its simplest use relies on the results of a binary classification from an AI model, accompanied
by attributes whose values allow new sample subgroups to be established. This approach lets the user
define arbitrary subgroups and compare classification rates between them; in the presence of bias, the
disparity ratio deviates markedly from 1. This makes Aequitas usable with relatively simple tabular
datasets, although it can also be applied in a less straightforward way to evaluate disparity ratios, as is
the case with image datasets [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
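      <p>This tabular workflow can be mimicked without the library itself. The sketch below assumes the input format Aequitas documents (binary `score` and `label_value` columns plus sensitive-attribute columns) and computes a per-subgroup false-positive rate by hand; the records are invented for illustration:</p>

```python
def fpr_by_group(records, attribute):
    """False-positive rate per subgroup, computed from binary scores and
    labels. Each record is a dict with 'score', 'label_value' and one or
    more attribute columns, mirroring the tabular input Aequitas expects."""
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r)
    rates = {}
    for g, rows in groups.items():
        fp = sum(1 for r in rows if r["score"] == 1 and r["label_value"] == 0)
        tn = sum(1 for r in rows if r["score"] == 0 and r["label_value"] == 0)
        rates[g] = fp / (fp + tn)  # assumes each subgroup has true negatives
    return rates

# Hypothetical classification results with a sensitive attribute "sex"
records = [
    {"score": 1, "label_value": 0, "sex": "male"},
    {"score": 0, "label_value": 0, "sex": "male"},
    {"score": 0, "label_value": 1, "sex": "male"},
    {"score": 1, "label_value": 0, "sex": "female"},
    {"score": 1, "label_value": 0, "sex": "female"},
    {"score": 0, "label_value": 0, "sex": "female"},
]
rates = fpr_by_group(records, "sex")
```

Dividing each subgroup's rate by the reference subgroup's rate then yields the disparity ratios that Aequitas reports and tests for significance.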
      <p>So far, Aequitas has been tested on simple datasets to learn how to calculate disparity ratios between
subgroups, how to represent these ratios and calculate their statistical significance, and how to mitigate
model bias through reweighting.</p>
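      <p>One common reweighting scheme, sketched below in the style of Kamiran-and-Calders reweighing (an assumption; it may not match the exact procedure applied in the experiments), assigns each sample a weight that makes group membership statistically independent of the outcome:</p>

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each sample by P(group) * P(label) / P(group, label), so that
    the weighted data show no association between group and outcome."""
    n = len(groups)
    pg = Counter(groups)             # counts per sensitive group
    py = Counter(labels)             # counts per outcome label
    pgy = Counter(zip(groups, labels))  # joint counts
    return [
        (pg[g] / n) * (py[y] / n) / (pgy[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]

# Hypothetical skewed data: group "a" is mostly labeled 1, group "b" never is
weights = reweighing_weights(["a", "a", "a", "b"], [1, 1, 0, 0])
```

Samples from over-associated group/label pairs receive weights below 1 and under-associated pairs weights above 1, which a training algorithm can then use as sample weights.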
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and future work</title>
      <p>
        The creation of an ontology and a CBR model that records instances of datasets and AI models
applied in the medical domain, and that allows for dataset-specific bias assessment, remains a missing
paradigm. The lack of transparency of certain datasets [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and the absence of a standard for bias detection
and mitigation make the issue of bias assessment in AI models applied to the medical field one of
notable importance. The emergence of calculations such as disparity ratios points to some consensus
that has been established, although much work remains to be done in this area, among other reasons
because of the high direct impact that changes in the healthcare domain have on the general population.
Establishing a reference framework for expert-level bias analysis supports the responsible and ethical
use of AI, an area in constant and rapid expansion owing to its applicability. This project will help shed
light on the most appropriate use of bias assessment in a domain where AI is of enormous importance.
The defined ontology can then be continuously tested on new datasets, further refining it and better
adapting it to a constantly evolving field of knowledge.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Microsoft Translator Service for the purpose
of: assisting translation from Spanish to English. After using this service, the author(s) reviewed and
edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kamishima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Akaho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Asoh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sakuma</surname>
          </string-name>
          ,
          <article-title>Fairness-Aware Classifier with Prejudice Remover Regularizer</article-title>
          , in: D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, F. Mattern, J. C. Mitchell, M. Naor,
          O. Nierstrasz, C. Pandu Rangan, B. Steffen, M. Sudan, D. Terzopoulos, D. Tygar, M. Y. Vardi,
          G. Weikum, P. A. Flach, T. De Bie, N. Cristianini (Eds.),
          <source>Machine Learning and Knowledge Discovery in Databases</source>
          , volume
          <volume>7524</volume>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2012</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>50</lpage>
          . URL: http://link.springer.com/10.1007/978-3-642-33486-3_3. doi:10.1007/978-3-642-33486-3_3.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hooker</surname>
          </string-name>
          ,
          <article-title>Moving beyond “algorithmic bias is a data problem”</article-title>
          ,
          <source>Patterns</source>
          <volume>2</volume>
          (
          <year>2021</year>
          ) 100241. URL: https://linkinghub.elsevier.com/retrieve/pii/S2666389921000611. doi:10.1016/j.patter.2021.100241.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Unmasking bias in artificial intelligence: a systematic review of bias detection and mitigation strategies in electronic health record-based models</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>31</volume>
          (
          <year>2024</year>
          )
          <fpage>1172</fpage>
          -
          <lpage>1183</lpage>
          . URL: https://academic.oup.com/jamia/article/31/5/1172/7634193. doi:10.1093/jamia/ocae060.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Straw</surname>
          </string-name>
          , H. Wu,
          <article-title>Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction</article-title>
          ,
          <source>BMJ Health &amp; Care Informatics</source>
          <volume>29</volume>
          (
          <year>2022</year>
          ). URL: https://informatics.bmj.com/content/29/1/e100457. doi:10.1136/bmjhci-2021-100457.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Seyyed-Kalantari</surname>
          </string-name>
          , G. Liu,
          <string-name>
            <given-names>M.</given-names>
            <surname>McDermott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          , M. Ghassemi,
          <article-title>CheXclusion: Fairness gaps in deep chest X-ray classifiers</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2003.00827. doi:10.48550/ARXIV.2003.00827.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Seyyed-Kalantari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , M. B. A. McDermott, I. Y. Chen, M. Ghassemi,
          <article-title>Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations</article-title>
          ,
          <source>Nature Medicine</source>
          <volume>27</volume>
          (
          <year>2021</year>
          )
          <fpage>2176</fpage>
          -
          <lpage>2182</lpage>
          . URL: https://www.nature.com/articles/s41591-021-01595-0. doi:10.1038/s41591-021-01595-0.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nagendran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Lovejoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Gordon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Komorowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Harvey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Topol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P. A.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          , G. S. Collins,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maruthappu</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies</article-title>
          ,
          <source>BMJ</source>
          (Clinical research ed.)
          <volume>368</volume>
          (
          <year>2020</year>
          )
          m689. doi:10.1136/bmj.m689.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sasseville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ouellet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rhéaume</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sahlia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Couture</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Després</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-S.</given-names>
            <surname>Paquette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Darmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bergeron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-P.</given-names>
            <surname>Gagnon</surname>
          </string-name>
          ,
          <article-title>Bias Mitigation in Primary Health Care Artificial Intelligence Models: Scoping Review</article-title>
          ,
          <source>Journal of Medical Internet Research</source>
          <volume>27</volume>
          (
          <year>2025</year>
          ) e60269. doi:10.2196/60269.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>O.</given-names>
            <surname>Perets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stagno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. B.</given-names>
            <surname>Yehuda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>McNichol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Anthony</given-names>
            <surname>Celi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rappoport</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dorotic</surname>
          </string-name>
          ,
          <article-title>Inherent Bias in Electronic Health Records: A Scoping Review of Sources of Bias</article-title>
          ,
          <source>medRxiv: The Preprint Server for Health Sciences</source>
          (
          <year>2024</year>
          )
          2024.04.09.24305594. doi: 10.1101/2024.04.09.24305594.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bian</surname>
          </string-name>
          ,
          <article-title>A scoping review of fair machine learning techniques when using real-world data</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>151</volume>
          (
          <year>2024</year>
          )
          <fpage>104622</fpage>
          . doi: 10.1016/j.jbi.2024.104622.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Daneshjou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Rotemberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>Lack of Transparency and Potential Bias in Artificial Intelligence Data Sets and Algorithms: A Scoping Review</article-title>
          ,
          <source>JAMA Dermatology</source>
          <volume>157</volume>
          (
          <year>2021</year>
          )
          <fpage>1362</fpage>
          -
          <lpage>1369</lpage>
          . doi: 10.1001/jamadermatol.2021.3129.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Saleiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kuester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hinkson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>London</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anisfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. T.</given-names>
            <surname>Rodolfa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ghani</surname>
          </string-name>
          ,
          <article-title>Aequitas: A Bias and Fairness Audit Toolkit</article-title>
          ,
          <year>2019</year>
          . URL: http://arxiv.org/abs/1811.05577. doi: 10.48550/arXiv.1811.05577, arXiv:1811.05577.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>