<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The COMPAS case: an educational journey for explaining fairness in AI-based applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio Rodà</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padova</institution>
          ,
          <addr-line>via Gradenigo 6a, 35131, Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article explores the COMPAS (Correctional Ofender Management Profiling for Alternative Sanctions) case as an invaluable educational resource for elucidating the complexities of fairness assessments in real-world AI applications. The controversial history of this high-profile legal and media case is briefly reviewed, highlighting the multifaceted nature of algorithmic bias in criminal justice. By examining the various analytical approaches proposed to address the COMPAS algorithm's fairness, this paper underscores the limitations and challenges inherent in each methodology. The case study serves as a compelling illustration of the intricate balance between statistical accuracy, social justice, and ethical considerations in AI-driven decision-making systems. Through this analysis, the article aims to provide students and practitioners with a nuanced understanding of the practical dificulties in achieving and measuring fairness in AI applications.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Fairness</kwd>
        <kwd>Algorithmic Bias</kwd>
        <kwd>Criminal Justice</kwd>
        <kwd>Education</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The COMPAS (Correctional Ofender Management Profiling for Alternative Sanctions) case has been
extensively documented and analyzed from various perspectives, generating a substantial body of
literature (see e.g., [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6">1, 2, 3, 4, 5, 6</xref>
        ]). It has become a focal point in discussions about algorithmic fairness,
transparency in artificial intelligence (AI) systems, and the ethical implications of using predictive tools
in the criminal justice system.
      </p>
      <p>
        In this article, I aim to look at this case from another point of view: the pedagogical value of the
COMPAS case study, as an invaluable educational resource for elucidating the complexities of fairness
assessments in real-world AI applications. This work stems from the experience of teaching a course
titled Gender Knowledge and Ethics in Artificial Intelligence [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which has been active for a few
years at the University of Padua. I will present materials and an analytical framework that considers
the viewpoints of the key stakeholders (ProPublica, Northpointe, and the Wisconsin Supreme Court),
critically reviewing their conclusions. This approach is designed for students of a bachelor degree in
computer science/engineering and aims to systematically deconstruct hasty judgments, encouraging
students to exercise critical thinking skills. The ultimate goal is to foster in students an awareness that
only a perspective that embraces the complexity of real-world problems can lead to robust analyses that
are less prone to error. By engaging with the multifaceted nature of the COMPAS controversy, students
will develop a nuanced understanding of the challenges in algorithmic fairness and the importance of
considering multiple dimensions when evaluating AI systems in sensitive domains such as criminal
justice. The original materials created by the author and the links to the main documents cited in this
article are publicly available at the repository https://github.com/aro971/CompasDebunking.
      </p>
      <p>The remainder of this paper is structured as follows. After providing a brief summary of the COMPAS
controversy (Section 2), reviewing the key events and positions taken by the major stakeholders -
ProPublica, Northpointe, and the Wisconsin Supreme Court, I will outlines an educational journey (Section
3) through the COMPAS case, presenting an analytical framework that systematically deconstructs the
arguments and evidence put forth by each party.</p>
    </sec>
    <sec id="sec-2">
      <title>2. A brief summary of the COMPAS controversy</title>
      <p>
        COMPAS is a risk assessment tool developed by Northpointe (now Equivant) to assist courts in making
decisions about pretrial release, sentencing, and parole [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. COMPAS uses an algorithm to predict an
ofender’s likelihood of recidivism based on various factors, including criminal history, substance abuse,
and social-familial risk [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        In May 2016, the investigative journalism organization ProPublica published a groundbreaking article
titled "Machine Bias" [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This exposé alleged that the COMPAS algorithm was biased against African
American defendants, claiming that the tool was almost twice as likely to falsely label black defendants
as future criminals compared to white defendants.
      </p>
      <p>
        In response to ProPublica’s allegations, Northpointe issued a technical report challenging ProPublica’s
methodology and conclusions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The company argued that their tool satisfied statistical fairness
criteria, specifically emphasizing equal positive and negative predictive values across racial groups.
      </p>
      <p>
        The controversy surrounding COMPAS reached a critical point in the case of Loomis v. Wisconsin in
2016. Eric Loomis, sentenced partly based on his COMPAS score, challenged the use of the algorithm
in sentencing, arguing that a) it violated due process and b) relied on gender-based assessments. In
particular, the appellant argued that he had been subjected to discrimination on the basis of his gender,
specifically because he is male. In July 2016, the Wisconsin Supreme Court ruled on the Loomis case,
upholding the use of COMPAS in sentencing while also highlighting concerns about its proprietary
nature and potential biases[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The court mandated that judges be informed of the tool’s limitations,
including its potential for gender and racial bias, when considering COMPAS scores in sentencing
decisions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. An educational journey</title>
      <sec id="sec-3-1">
        <title>3.1. Step 1: From Anecdotal Evidence to Statistical Analysis</title>
        <p>The educational pathway begins by examining ProPublica’s exposé on the COMPAS system. Students
are initially presented with a powerful visual juxtaposition, directly taken from the ProPublica’s article1:
a photograph of an African American woman who received a high-risk score from COMPAS, alongside
an image of a Caucasian man who was assigned a low-risk score, although they were both arrested for a
petty theft. This striking contrast serves as a compelling entry point into the discussion of algorithmic
bias. When asked to interpret these images, students often swiftly conclude that they are witnessing
a clear-cut case of racial discrimination. This immediate reaction provides an excellent opportunity
to caution them against drawing hasty conclusions based on anecdotal evidences, no matter how
emotionally impactful it may be.</p>
        <p>To challenge this initial perception, it is crucial to present counterexamples. These might include
cases, extracted from the same database used by ProPublica, of recidivate African Americans who
received low-risk scores, and Caucasians not recidivate who were assigned high-risk scores, as in
Figure 3.1. This balanced approach helps students understand that individual cases, while powerful, can
be misleading when assessing the fairness of complex systems like COMPAS. Through this exercise,
students begin to grasp that determining unfairness in algorithmic decision-making systems requires
more than anecdotal evidence. It necessitates a rigorous statistical approach that can account for the
complexity and scale of the data involved.
1Look at the Vernon Prater and Brisha Borden pictures in
https://www.propublica.org/article/machine-bias-risk-assessmentsin-criminal-sentencing. Accessed on February, 15th 2025.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Step 2: From distribution analysis to societal context</title>
        <p>The educational journey continues by examining the distribution of risk scores assigned by COMPAS,
ranging from 1 (low risk) to 10 (high risk), segregated by race as presented in ProPublica’s article 2. The
visual representation reveals stark diferences between Caucasian and African American populations.
For African Americans, the distribution shows a gradual decline, with approximately 370 individuals
receiving a score of 1 and about 220 receiving a score of 10. In contrast, the distribution for Caucasians
exhibits a steeper decline, starting with around 600 cases scoring 1 and dropping to merely 50 cases
scoring 10. The disparity between these distributions is conspicuous, usually leading students to
conclude that the system appears discriminatory towards African Americans, also from a statistical
point fo view.</p>
        <p>These data provide an opportunity to broaden the students’ perspective. It’s crucial to emphasize that
what was shown aren’t just numbers, but represent real people living within specific social, economic,
and cultural contexts. In this case, the data pertains to individuals in a world still grappling with
significant racial disparities. To contextualize these findings, students are presented with statistics
highlighting racial disparities in the United States, such as those reported by the Economic Policy
Institute (EPI), an independent think tank that researches the impact of economic trends and policies
on working people in the United States. According to the EPI’s study on racial disparities 3
• the median household income in 2023 was $56,490 for black people and $89,050 for white people
• in 2023, 12.6% of black people has less than a high school education, whereas this percentage
drops to 6.1% for white people
• black women are over twice as likely to die from a pregnancy-related cause (49.5 deaths per
100,000 births) as white women (19.0 deaths per 100,000 births)
• over 1,000 out of every 100,000 U.S. residents who are Black or were imprisoned in 2023, more
than four times the proportion related to white U.S. residents (229 out of 100,000)
2https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. Accessed on February, 15th 2025.
3https://www.epi.org/publication/disparities-chartbook/</p>
        <p>This broader view encourages students to consider how societal inequities might be reflected in the
risk assessment scores. At this point, an alternative hypothesis can be introduced: the distribution
diferences in the risk scores between Caucasians and African Americans might be due to an actual
higher risk of the latter, suggesting that COMPAS could be accurately performing its intended function.
To test this hypothesis, the analysis must move beyond descriptive statistics. Students are guided to
consider the system’s performance in terms of the errors it commits. This transition sets the stage for a
more nuanced examination of fairness metrics, emphasizing the importance of evaluating algorithmic
systems not just on their outputs, but on their accuracy.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Step 3: Unveiling the Complexities of Fairness Metrics</title>
        <p>At this third stage of our educational journey, it becomes crucial to analyze data that reflect COMPAS’s
predictive capabilities. Table 1 presents the confusion matrices separated by race for African Americans
and Caucasians, as reported by ProPublica 4. These data were calculated from a public database 5,
containing 7214 entries with criminal history, jail and prison time, demographics and COMPAS risk
scores for defendants from Broward County in the period 2013-2014. The author has independently
verified the accuracy of ProPublica’s reported values. To limit the complexity of the discussion, which
is of little use for didactic purposes at this stage, I will only use the fairness metrics considered by
ProPublica. An exhaustive treatment of fairness metrics can be proposed to students at a later stage or
as personal study, after properly motivating them on the usefulness of considering multiple metrics.</p>
        <p>In this context, ’recidivated’ refers to individuals who were re-arrested within two years of receiving
their risk score, while ’survived’ indicates those who avoided arrest during the same period. Let’s first
examine the ’survived’ row, representing individuals who "behaved well" and for whom COMPAS was
expected to predict a low risk. The False Positive Rate (i.e. people "unjustly" assigned a high risk) shows
a significant disparity:
• For African Americans: 44.85% of survived individuals (calculated as 805/(805 + 990) × 100)
• For Caucasians: 23.45% of survived individuals (calculated as 349/(349 + 1139) × 100)
Now, let’s turn our attention to the ’recidivated’ row. The False Negative Rate (i.e. people "unjustly"
assigned a low risk) also reveals a notable diference:
• For African Americans: 27.99% of recidivated individuals (calculated as 805/(805 + 990) × 100)
• For Caucasians: 47.72% of recidivated individuals (calculated as 349/(349 + 1139) × 100)
These findings led ProPublica to conclude:
"Black defendants who do not recidivate were nearly twice as likely to be classified by
COMPAS as higher risk compared to their white counterparts (45 percent vs. 23 percent).
4https://www.ProPublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm. Accessed on February, 15th 2025
5https://github.com/propublica/compas-analysis</p>
        <p>White defendants who re-ofended within the next two years were mistakenly labeled low
risk almost twice as often as black re-ofenders (48 percent vs. 28 percent)."</p>
        <p>At this juncture, the data on False Positive Rate (FPR) and False Negative Rate (FNR), as summarized
in the aforementioned quote, seem to leave little doubt about COMPAS’s discriminatory behavior
towards African Americans. However, it is crucial to guide students towards the understanding that
fairness metrics can be deceptive if not fully comprehended in context.</p>
        <p>Specifically, the disparate FPR values between Caucasians and African Americans do not necessarily
imply discrimination. To demonstrate this, we present students with a counter-example, by using a
simplified risk prediction system, which I’ll call MYCOMPAS.</p>
        <p>Consider two populations, labeled ’Green’ and ’Purple’, each comprising 100 individuals. We will
examine the following two scenarios.</p>
        <sec id="sec-3-3-1">
          <title>Scenario 1: Ideal World (No Bias).</title>
          <p>In this scenario, we assume an equal proportion of survived and recidivated people in both populations,
specifically 80 survived and 20 recidivated. MYCOMPAS was designed to behave "perfectly" equitably,
according demographic parity, and assigns a high risk score to 20 individuals, making 5 errors, for both
Green and Purple populations.</p>
          <p>The confusion matrices for this scenario are presented in Table 2. For both populations, we observe
   = 5/(75 + 5) = 6.25%
    = 5/(75 + 5) = 6.25%
as expected given that MYCOMPAS is fair by design and there are no disparities between populations.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>Scenario 2: Unequal World.</title>
          <p>In this scenario, we introduce a disparity in the proportion of recidivated people. In particular,
we assume that the Green population has a greater proportion of recidivated people (60 survived, 40
recidivated) in comparison to the Purple one (80 survived and 20 recidivated), i.e. Green people are
more prone to reofend. MYCOMPAS maintains its equitable behavior, assigning a high risk score to 20
individuals and making 5 errors for both populations. The resulting confusion matrices are shown in
Table 3. From these matrices, we derive:
   = 5/(55 + 5) = 8.33%
    = 5/(75 + 5) = 6.25%</p>
          <p>In this second scenario, therefore, we obtained that FPR is higher for Green population than Purple
one. Given that MYCOMPAS is fair by design, it can be concluded that a disparity in FPR does not imply
a discriminatory behavior of the predictive algorithm. Therefore, the diference in FPR arises not from
discriminatory behavior of the algorithm, but from the underlying diferences in the recidivism rates
between the two populations. In other words, FPR depends on the ratio between False Positives and
the number of people who do not commit further crimes (survived). When the latter decreases, FPR
increases.</p>
          <p>Going back to COMPAS now, the higher FPR value in African Americans does not necessarily imply
a higher number of False Positives (which would suggest discrimination against African Americans),
but rather a lower number of survived individuals.</p>
          <p>Using the FPR metric to argue racial discrimination of COMPAS, as done in the ProPublica document,
is therefore fallacious. The picture becomes even more complex when considering the Positive Predicted
Value metric, also correctly reported by ProPublica, although not carefully examined. This metric is
equal to the fraction of recidivated people among those who received a high score. A low value implies
that COMPAS was unnecessarily severe, harming many people with a high risk score who then behaved
well. From Table 1, we obtain:
   = 1369/(805 + 1369) = 0.63
   ℎ = 505/(349 + 505) = 0.59
(1)
(2)
(3)
Green defendants</p>
          <p>Low</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Step 4: From racial bias to gender discrimination</title>
        <p>
          After deconstructing ProPublica’s allegations and questioning their methodology and conclusions, we
turn our attention to the document produced by Northpointe (now Equivant) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This document,
using diferent arguments from those presented in the previous sections, concludes that COMPAS does
not produce discriminatory outputs against African Americans. This document can be presented to
students, noting that its analysis is more comprehensive than ProPublica’s and more rigorous from a
methodological standpoint. At this juncture, many students may concur with the notion that COMPAS
is, on balance, fair.
        </p>
        <p>However, it is crucial to challenge this perception as well, introducing the concept of intersectionality.
Intersectionality, as defined by Kimberlé Crenshaw (1989), refers to the interconnected nature of social
categorizations such as race, class, and gender, regarded as creating overlapping and interdependent
systems of discrimination or disadvantage.</p>
        <p>To avoid overcomplicating the analysis, let’s consider only the gender dimension. Table Z, which
shows confusion matrices derived from the same public database used by ProPublica and Northpointe,
but this time segmented by gender. Recalculating the Positive Predictive Value (PPV) metric for male
and female groups yields the following:</p>
        <p>The resulting values of PPV, that are lower for females than males, indicate that COMPAS was more
severe with women than with men, that is, of all the women who received a high risk score, only 52%
went on to commit further ofenses. Therefore, while ProPublica may not conclusively demonstrate
COMPAS’s discrimination against African Americans, it is also not possible conclude that COMPAS is
simply fair, as asserted by NorthPointe: some metrics, PPV in particular, indicate indeed a potential bias
against women. It is surprising that public debate has focused solely on racial discrimination, while
discrimination against women has been overlooked. This oversight is even more striking considering
that one of the contentions in the Loomis v. Wisconsin case was alleged discrimination against the
appellant as a male. Paradoxically, our data analysis shows the opposite - females appear to receive
"harsher" treatment from the COMPAS system.</p>
        <p>These findings underscore the necessity for considering diferent (possible all) social categorizations
in fairness analysis. Possibly adopting an intersectional approach that considers multiple dimensions of
identity simultaneously. The intersectional perspective is crucial for a comprehensive understanding of
algorithmic fairness, particularly in complex systems like COMPAS that have far-reaching implications
in the criminal justice system.</p>
        <p>While the debate surrounding COMPAS has primarily centered on racial bias, our analysis reveals a
more nuanced picture of potential discrimination.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Step 5: From fairness metrics to human-AI interaction</title>
        <p>After critically discussing both ProPublica’s document, which aimed to demonstrate racist behavior in
COMPAS, and Northpointe’s assertion of non-discrimination, further useful discussions with students
can arise by analyzing the Wisconsin Supreme Court’s ruling in the Loomis case. The court held that the
use of the COMPAS recidivism risk calculation software by an ordinary court in criminal proceedings
does not violate the defendant’s right to due process, thus siding with Northpointe. However, it is
interesting to cite the passage where the judges are keen
"to clarify that while our holding today permits a sentencing court to consider COMPAS, we
do not conclude that a sentencing court may rely on COMPAS for the sentence it imposes."
The judges consider this distinction so important that they further specify:
"Contrary to the manner in which the majority opinion sometimes employs ’consider’ and
’rely’, they are not interchangeable. ’Rely’ is defined as ’to be dependent’ or ’to place full
confidence’. Therefore, to permit circuit courts to rely on COMPAS is to permit circuit
courts to depend on COMPAS in imposing sentence. On the other hand, ’consider’ is
defined as ’to observe’ or ’to contemplate’ or ’to weigh’."</p>
        <p>These sentences allow us to introduce one of the pillars for a responsible use of AI-based systems:
the principle of meaningful human control. This final stage of the educational journey proposed in
this article aims at explaining what this term means, highlighting possible diferent levels of control (a
posteriori evaluation, human intervention in case of anomalous behavior, continuous control, etc.) and
the conditions (first of all explainability) under which control can be considered meaningful.</p>
        <p>
          Moreover, the distinction between ’consider’ and ’rely’ is related to human factors (what happens with
a human in the loop?) and the main biases that characterize human-machine collaboration, foremost
among them being automation bias [
          <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
          ]. Automation bias can be defined as the propensity for
humans to favor suggestions from automated decision-making systems over contradictory information
made without automation, even when the latter is correct. Can we rule out that a judge, dealing
with an overload of decisions to make and sentences to issue, decides to trust (rely) the machine
without exercising an efective critical control? What kind of training and awareness of the machine’s
limitations should a judge have to reduce this risk? What features should the interaction between judge
and machine have to reach this aim?
        </p>
        <p>It is important to note that the distinction between ’consider’ and ’rely’ can be extended to
numerous other applications. For instance, consider the case of generative AI and the recommendation to
consider generated texts without blindly relying on them. This distinction underscores the
importance of maintaining critical thinking and human judgment in the face of AI-generated content or
recommendations.</p>
        <p>The court’s careful distinction between ’considering’ and ’relying’ on COMPAS outputs serves as
a model for how we might approach the integration of AI systems in various domains. It suggests a
balanced approach that leverages the analytical capabilities of AI while preserving the essential role of
human discretion and expertise.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>The COMPAS case serves as a compelling illustration of the complexities involved in assessing
algorithmic fairness, particularly when deploying AI systems in high-stakes domains like criminal justice.
Through an educational journey, we have deconstructed the arguments and analyses put forth by
various stakeholders, revealing the limitations and potential pitfalls of relying on any single fairness
metric or analytical approach in isolation.</p>
      <p>The case underscores the importance of adopting a nuanced, intersectional perspective that considers
multiple dimensions of identity and their potential interactions. By examining gender disparities in
addition to racial bias, we uncover potential sources of discrimination that may be obscured when
focusing solely on a single axis of analysis.</p>
      <p>
        Moreover, this case highlights the challenges of simultaneously satisfying diferent notions of
fairness [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], a phenomenon known as the impossibility of fairness. It illustrates the trade-ofs that must
be navigated between statistical accuracy, social justice, and ethical considerations in the design and
deployment of AI systems.
      </p>
      <p>Finally, the educational path traced in this article and the consequent analysis of the COMPAS case
can be the starting point to delve into other aspects only touched upon in this article. Of the many
possible fairness metrics, only those used by ProPublica have been considered here, so a first deepening
concerns the definition of other metrics and their application to the COMPAS case. All data analysis
was performed using the arrest to determine the truth values of the confusion matrix. This is a typical
case of a proxy variable, which replaces the true variable we would like to observe, namely whether the
defendant reofends or not. A second deepening therefore concerns the awareness that some biases can
be introduced by the use of proxy variables instead of unobservable variables. Further discussions may
lead to a analysis of the human biases inherent in decision-making processes and how these change
when the decision is supported by a technological means. Lastly, the COMPAS case ofers numerous
insights to introduce and discuss diferent ethical approaches (ethics of consequences, ethics of duties,
ethics of care, etc.) and its implications in relation to the fundamental human rights.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Dressel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Farid</surname>
          </string-name>
          ,
          <article-title>The accuracy, fairness, and limits of predicting recidivism</article-title>
          ,
          <source>Science advances 4</source>
          (
          <year>2018</year>
          )
          <article-title>eaao5580</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Coker</surname>
          </string-name>
          ,
          <article-title>The age of secrecy and unfairness in recidivism prediction</article-title>
          ,
          <source>Harvard Data Science Review</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <article-title>1</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chouldechova</surname>
          </string-name>
          ,
          <article-title>Fair prediction with disparate impact: A study of bias in recidivism prediction instruments</article-title>
          ,
          <source>Big data 5</source>
          (
          <year>2017</year>
          )
          <fpage>153</fpage>
          -
          <lpage>163</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Linhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubert</surname>
          </string-name>
          ,
          <article-title>Code is law: how compas afects the way the judiciary handles the risk of recidivism</article-title>
          ,
          <source>Artificial Intelligence and Law</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Berk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Heidari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jabbari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kearns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Fairness in criminal justice risk assessments: The state of the art</article-title>
          ,
          <source>Sociological Methods &amp; Research</source>
          <volume>50</volume>
          (
          <year>2021</year>
          )
          <fpage>3</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Portela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tolan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karimi-Haghighi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Pueyo</surname>
          </string-name>
          ,
          <article-title>A comparative user study of human predictions in algorithm-supported recidivism risk assessment</article-title>
          ,
          <source>Artificial Intelligence and Law</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Badaloni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodà</surname>
          </string-name>
          , et al.,
          <source>Gender knowledge and artificial intelligence</source>
          ,
          <source>in: Proceedings of the 1st Workshop on Bias</source>
          ,
          <string-name>
            <surname>Ethical</surname>
            <given-names>AI</given-names>
          </string-name>
          ,
          <article-title>Explainability and the role of Logic and Logic Programming</article-title>
          , BEWARE-
          <volume>22</volume>
          , co-located
          <source>with AIxIA</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brennan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dieterich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ehret</surname>
          </string-name>
          ,
          <article-title>Evaluating the predictive validity of the compas risk and needs assessment system</article-title>
          ,
          <source>Criminal Justice and behavior 36</source>
          (
          <year>2009</year>
          )
          <fpage>21</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Northpointe</surname>
          </string-name>
          ,
          <article-title>Compas risk &amp; need assessment system: Selected questions posed by inquiring agencies</article-title>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Angwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mattu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kirchner</surname>
          </string-name>
          ,
          <article-title>Machine bias, in: Ethics of data and analytics</article-title>
          ,
          <source>Auerbach Publications</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>254</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Dieterich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mendoza</surname>
          </string-name>
          , T. Brennan,
          <article-title>Compas risk scales: Demonstrating accuracy equity and predictive parity, Northpointe Inc 7 (</article-title>
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. v.</given-names>
            <surname>Loomis</surname>
          </string-name>
          ,
          <article-title>Wisconsin supreme court requires warning before use of algorithmic risk assessments in sentencing</article-title>
          ,
          <source>Harvard Law Review</source>
          <volume>130</volume>
          (
          <year>2017</year>
          )
          <fpage>1530</fpage>
          -
          <lpage>1537</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>M. L. Cummings</surname>
          </string-name>
          ,
          <article-title>Automation bias in intelligent time critical decision support systems, in: Decision making in aviation</article-title>
          , Routledge,
          <year>2017</year>
          , pp.
          <fpage>289</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>G. Tamburrini,</surname>
          </string-name>
          <article-title>The heuristics gap in ai ethics: Impact on green ai policies and beyond</article-title>
          ,
          <source>Journal of Responsible Technology</source>
          <volume>21</volume>
          (
          <year>2025</year>
          )
          <fpage>100104</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Lagioia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rovatti</surname>
          </string-name>
          , G. Sartor,
          <article-title>Algorithmic fairness through group parities? the case of compassapmoc</article-title>
          ,
          <source>AI &amp; SOCIETY</source>
          <volume>38</volume>
          (
          <year>2023</year>
          )
          <fpage>459</fpage>
          -
          <lpage>478</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>