<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Risk-based Approach to Trustworthy AI Systems for Judicial Procedures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Majid Mollaeefar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eleonora Marchesini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Carbone</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Silvio Ranise</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics, University of Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fondazione Bruno Kessler, Center for Cybersecurity</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>235</volume>
      <issue>2022</issue>
      <fpage>268</fpage>
      <lpage>275</lpage>
      <abstract>
        <p>In the rapidly evolving landscape of Artificial Intelligence (AI), ensuring the trustworthiness of AI tools deployed in sensitive use cases, such as judicial or healthcare processes, is paramount. The management of AI risks in judicial systems necessitates a holistic approach that covers technical and ethical considerations as well as legal responsibilities. This approach should not only involve the application of risk management frameworks and regulations but also focus on the education and training of legal professionals. To this end, we propose a risk-based approach designed to evaluate and mitigate potential risks associated with AI applications in judicial settings. Our approach is a semi-automated process that integrates both user (i.e., judge) feedback and technical insights to assess the AI tool's alignment with Trustworthy AI principles.</p>
      </abstract>
      <kwd-group>
        <kwd>Judicial AI</kwd>
        <kwd>Risk-aware</kwd>
        <kwd>Trustworthy AI</kwd>
        <kwd>Trustworthiness Risk Assessment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In recent years, the adoption of Artificial Intelligence (AI) technologies has surged across various industries and domains. AI systems now play a pivotal role in making critical decisions, automating tasks, and augmenting human capabilities. However, with the expanding influence and complexity of AI, it is crucial to ensure the development and deployment of Trustworthy AI (TAI) systems. TAI encompasses the creation and implementation of AI technologies adhering to a set of principles that promote transparency, fairness, accountability, and robustness. By designing TAI systems, the aim is to inspire trust among users, stakeholders, and society as a whole, since these systems must operate reliably, ethically, and in a manner that respects fundamental rights and values. The significance of TAI cannot be overstated, as it has the potential to address pressing concerns that arise from increasing reliance on AI systems. Three reasons stand out for designing AI systems with trustworthiness in mind. First, TAI cultivates user confidence and trust by ensuring that personal data is handled responsibly, decisions made by AI systems are fair and unbiased, and privacy is protected. The authors in [1] discuss the theoretical framework of AI trustworthiness, including aspects of privacy preservation and fairness, which are key to fostering user trust. Second, TAI bolsters the accountability and explainability of AI systems. As these systems become integral to decision-making processes, it is essential to comprehend how they reach their conclusions or recommendations. TAI increases transparency and offers mechanisms for interpreting the rationale behind AI-generated decisions, allowing users and stakeholders to hold systems accountable. Cobianchi et al. [2] emphasize the importance of accountability, technical robustness, and transparency in AI applications in surgery, which can be extended to other domains. Third, TAI aids in mitigating risks associated with AI technologies. If developed or deployed irresponsibly, AI systems can introduce numerous risks, including privacy breaches, biased decision-making, safety concerns, and the perpetuation of social inequalities. Addressing these risks is vital to protect individuals, organizations, and society from potential harm and adverse consequences.</p>
      <p>The AI Act draft proposal for a Regulation<fn><p>https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206</p></fn> of the European Parliament and of the Council laying down harmonized rules on AI represents the first attempt to enact a horizontal AI regulation. This proposed legal framework, focusing specifically on the use of AI systems, advocates for a technology-neutral definition of AI systems in EU legislation. It emphasizes a risk-based approach where AI systems are classified with varying obligations proportional to their level of risk. The AI Act categorizes risks into four levels: minimal, limited, high, and unacceptable (i.e., the latter are not permitted to be sold on the EU market). It focuses on high-risk AI applications (HRAI) by setting specific requirements and obligations for both users and providers of these applications. This includes a conformity assessment before market placement or service commencement, enforcement measures post-market placement, and a governance structure at both European and national levels. The aim is to ensure that obligations are aligned with the associated risk level of each AI system.</p>
      <p>One of the areas where AI has a significant impact is the legal context, where for instance judges can benefit from the presence of automated decision-making in judicial proceedings [3, 4], potentially reducing the effort required to search through documents, seek out relevant legal provisions, or support them in complex cases where the human capacity to detect patterns is limited [5]. AI tools like ChatGPT, while useful, present several limitations in legal contexts. They may produce inaccurate information, as demonstrated in cases like Roberto Mata vs Avianca<fn><p>https://law.justia.com/cases/federal/district-courts/new-york/nysdce/1:2022cv01461/575368/54/</p></fn>, where reliance on ChatGPT led to legal issues due to the citation of non-existent cases. This stresses the necessity for legal professionals, particularly judges, to be acutely aware of the risks associated with their use of HRAI systems. In this paper, we introduce a risk-based approach designed to evaluate and mitigate potential risks associated with the trustworthiness of AI applications in judicial settings.</p>
      <sec id="sec-1-1">
        <title>2. Background</title>
        <p>Below we introduce the background information needed to understand the approach.</p>
        <sec id="sec-1-1-1">
          <title>2.1. AI Algorithms</title>
          <p>In the realm of AI, the development of algorithms falls into two primary views: traditional and modern. The traditional approach involves human-created models for specific problems or computations, where a limited set of features and a fixed sequence of instructions are employed. This method, exemplified by classical planning in autonomous systems, relies on symbolic representations and a predefined set of rules, necessitating heuristics to navigate the vast potential state spaces. Despite its rigidity, this approach allows for the construction of algorithms that are easily understood and verified by humans. Conversely, the modern perspective, dominated by Machine Learning (ML), leverages large datasets to generate rules for problem-solving. Through processes like training and deployment, algorithms are formulated to classify or interpret data, such as classifying images of dogs and cats. ML-based methods benefit from the ability to tackle complex problems without extensive human ingenuity, employing powerful optimization techniques. However, they face challenges such as potential imprecision, bias in training data, and the complexity of the resulting algorithms, which makes them difficult for humans to comprehend. Strategies to mitigate these issues include performance monitoring, dataset filtering, and developing techniques for better human understanding of ML-generated algorithms. The choice between traditional and modern methods depends on the specific application's needs, including considerations of security and trustworthiness. An effective risk analysis is crucial in determining the suitability of an AI-produced algorithm for a given scenario.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>2.2. Trustworthy AI</title>
          <p>Trustworthiness is a prerequisite for people and societies to develop, deploy and use AI systems. Without AI systems, and the human beings behind them, being demonstrably worthy of trust, unwanted consequences may ensue, and their uptake might be hindered, preventing the realization of the potentially vast social and economic benefits that they can bring [6]. In the past few decades, the success of ML has primarily been evaluated based on its quantitative accuracy, which has made training AI models much more manageable. Predictive accuracy has also become the standard measure for determining the superiority of an AI product. However, with the widespread use of AI, the limitations of using accuracy as the sole measurement have become apparent, as new challenges have arisen, such as malicious attacks and the misuse of AI. To address these challenges, the AI community has recognized that factors beyond accuracy need to be considered and improved when building an AI system. Recently, a number of enterprises, academia, public sectors, and organizations have identified principles of AI trustworthiness that go beyond accuracy-based measurements [7]. According to [8], the current degree of trustworthiness of an AI system is dependent on how the user perceives its technical characteristics.</p>
          <p>Various organizations, including the G20, the EU Parliament, the Global Partnership on AI (GPAI), and the Organisation for Economic Co-operation and Development<fn><p>https://oecd.ai/</p></fn> (OECD), have proposed different principles for ensuring trustworthiness in AI systems [9]. The OECD, for instance, has put forward a set of five principles aimed at promoting TAI: (i) inclusive growth, sustainable development and well-being, (ii) human-centered values and fairness, (iii) transparency and explainability, (iv) robustness, security and safety, and (v) accountability. The use of AI is intended to promote human good and well-being, and as such, it should not cause any harm. AI systems must be characterized by fairness, accuracy, and reliability, and should not be discriminatory. To be considered trustworthy, AI systems must be transparent and explainable, meaning they should have the necessary capabilities, functions, and features to achieve user goals, with their algorithms being easily understood by users. Additionally, AI systems must be resilient to threats that may try to exploit their normal behaviors and turn them into harmful ones. In the literature, additional principles have been proposed, such as accuracy [10], acceptance [11], and predictability and performance [12]. The AI HLEG [6] has focused on the concept of TAI, offering guidance in the form of a framework and identifying seven key ethical and technical requirements.</p>
          <p>[Figure 1 property labels recovered at this position: Unbiasedness, Non-discrimination, Diversity, Compliance, Auditability, Traceability.]</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Our View on Trustworthiness</title>
      <p>In our analysis of the literature on finding principles of trustworthiness in AI, the commonly agreed-upon principles are accuracy, robustness, privacy, explainability, accountability, and fairness. While these principles are widely acknowledged in the literature, there are additional considerations that can be incorporated within them. For instance, the concept of "human in the loop" can be viewed as an aspect of fairness.</p>
      <p>We differentiate between properties and principles. While both concepts are related and work together to ensure the overall trustworthiness of AI systems, they represent different aspects of the trustworthiness framework. Properties refer to specific characteristics or attributes of an AI system that contribute to ensuring a principle. For instance, integrity, reliability, and data validity can be considered properties relevant to the accuracy principle. Integrity refers to the quality of an AI system being honest, consistent, and maintaining the integrity of the data and algorithms it operates on. It ensures that the AI system is resistant to unauthorized modifications or tampering. Reliability focuses on the consistency and dependability of an AI system's performance. A reliable AI system consistently produces accurate results over time and under different conditions. Data validity refers to the quality and correctness of the data used by an AI system to generate outputs. Valid data ensures that the information processed by the AI system is accurate, relevant, and representative of the problem domain. Principles, on the other hand, represent high-level guidelines or concepts that guide the development and deployment of TAI systems. The relationship between properties and principles lies in how properties contribute to fulfilling the principles. Figure 1 depicts the relationship between properties and six essential principles for TAI, categorized as technical, ethical, or both. Accuracy and robustness serve as technical principles, whereas fairness and accountability fall within the ethical domain. Located in the center of the figure, privacy and explainability are unique principles that encompass both the technical and ethical facets.</p>
      <p>[Figure 1: TAI principles and properties relationship. Labels recoverable from the figure include Technical, Privacy, Robustness, Accuracy, Integrity, Reliability, and Data Validity.]</p>
      <sec id="sec-2-1">
        <title>3.1. AI Algorithms &amp; Trustworthiness</title>
        <p>Trustworthiness in AI is a multifaceted concept, often seen as a relationship between two entities: the AI system and its user. The trustworthiness of an AI system is largely dependent on how it is perceived by the user in terms of its technical characteristics. This perception is influenced by various factors, including the type of AI model, its application context, and the underlying algorithms [13]. Different AI models exhibit variability in how they align with TAI principles. This variation stems from the inherent differences in model structures, training methods, data used, and their intended applications. For example, a model designed for healthcare decision support may prioritize accuracy and privacy, while one for autonomous vehicles might focus more on safety and robustness. The data used to train AI models significantly affects their trustworthiness. A model trained on limited or biased data may exhibit lower trustworthiness due to its potential to generate skewed or unfair results. Additionally, the type of algorithm, whether it is rule-based or learning-based, plays a crucial role in determining the model's reliability, fairness, and transparency [13].</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Algorithm-based Trustworthiness</title>
        <p>The relationship between algorithms and TAI principles is a critical aspect of responsible AI development and deployment. TAI principles serve as benchmarks against which the performance and ethical considerations of algorithms can be evaluated. Each algorithm has its own set of advantages and limitations that align or conflict with these principles, making it essential to investigate their compatibility in specific use cases. Since each algorithm has a distinct set of characteristics, their compatibility with TAI principles can differ significantly; in other words, they have different compliance levels. To define Algorithm-based Trustworthiness (ABT) levels, it is essential to consider both the inherent characteristics of each algorithm and the specific attributes related to each AI principle. We define the following qualitative levels for this assessment. High: the algorithm inherently aligns with the AI principle in question, requiring minimal or no additional measures to ensure compliance. Moderate: while the algorithm generally aligns with the principle, additional safeguards or contextual considerations may be necessary. Low: the algorithm poses challenges or risks that make it difficult to align with the AI principle, and significant adjustments or limitations would be required for compliance.</p>
        <p>To compare rule-based and ML-based AI algorithms, we need to fix some assumptions, such as the consistency of the environment (i.e., static or dynamic), the complexity of problems, the availability and quality of data, the risk of bias, and the need for transparency and explainability. In our judicial case, we make the following assumptions: (i) the operational environment for the AI system is dynamic, (ii) the complexity of the problem is high, (iii) high-quality datasets are available, free of bias and sensitive personal information, and (iv) an explanation of the decisions is required. Under these assumptions, in the following we qualitatively evaluate the compatibility of the two distinct types of algorithms with TAI principles.</p>
        <sec id="sec-2-2-1">
          <title>3.2.1. Rule-based AI</title>
          <p>These AI systems are well suited to applications that require small amounts of data and simple, straightforward rules. These algorithms exhibit high accuracy due to deterministic outcomes from well-defined rules. However, since we assume that the operational environment is dynamic and the problem is complex, we consider a moderate level for the accuracy principle. These algorithms can be very robust if the rules are well-crafted to handle various edge cases, but they may falter in scenarios not covered by the existing rules; therefore, their robustness can also be considered moderate. These algorithms stand out for their high explainability and accountability, as their rule-based nature makes them transparent and easy to understand, even for non-experts.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>3.2.2. ML-based AI</title>
          <p>These AI systems, particularly suited for environments with abundant data, vary in their alignment with TAI principles. For the sake of simplicity, we focus only on four key supervised ML models: Linear Regression (LR), Decision Trees (DTs), Support Vector Machines (SVMs), and Neural Networks (NNs). LR is chosen for its fundamental approach to data modeling. DTs offer a more intricate decision-making structure. SVMs are known for their efficiency in high-dimensional spaces, while NNs, especially in deep learning, handle complex tasks like image and language processing. These models collectively represent the diverse capabilities of ML and provide insights into their trustworthiness in dynamic, data-intensive scenarios.</p>
          <p>For the accuracy and explainability principles, there is a notable trade-off across the algorithms. The literature [14, 15] offers a comprehensive comparison of different ML models in terms of their accuracy and explainability level. The LR and DT algorithms, while offering high levels of explainability due to their transparent nature, may not achieve the same level of accuracy in complex scenarios as their more sophisticated counterparts. On the other hand, SVMs and NNs, especially in their advanced forms, are capable of handling complex, high-dimensional data with greater accuracy but often sacrifice explainability, presenting a challenge in understanding the rationale behind their decisions.</p>
          <p>When it comes to robustness, SVMs are distinguished by their high resilience, particularly against adversarial attacks, thanks to their strong generalization capabilities. NNs, despite their adeptness at complex pattern recognition, exhibit moderate to low robustness and are vulnerable to adversarial examples, requiring specialized methods like adversarial training to enhance their robustness. DTs offer a moderate level of robustness, valued more for their interpretability than their resistance to adversarial examples, while LR models are less robust, particularly in complex datasets and adversarial environments.</p>
          <p>In terms of accountability, LR models excel due to their straightforward and transparent nature, which makes tracing decisions back to specific data points relatively easy. DTs also score highly in this regard, due to their clear decision-making paths. SVMs, particularly with non-linear kernels, present a more complex picture, offering moderate to low accountability due to the intricacies involved in their decision-making processes. NNs are at the lower end of the spectrum in terms of accountability, often described as "black boxes" due to their complex, layered structures, although efforts like layer-wise relevance propagation (LRP) and SHAP values are employed to enhance their interpretability.</p>
          <p>The aspects of fairness and privacy are also pivotal in evaluating the TAI alignment of ML algorithms. The fairness of algorithms such as LR, DTs, SVMs, and NNs is predominantly governed by the nature of their training data. Since these algorithms inherently lack bias, any unfairness in decision-making largely stems from biases present in the training data. This highlights the importance of precise data collection and processing, ensuring that the data is representative and free of biases to maintain fairness in the outcomes. Alongside fairness, privacy considerations in these algorithms are crucial, yet they are not intrinsic to the algorithms themselves. Instead, privacy risks are closely tied to how the data is handled. Ensuring the privacy and security of data, especially sensitive personal information, is vital, regardless of the algorithm in use. Effective data handling practices, including anonymization and secure storage, play a critical role in mitigating privacy risks in machine learning applications. Therefore, for both fairness and privacy, the emphasis shifts from the algorithmic design to the careful management of the data they process.</p>
          <p>Table 1 summarizes the ABT levels for rule-based and ML-based algorithms. This comparison, which provides a framework to gauge how various algorithms align with TAI principles, supports the risk assessment process. In the next section, we propose a risk-based approach in which these comparative insights become a vital factor in evaluating AI trustworthiness and assessing risk levels.</p>
          <table-wrap id="tab1">
            <label>Table 1</label>
            <caption>
              <p>Qualitative comparison between the algorithms and their alignment with TAI principles. Legend: L = Low, M = Moderate, H = High.</p>
            </caption>
            <table>
              <thead>
                <tr><th rowspan="2">TAI Principles</th><th rowspan="2">Rule-based</th><th colspan="4">ML-based (Supervised)</th></tr>
                <tr><th>LR</th><th>DT</th><th>SVM</th><th>NNs</th></tr>
              </thead>
              <tbody>
                <tr><td>Accuracy</td><td>M</td><td>L</td><td>H</td><td>H</td><td>H</td></tr>
                <tr><td>Robustness</td><td>M</td><td>L</td><td>H</td><td>M</td><td>M</td></tr>
                <tr><td>Explainability</td><td>H</td><td>H</td><td>M</td><td>L</td><td>L</td></tr>
                <tr><td>Accountability</td><td>H</td><td>H</td><td>M</td><td>L</td><td>L</td></tr>
                <tr><td>Privacy</td><td colspan="5">Depends on data handling, not inherent to the model.</td></tr>
                <tr><td>Fairness</td><td colspan="5">Depends on the data pipeline.</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>4. The Risk-based Approach</title>
        <p>The primary goal of this approach is to support judges and legal practitioners with a set of best practices when utilizing AI tools in their judicial work. This includes providing them with a clear understanding of the potential risks associated with these tools and offering actionable suggestions to mitigate these risks, ensuring responsible and informed use of AI in legal settings. The approach is a semi-automated process that requires user interaction at the beginning to collect useful information about the AI tool. It assesses risks associated with the use of AI tools, focusing on their alignment with TAI principles and their role in legal contexts. Before diving into the approach, we state some assumptions: (i) the user has some experience using the AI tool, (ii) the user does not know anything about the technical details behind the AI tool, and (iii) the user knows only about the required input and output.</p>
        <p>Risk is typically defined as a function of two values, Likelihood and Impact (i.e., Risk = f(L, I)). Similarly, we formulate the likelihood as a function of two values, ABT and Control Effectiveness (CE). ABT refers to the degree to which the AI tool's algorithm aligns with TAI principles. It assesses whether the algorithmic design and functionality inherently support or conflict with these principles. For instance, a tool built on deep neural networks has a high level of accuracy in prediction, while its "black-box" nature makes it less explainable (see Table 1). CE, instead, represents the effectiveness of implemented controls in mitigating risks associated with the AI tool. For example, strict access controls and logging mechanisms increase confidentiality and thereby mitigate the risk to the privacy principle. The combination of these two values produces the Likelihood level, which evaluates the probability of a TAI principle being compromised. The Impact measures the criticality of the use-case scenario in terms of each TAI principle. It assesses the potential consequences of the principle being compromised within the context of the tool's application.</p>
        <p>As Figure 2 illustrates, the proposed approach is organized sequentially into four steps (Data Collection, Data Modeling &amp; Analysis, Risk Evaluation, and Suggestion) and operates in two modes: user-only (M1) or user-plus-developer (M2). The figure employs a color-coded system to differentiate between the specific actions and processes associated with each mode: elements highlighted in blue pertain to the User, those in green correspond to the Developer, and the components in black apply to both modes. Below, we explain each step concisely.</p>
        <p>Data Collection. Data is collected through comprehensive questionnaires that cover multiple factors regarding the development of AI tools. Depending on the involvement of the AI developer, three different questionnaires are provided: Q1-TAI Implementation, Q2-Criticality, and Q3-Algorithmic.</p>
        <p>Data Modeling &amp; Analysis. The results obtained from the questionnaires in the previous step flow into this step as essential inputs. Depending on the scenario mode, this step generates one of two models: (i) the Basic model, which corresponds to M1 mode, and (ii) the Advanced model, which is enriched with the involvement of both the AI developer and the user. The Advanced model extends beyond user feedback by integrating technical insights, allowing for a more intricate analysis of the AI tool's alignment with TAI principles. Several automated processes in this step are connected to the responses obtained for the questionnaires, namely CE Assessment (P1), ABT Assessment (P2), Algorithmic Estimation (P3), and Criticality Analysis (P4). Below, we provide a brief description of each process.</p>
        <p>P1. This process analyses responses to Q1, determining CE levels for each TAI principle. For each principle, specific properties are identified (as depicted in Figure 1), with each property being assessed through a series of targeted questions.</p>
        <p>P2. To conduct this analysis, we first need to identify the algorithm used in the AI system. In M2 mode, this identification is straightforward, as the developer specifies the algorithm. In M1 mode, two scenarios arise: if the tool's documentation is available and the user can specify its algorithm, the user provides it directly; otherwise, the user is prompted to complete Q3, which is part of the subsequent P3 process.</p>
        <p>P3. This process runs in M1 mode and helps uncover the algorithm through the responses to Q3. The responses obtained from Q3 determine whether the algorithm is rule-based or ML-based.</p>
        <p>P4. This analysis uses the user's responses to Q2. We established a correlation between each question in Q2 and the TAI principles (these correlations are constant in our approach), which aids in assessing the extent to which the principles of TAI may be affected in light of the specific use-case scenarios provided by the user.</p>
        <p>Risk Evaluation. In this step, we conduct likelihood</p>
      </sec>
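      <p>The lookup behind the ABT Assessment (P2) and Algorithmic Estimation (P3) processes can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the levels are transcribed from Table 1 as extracted, the names ABT_TABLE, identify_algorithm and abt_level are hypothetical, and the rule of taking the lowest level across the four ML models when Q3 reveals only that the tool is ML-based is a conservative choice made here, not something the approach prescribes.</p>
      <preformat>
```python
from typing import Optional

LEVELS = ["L", "M", "H"]  # ordered from Low to High
ML_MODELS = ["LR", "DT", "SVM", "NN"]

# ABT levels per principle and algorithm, transcribed from Table 1.
# Privacy and fairness are absent on purpose: Table 1 ties them to data handling.
ABT_TABLE = {
    "accuracy":       {"rule-based": "M", "LR": "L", "DT": "H", "SVM": "H", "NN": "H"},
    "robustness":     {"rule-based": "M", "LR": "L", "DT": "H", "SVM": "M", "NN": "M"},
    "explainability": {"rule-based": "H", "LR": "H", "DT": "M", "SVM": "L", "NN": "L"},
    "accountability": {"rule-based": "H", "LR": "H", "DT": "M", "SVM": "L", "NN": "L"},
}


def identify_algorithm(mode: str,
                       developer_algorithm: Optional[str] = None,
                       user_algorithm: Optional[str] = None,
                       q3_family: Optional[str] = None) -> str:
    """Pick the algorithm label for the ABT lookup, following P2 and P3."""
    if mode == "M2":
        # M2: the developer names the algorithm.
        if developer_algorithm is None:
            raise ValueError("M2 expects the developer to name the algorithm")
        return developer_algorithm
    if mode == "M1":
        # M1: use the user's answer if available, else the Q3 outcome (P3).
        if user_algorithm is not None:
            return user_algorithm
        if q3_family in ("rule-based", "ML-based"):
            return q3_family
        raise ValueError("M1 needs a user-specified algorithm or a Q3 result")
    raise ValueError("unknown mode: " + mode)


def abt_level(principle: str, algorithm: str) -> Optional[str]:
    """Return the ABT level, or None when the principle is data-dependent."""
    row = ABT_TABLE.get(principle)
    if row is None:
        return None
    if algorithm == "ML-based":
        # Assumption: only the family is known, so take the lowest ML level.
        return min((row[m] for m in ML_MODELS), key=LEVELS.index)
    return row[algorithm]
```
      </preformat>
      <p>For example, a judge in M1 mode whose Q3 answers point to an ML-based tool would obtain abt_level("explainability", identify_algorithm("M1", q3_family="ML-based")), which is "L" under the assumptions above.</p>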
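      <p>[Figure 2: the proposed approach. Labels recoverable from the figure include Likelihood Assessment, Likelihood levels, Uncover the algorithm, and Risk Profile Translation.]</p>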
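      <p>To make the relation between ABT, CE, Likelihood, Impact and Risk concrete, the following Python sketch combines qualitative levels per principle. The text does not specify how the levels are combined, so the symmetric three-by-three matrix used here, and the function names invert, combine, likelihood, risk and risk_profile, are assumptions chosen purely for illustration: weak algorithmic alignment (low ABT) and weak controls (low CE) both push the likelihood up, and the risk then grows with likelihood and impact.</p>
      <preformat>
```python
LEVELS = ("L", "M", "H")  # Low, Moderate, High


def invert(level):
    """Turn a 'goodness' level (ABT or CE) into an exposure level."""
    return LEVELS[2 - LEVELS.index(level)]


def combine(a, b):
    """Assumed 3x3 matrix: index sum 0-1 gives L, 2 gives M, 3-4 gives H."""
    return ("L", "L", "M", "H", "H")[LEVELS.index(a) + LEVELS.index(b)]


def likelihood(abt, ce):
    """Likelihood of a principle being compromised, from ABT and CE."""
    return combine(invert(abt), invert(ce))


def risk(likelihood_level, impact):
    """Risk as a function of Likelihood and Impact."""
    return combine(likelihood_level, impact)


def risk_profile(abt, ce, impact):
    """Compute a risk level for every principle present in all three inputs."""
    profile = {}
    for principle in abt:
        if principle in ce and principle in impact:
            level = likelihood(abt[principle], ce[principle])
            profile[principle] = risk(level, impact[principle])
    return profile


if __name__ == "__main__":
    # Hypothetical inputs for a neural-network-based tool in a critical use case.
    abt = {"accuracy": "H", "explainability": "L"}
    ce = {"accuracy": "H", "explainability": "M"}
    impact = {"accuracy": "H", "explainability": "H"}
    print(risk_profile(abt, ce, impact))
```
      </preformat>
      <p>With these hypothetical inputs, explainability comes out as high risk (low ABT, moderate CE, high impact), while accuracy comes out as moderate risk because strong alignment and strong controls keep the likelihood low even though the impact is high.</p>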
    </sec>
  </body>
  <back>
    <ref-list>
    </ref-list>
  </back>
</article>