The FATE System Iterated: Fair, Transparent and
Explainable Decision Making in a Juridical Case
Maaike H.T. de Boer1 , Steven Vethman1 , Roos M. Bakker1 , Ajaya Adhikari1 ,
Michiel Marcus1 , Joachim de Greeff1 , Jasper van der Waa1,2 ,
Tjeerd A. J. Schoonderwoerd1 , Ioannis Tolios1 , Emma M. van Zoelen1,2 ,
Fieke Hillerström1 and Bart Kamphorst1
1 TNO, Anna van Buerenplein 1, 2595 DA, The Hague, The Netherlands
2 Delft University of Technology, Mekelweg 5, 2628 DE, Delft, The Netherlands


                                         Abstract
                                         The goal of the FATE system is decision support with the use of state-of-the-art human-AI co-learning,
                                         explainable AI and fair, secure and privacy-preserving usage of data. This AI-based support system is a
                                         general system, in which the modules can be tuned to specific use cases. The FATE system is designed
                                         to address different user roles, such as a researcher, domain expert/consultant and subject/patient, each
                                         with their own requirements. Having examined a Diabetes Type 2 use case before, in this paper we
                                         slightly iterate the FATE system and focus on a juridical use case. For a given new juridical case the
                                         relevant older court cases are suggested by the system. The relevant older cases can be explained using
                                         the eXplainable AI (XAI) module, and the system can be improved based on feedback about the relevant
                                         cases using the Co-learning module through interaction with a user. In the Bias module, the use of
                                         the system is investigated for potential bias by inspecting the properties of suggested cases. Secure
                                         Learning offers privacy-by-design alternatives for functionality found in the aforementioned modules.
                                         These results show how the generic FATE system can be implemented in a number of real-world use
                                         cases. In future work we plan to explore more use cases within this system.

                                         Keywords
                                         FAIR AI, Hybrid AI, Explainable AI, Bias, Secure Learning, Knowledge Engineering, Co-Learning




1. Introduction
More and more AI systems are being used in real-world cases in all types of domains. These systems
are often highly specialized towards one user and one specific application. Such personalized
systems often work with sensitive data, which makes it essential that they handle data in a
privacy-preserving manner. In our previous paper [1], we proposed the FATE system: a system
that combines state-of-the-art AI tools. The FATE system aims to provide decision support with
AI capabilities in a fair, understandable, trustworthy, controllable and secure manner [2]. The
main areas of research include human-AI co-learning, explainable AI, and the fair, secure and
privacy-preserving usage of data. Especially in co-learning and explainable AI, hybrid AI plays
a big role, as symbolic and sub-symbolic information needs to be combined.
In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI
2022 Spring Symposium on Machine Learning and Knowledge Engineering for Hybrid Intelligence (AAAI-MAKE
2022), Stanford University, Palo Alto, California, USA, March 21–23, 2022.
" maaike.deboer@tno.nl (M. H.T. d. Boer)
 0000-0002-2775-8351 (M. H.T. d. Boer)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Figure 1: The Iterated FATE system

The prototype of the FATE system is set up in a generic, model-agnostic and modular way, so that it can be used
in different use cases with different users and different instantiations of the modules.
   In this paper, we slightly iterate the FATE system, mainly in the definition of the user roles,
and apply this revised version of the system to a juridical use case in which, for a given (court) case,
the relevant older cases have to be retrieved. The novelty of this paper lies mainly in the application
of the FATE system to this use case, including a storyboard. In the next section, we describe the
revised FATE system. Section 3 provides information about the juridical use case, including
the task and the different user roles. In Section 4, we present the research and results for the
research areas. Section 5 contains the conclusions and future work.


2. The Iterated FATE System
Figure 1 shows the FATE system. The largest improvement over its predecessor is the renewed
definition of the user roles. AI systems that provide elaborate interaction with users and that
include aspects of Explainable AI typically differentiate with respect to the user roles they cater
to, e.g. [3] and [4]. To provide the various users with the best-fitting advice, we follow a
similar approach with the FATE system and differentiate the system's functionalities per user type.
In the previous version of the FATE system, the users were specified as
AI developer, expert user and lay user. The roles are now redefined as researcher, consultant
and subject, to better reflect our experience with the use of decision support systems in practice.
   The domain researcher typically has experience in (data) science and related disciplines. She
wants to obtain knowledge about the relation between the (historical) data of subjects
and a phenomenon of interest. In addition, the researcher is interested in how trustworthy the
system is in practice. As such, the subject-phenomenon relation in the use case should account
for a set of legal, ethical and policy conditions. The researcher balances these context conditions
against the utility of predicting the phenomenon. Within the AI system, the researcher is the only
user role with the ability to balance these essential conditions, i.e. the settings of the researcher
hold for the consultant and subject as well. In practice, the researcher will do so with the help
of the respective legal, ethical and policy experts.
   The consultant is a domain expert who holds expert knowledge in a particular field. She
wants to advise on or intervene in the data subject's activities or behavior. Based on the subject's
data and with the help of the system, an individualized outcome of the subject-phenomenon
relation is established. The consultant uses the system to obtain contextual information and
can question the subject for further information. The system provides the consultant with
its confidence and justification regarding advice or interventions targeting the subject. The
purpose of this support is to provide actionable information to the consultant, for instance
information on how the subject's behavior should change in order to reach a different outcome.
   The subject is a “naive” user who is neither schooled in AI or data science nor in possession of
domain knowledge per se. The subject is either subjected to the system's output or has an
intrinsic interest in this output. As such, she uses the system for her own situation. The system
functions as an online consultant that monitors the subject’s data and generates actionable
advice according to the subject-phenomenon relation model. Personal conditions in relation to
the context conditions of the use case will be communicated when applicable.
   Based on feedback, other changes were made: module 2 now covers general data shaping /
pre-processing instead of human-driven data shaping; the arrow from module 5 to module 6 now carries only
predictions instead of predictions and advice; the arrow from module 7 now carries ‘raw’ advice instead of advice;
and the output of module 8 is now directed not only towards the subject but also towards the consultant.
Furthermore, the figure now links the modules to the research areas of XAI, co-learning, fairness
and bias, and secure learning.


3. Juridical Use Case
For the juridical use case, the FATE system should provide decision support in a court case,
i.e. provide support for a judge, lawyer or defendant. This does not mean that the FATE
system passes a sentence autonomously; rather, it supports a user in the juridical process. In
this process there are multiple roles: the researcher can be working for the scientific office of the
public prosecution, a law faculty at a university, or an NGO interested in fair judgements; the
consultant can be a lawyer; and the subject can be the defendant. These users have different
goals and different ways of using the FATE system. The goals of the researcher could be to explore
innovation in case law (a broad goal), to recognize trends, to increase the quality and efficiency of
case law (through automation), or to control or audit the juridical system. The goals of the
consultant could be to determine whether a case should be brought against a defendant, to obtain a
fair ruling, or to inform the subject. The goal of the subject could be to receive the lowest possible
sentence, for example by finding out whether a lawyer is needed. The FATE system aims to
help the different users with their respective goals.
Figure 2: Storyboard of the juridical use case and FATE system


4. Methodology and Results
Figure 2 shows a storyboard illustrating our approach to the juridical use case, and describes a
high-level workflow for the consultant and researcher roles. This storyboard was constructed
and verified based on input from a domain expert (a solicitor general at the Dutch High Council),
collected through an interview. The consultant, being a public prosecutor in this use case, has
the goal of constructing an argument and advising on the verdict for a given juridical case.
The prosecutor searches for relevant information in the current case, and wants to find relevant
case law. The FATE system supports this process by categorizing the current case using a topic
tree (a hierarchical knowledge structure created by domain experts), and by suggesting similar
cases based on the same categories and the vector distance (doc2vec embeddings with a cosine
distance metric; a sketch of this step is given after this paragraph). The predicted category of the
current case is returned to the public prosecutor,
together with the top five similar cases. In the storyboard, an example is shown of visualizing the
semantic clusters by means of a scatter plot that shows individual cases within their assigned
clusters (colored circles). The explainable AI module explains the cluster assignment of the
current case by highlighting the paragraphs in the case that are found to be indicative of the
cluster. It provides a similar explanation for the cases that are found to be semantically similar
to the current case, while also presenting a counterfactual explanation that indicates which
textual changes in the current case would lead to a different cluster assignment. Next, the
consultant interacts with the data visualisation and explanation of the FATE system, in order to
achieve different goals relating to XAI, bias, or co-learning. Through interaction, the consultant
provides feedback to the system. For example, the consultant might disagree with the system
about a case clustering, and suggest another category for this case. Such feedback is saved into a
user model containing individual search and clustering preferences, which changes the clustering
behavior of the system. Moreover, the feedback from consultants is aggregated and shown to
the researcher(s). The researchers explore the system's model and the consultant feedback in
order to identify (bias) patterns in the clustering by the system and the consultants, and to understand
clustering discrepancies between consultant and system. Researchers might decide to alter the
model by providing assignment feedback (i.e., assigning individual cases to a particular cluster)
and description feedback (i.e., changing the cluster label used by the system). The model
updates made by the researchers alter the clustering behavior of the FATE system,
triggering new feedback from consultants.
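   To make the suggestion step concrete, the sketch below shows how similarity-based case suggestions could be implemented with gensim's Doc2Vec and cosine similarity; the corpus format, hyperparameters and the top-5 cut-off are illustrative assumptions rather than the exact configuration of the FATE system.

```python
# Minimal sketch of the similarity-based case suggestion step (illustrative only).
# Assumes `corpus` is a list of (case_id, case_text) tuples of Dutch case-law texts
# and gensim >= 4.0 is installed.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

def train_case_model(corpus):
    """Train a doc2vec model on the historical case texts."""
    docs = [TaggedDocument(simple_preprocess(text), [case_id])
            for case_id, text in corpus]
    model = Doc2Vec(vector_size=100, min_count=5, epochs=20)  # assumed hyperparameters
    model.build_vocab(docs)
    model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)
    return model

def suggest_similar_cases(model, new_case_text, top_n=5):
    """Embed a new case and return the top-n most similar historical cases as
    (case_id, cosine similarity) pairs according to the doc2vec space."""
    query_vec = model.infer_vector(simple_preprocess(new_case_text))
    return model.dv.most_similar([query_vec], topn=top_n)
```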
   In this use case, a publicly available dataset of semi-structured Dutch textual data1 is used.
   The following sections explain the research done in the different modules.

4.1. Explainable AI
The XAI module focuses on explaining how the system determines case similarity, which underlies
the case suggestions. It aims to explain which words, sentences or segments play a
role in determining why the current document is similar to others. Counterfactual explanations
are used for this [5], to convey which text needs to be omitted for the case text to be treated
as significantly different and thus result in different case suggestions. As such, it differs from regular
counterfactual explanations, which deal with a classification task and convey what input changes are
required to alter the classification [6].
   To construct these counterfactual explanations, a topic clustering approach is used. A hierarchical
topic tree is provided, and cases are assigned to topics based on the similarity measure used by the
system. An example of such topics is the parent topic crimes against life with the sub-topics
murder and manslaughter. This allows the XAI module to explain the system's behaviour in
more generic terms, as opposed to comparing one document with another.
   1 https://www.rechtspraak.nl/Uitspraken/Paginas/Open-Data.aspx
Figure 3: User interface showing how the system determines case similarity using counterfactual
explanation. The example snippet [7] comes from a real case-law text without personal details.


   With the help of this topic clustering, counterfactual explanations explain what textual omis-
sions are required for the case to change from one topic-cluster to another. These explanations
aim to convey what aspects of the case text the system finds important in its case suggestions
with respect to meaningful topics. This allows users to determine whether a case should indeed
be treated as belonging to a certain topic, and thus whether the case suggestions are in any way
meaningful. In addition, the use of topic clusters combined with counterfactual explanations
allows for a highly interactive exchange between the user and the FATE system, facilitating
exploration and a deeper understanding.
   The module itself supports the use of existing methods to identify counterfactuals based on
searching the embedding space, such as LIME-C and SHAP-C [8], FACE [9]
and SEDC [10]. In practice, these methods do not scale well. The case texts in our juridical
use case vary from a few hundred words to tens of thousands of words. The current counterfactual
methods all perform a heuristic search at the word level, which proved to be intractable on our
real-world text documents. Instead, we incorporate domain knowledge from the researcher to
identify text segments relevant for the use case; in the juridical use case this is the identification
of various legal arguments and decisions. This allows for a more efficient approach in the search
for an appropriate counterfactual.
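   A minimal sketch of such a segment-level counterfactual search is given below; `embed` and `assign_topic` are placeholders for the system's own embedding and topic-assignment components, and the greedy omission strategy is an assumption about how the search could be organized.

```python
# Sketch of a segment-level counterfactual search via greedy omission (illustrative).
# `segments` are the domain-relevant text segments (e.g. legal arguments) identified
# with the researcher's domain knowledge; `embed` and `assign_topic` are placeholders.
from itertools import combinations

def counterfactual_segments(segments, embed, assign_topic, max_omissions=3):
    """Find a small set of segments whose removal changes the assigned topic cluster."""
    original_topic = assign_topic(embed(" ".join(segments)))
    # Try increasingly large sets of omitted segments until the topic changes.
    for k in range(1, max_omissions + 1):
        for omitted in combinations(range(len(segments)), k):
            reduced = " ".join(s for i, s in enumerate(segments) if i not in omitted)
            new_topic = assign_topic(embed(reduced))
            if new_topic != original_topic:
                return [segments[i] for i in omitted], new_topic
    return None, original_topic  # no counterfactual found within the omission budget
```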
   Figure 3 shows a view of the demonstrator currently under development. The user provides
text of a court case as input to the text area element on the left. The system automatically
shows different topics found in the text, such as murder. It provides information per topic, such
as “If the words sawed-off shotgun and shot were removed, the topic of manslaughter would
apply instead of murder”. Moreover, words which triggered the assignment of this topic are
highlighted in the court case. Our goal is to enrich the interface to review this type of data both
spatially, using a dimensionality-reduced scatter plot, and textually, using the topic hierarchy and
relevant case text snippets. In this way, the user will be able to understand, through exploratory
interaction, what case texts the system deems similar or dissimilar and what in those texts
causes this.
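   As an illustration of the planned spatial view, the sketch below projects the case embeddings to two dimensions and colours them by assigned topic cluster; PCA is used here purely for simplicity, and the demonstrator may use a different dimensionality reduction.

```python
# Sketch of the planned spatial view: 2D scatter plot of case embeddings,
# coloured by assigned topic cluster (illustrative; PCA chosen for simplicity).
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_case_clusters(doc_vectors, cluster_ids, out_file="case_clusters.png"):
    """Project doc2vec case vectors to 2D and plot them, coloured per cluster.
    `cluster_ids` are assumed to be integer cluster labels."""
    coords = PCA(n_components=2).fit_transform(doc_vectors)
    plt.figure(figsize=(6, 5))
    plt.scatter(coords[:, 0], coords[:, 1], c=cluster_ids, cmap="tab10", s=12)
    plt.xlabel("component 1")
    plt.ylabel("component 2")
    plt.title("Cases coloured by assigned topic cluster")
    plt.savefig(out_file, dpi=150)
```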

4.2. Co-learning




Figure 4: Co-learning technical architecture


   Human-AI co-learning stands for both collaborative and continuous learning. This is a process
in which a human and an AI system both learn (i.e. acquire new knowledge, change behavior
and/or build meaning) through their collaborative interactions, in an ongoing manner. The term
‘co-learning’ has relatively recently been introduced as a vital process in human-AI collaboration
and interaction [11], and can be positioned next to similar terms such as co-adaptation and
co-evolution [12]. A key aspect of co-learning is the creation of intuitive and fluent interaction
between humans and machines such that the humans can learn from the machines, and the
other way around.
   In this module, we enable co-learning by allowing users of the FATE system to extensively
explore the clustering, an automatic grouping of documents. The clustering is made by the
system and presented through a dashboard-like GUI, thereby allowing the human to learn how
data is clustered. The users can then provide feedback on different aspects of the clustering,
thereby allowing the system to learn about human expert knowledge. Users can provide feedback
by changing the textual description of clusters and by reassigning case laws to a new cluster.
This feedback from the user is integrated through an automatic structuring of the
textual data in a continuous process. Case recommendations are improved by adding feedback
from the user to the topic tree. The topic tree is then combined with an interactive clustering
method, allowing users to also directly provide feedback on how specific cases are clustered. The
clustering provides insights from the data, while the topic tree contains the domain knowledge.
With this method, the FATE system will benefit from both human and machine knowledge and
make better recommendations to the users.
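   The two kinds of user feedback could, for instance, be captured by operations like the ones sketched below; the state representation (a case-to-cluster mapping and per-cluster centroid vectors) and the blending weight are illustrative assumptions, not the system's actual interface.

```python
# Illustrative sketch of the two feedback operations on the clustering state.
# `assignments` maps case ids to cluster ids; `centroids` maps cluster ids to vectors;
# `embed` is a placeholder for the system's text-embedding function.

def apply_assignment_feedback(assignments, case_id, new_cluster):
    """Reassign a single case law to the cluster chosen by the user."""
    updated = dict(assignments)
    updated[case_id] = new_cluster
    return updated

def apply_description_feedback(centroids, cluster_id, description, embed, weight=0.5):
    """Nudge a cluster centroid towards the embedding of the user's new textual
    description of that cluster (assumed blending weight of 0.5)."""
    updated = dict(centroids)
    updated[cluster_id] = (1 - weight) * updated[cluster_id] + weight * embed(description)
    return updated
```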
   For the juridical use case, we have a large amount of unstructured textual data: past case laws.
We can automatically create clusters to organize and provide insight into this data. However, the
topics, or labels, of these automatically created clusters are not always intuitive. This can be
improved by involving a user. The user can influence the cluster topics by adding feedback about
the cluster descriptions. The user can also assign a case law to a cluster. Such semi-supervised
clustering models, or interactive clustering models, are becoming increasingly popular due to
their ability to produce more meaningful categories for the user. Dubey et al. [13] show how a
k-means clustering framework can be improved by using assignment and description feedback.
With assignment feedback, the user can reassign a data point from one cluster to another. With
description feedback, the user can modify the vector of a cluster by providing a description for
the cluster. We elaborate on this work by linking the topic tree to the clusters, such that the
clusters not only contain insights from the data, but are also labeled using human knowledge.
Figure 4 shows an architecture containing these technical components. The jurisdiction data
is clustered in two ways: by using the topics from the topic tree, and by using the interactive
k-means clustering. Afterwards, the two groupings are aligned using a contingency matrix, such that the
clusters are labelled with the topics (a sketch of this alignment is given below). This alignment is
presented to the user, who can provide feedback. The new descriptions are incorporated in the topic
tree, and then used to update the interactive clustering.
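   A minimal sketch of this alignment step is given below, assuming document vectors and per-document topic labels are available; the one-to-one Hungarian assignment over the contingency matrix is one possible way to realize the alignment described above.

```python
# Sketch of aligning interactive k-means clusters with topic-tree topics via a
# contingency matrix (illustrative; assumes document vectors and topic labels).
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics.cluster import contingency_matrix

def label_clusters_with_topics(doc_vectors, topic_labels, n_clusters):
    """Cluster documents with k-means and label each cluster with the topic-tree
    topic it overlaps most with, using a one-to-one assignment."""
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(doc_vectors)

    # Rows: topics (sorted), columns: k-means clusters; entries count shared documents.
    overlap = contingency_matrix(topic_labels, cluster_ids)

    # The Hungarian algorithm on the negated matrix maximizes the total overlap.
    topic_idx, cluster_idx = linear_sum_assignment(-overlap)
    topics = sorted(set(topic_labels))
    return {int(c): topics[t] for t, c in zip(topic_idx, cluster_idx)}
```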
   The approach described above to enable co-learning in the FATE system has strengths and
weaknesses. This method uses assignment feedback and description feedback as proposed by
Dubey et al. [13]. In practice, the assignment feedback does not work as well as expected based on their
work. Another weakness is that evaluating our system is a tedious task without labeled data and
user interactions to use as a ground truth. For future evaluation, either labeled data needs to be
created or domain experts should be interviewed to determine the performance. The strength
of this system is that meaningful and intuitive clusters are created for the user, which helps
them find and compare unlabeled documents. These clusters are created by both
human and machine: the user creates a topic tree, which provides context to the clusters, and
provides feedback on these clusters, thereby continuously improving the results.

4.3. Fairness and Bias
Different from our previous paper on this topic [1], we focus this year on the practical perspective
of bias. Based on the use case, three hypotheses were formulated:
   1. the association between men and violence may be embedded in the data, e.g. criminals are more often male [14], i.e. bias in the data;
   2. these associations can be captured in the language model trained on the data, e.g. the link between violence and masculinity in language can be captured in correlations or in their semantic similarity [15, 16], i.e. bias in the model;
   3. in turn, the bias in the language model may affect the suggestions, e.g. AI predictions for men may have more false positives than predictions for women, i.e. bias in the outcomes.

Figure 5: (a) Experiment 1: Representation of gender pronouns and the number of times violence is described in case law. (b) Experiment 2: Language bias measured based on the method of Bolukbasi et al. (2016) and the number of times violence is described in case law.
   From the dataset, the cases from 2020 concerning criminal law in the Dutch courts were
collected and annotated for gender and violence [17]. Annotation of gender was first explored
based on whether the majority of pronouns (such as ‘he’, ‘she’, ‘him’, ‘her’) were female or
male; however, this was found to be insufficiently accurate. Therefore, gender was annotated per case
based on reading the text and finding pronouns directly indicating the gender of the defendant.
Annotation of whether a case pertains to physical violence from the defendant towards another human was
approximated by counting the occurrences of the word ‘geweld’ (violence in Dutch). A case
containing more than two mentions of the word ‘geweld’ is annotated as violent, while cases
with zero mentions of ‘geweld’ are annotated as non-violent. Cases with 1 or 2 mentions of
‘geweld’ were excluded. A closer inspection of a random sample of 50 non-violent and 50
violent cases based on this annotation approach showed that 96% of the ‘non-violent’ cases
contained no indication of a violent defendant and 98% of the ‘violent’ cases contained a
written indication of violent behavior by the defendant. From the set of 8240 case-law texts of 2020,
cases were annotated until 100 cases were found for each combination of a male or female
defendant and non-violent or violent criminal behavior. Based on this data, we set up
three experiments to investigate bias in the case law.
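   The violence annotation heuristic described above could be sketched as follows; the word-count thresholds follow the text, while the exact tokenisation (here a word-boundary match on ‘geweld’) is an assumption.

```python
# Sketch of the rule-of-thumb violence annotation (thresholds follow the text).
import re

def annotate_violence(case_text):
    """Label a case as 'violent' (>2 mentions of 'geweld'), 'non-violent'
    (0 mentions), or None (1 or 2 mentions; excluded from the analysis)."""
    mentions = len(re.findall(r"\bgeweld\b", case_text.lower()))
    if mentions > 2:
        return "violent"
    if mentions == 0:
        return "non-violent"
    return None  # ambiguous cases are excluded
```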
   Experiment 1 concerned bias in the data (hypothesis 1), in which all cases from 2020 were
investigated to see whether cases with predominantly male pronouns contained more mentions
of violence (‘geweld’). Figure 5a shows that cases with relatively more male pronouns
contain more mentions of violence. This skewness in the data may or may not be
representative of the current justice system. Nonetheless, the skewness creates a risk for bias
in the use of the system, as the association between men and violence might be captured and
relied upon by the system, which may not always be desirable in the application. Thereafter, the
annotated violent cases for male and female defendants were compared: the two sets had an
average of 5.02 and 5.21 mentions of violence, respectively. A two-sample t-test showed, with a
p-value of 0.849, that the hypothesis of equal means could not be rejected. Hence, by inspecting
the number of times violence is described and the gender of the defendant, the hypothesized
risk of bias in the data between men and violence is not found in this experiment.
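   The comparison at the end of Experiment 1 could be reproduced along the lines of the sketch below; `male_counts` and `female_counts` stand for the per-case numbers of ‘geweld’ mentions in the annotated violent cases and are illustrative names.

```python
# Sketch of the two-sample t-test comparing violence mentions per defendant gender.
from scipy import stats

def compare_violence_mentions(male_counts, female_counts, alpha=0.05):
    """Test whether the mean number of 'geweld' mentions differs between cases
    with male and female defendants (two-sided, two-sample t-test)."""
    t_stat, p_value = stats.ttest_ind(male_counts, female_counts)
    return {
        "mean_male": sum(male_counts) / len(male_counts),
        "mean_female": sum(female_counts) / len(female_counts),
        "p_value": p_value,
        "reject_equal_means": p_value < alpha,
    }
```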
   Experiment 2 concerned bias in the model (hypothesis 2): a doc2vec model was trained on the
case law of 2020 and, as a first iteration, the Direct Bias measurement from the well-known paper
by Bolukbasi et al. [18] was used to measure gender bias. In brief, the gender direction in the vector
representation of the language model was measured using the Dutch equivalents of the
word pairs ‘Man-Woman’, ‘Male-Female’, ‘Father-Mother’ and ‘Brother-Sister’, given that these
were present in the text. The similarity of the vectors of other words in a case with respect to
this gender direction in the vector space is then used to measure whether a case has gendered
wording through the lens of the language model. Figure 5b does not indicate a strong relation
between the measured language bias and the mentions of violence. The average bias scores
for the violent and non-violent cases are very close, 0.278 and 0.268, respectively. The results of
Experiment 2 thus suggest that, according to the measure by Bolukbasi et al., the model trained on the
case law has not captured a strong association between men and violence.
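   In the spirit of the Direct Bias measure of Bolukbasi et al. [18], a per-case bias score could be computed as sketched below; the Dutch word pairs follow the text, while averaging normalised pair differences (instead of taking the first principal component) and a gensim-style `model.wv` interface are simplifying assumptions.

```python
# Sketch of a gender-direction bias score for a case, following Bolukbasi et al.
# (simplified: the direction is the mean of normalised pair differences).
import numpy as np

GENDER_PAIRS = [("man", "vrouw"), ("mannelijk", "vrouwelijk"),
                ("vader", "moeder"), ("broer", "zus")]  # assumed Dutch equivalents

def gender_direction(model, pairs=GENDER_PAIRS):
    """Estimate the gender direction from definitional word pairs present in the model."""
    diffs = []
    for male, female in pairs:
        if male in model.wv and female in model.wv:
            d = model.wv[male] - model.wv[female]
            diffs.append(d / np.linalg.norm(d))
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)

def case_bias_score(model, case_tokens, direction):
    """Mean absolute cosine similarity of a case's words to the gender direction."""
    sims = [abs(np.dot(model.wv[w], direction) / np.linalg.norm(model.wv[w]))
            for w in case_tokens if w in model.wv]
    return float(np.mean(sims)) if sims else 0.0
```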
   Experiment 3 concerned the bias in the use of the model and data (hypothesis 3), i.e. the bias in
the outcomes. To measure the bias in the outcomes, non-violent cases with male and female
defendants, one hundred each, were put into the downstream task of finding 10 suggestions of
similar case law. Unwanted bias would be present if non-violent cases with a male defendant
more often receive violent suggestions than non-violent cases with a female defendant. Results showed that
both female and male non-violent cases received very few violent
suggestions. The difference between female defendants (4.1%) and male defendants (3.0%) was statistically
insignificant, with a p-value of 0.21.
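   The downstream evaluation of Experiment 3 could be organised as sketched below, reusing the illustrative `suggest_similar_cases` and `annotate_violence` helpers from the earlier sketches; the exact statistical test used in the experiment is not specified here, so a two-sample t-test on per-case rates is shown purely as an example.

```python
# Sketch of Experiment 3: fraction of violent suggestions per non-violent case,
# compared between male and female defendants (illustrative; reuses earlier sketches).
from scipy import stats

def violent_suggestion_rates(model, cases, case_texts, top_n=10):
    """`cases` is a list of (case_text, gender) pairs of non-violent cases;
    `case_texts` maps case ids to their full texts."""
    rates = {"male": [], "female": []}
    for text, gender in cases:
        suggestions = suggest_similar_cases(model, text, top_n=top_n)
        n_violent = sum(annotate_violence(case_texts[case_id]) == "violent"
                        for case_id, _score in suggestions)
        rates[gender].append(n_violent / top_n)
    return rates

def compare_suggestion_bias(rates):
    """Two-sample t-test on the per-case violent-suggestion rates of both groups."""
    _t, p_value = stats.ttest_ind(rates["male"], rates["female"])
    return p_value
```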
   Hence, this set of experiments (1) identified a potential risk of bias in the data regarding the
association between men and violence, (2) did not identify a strong association between male
language bias and violence in the model trained on the data, and (3) did not identify unwanted
bias when using the system to suggest similar cases for non-violent cases.
   In the future, more iterations on these experiments are necessary to investigate the hypotheses
stemming from the use case and to make strong claims about the bias and fairness present in the use
case. To start, next to gender bias, risks of bias related to nationality, migration status and
social background, as well as their intersection, are also likely to be present in the use case
[19, 20]. Next to that, we also advise extending the investigation of gender bias.
One opportunity is to add different bias measurement techniques to Experiment 2, such as
[21]. Another is to add metadata on the verdict of the case to the data, such that associations
between gender and verdicts can be examined. Additionally, bias measured in the use of
data-driven AI on the case-law texts may be compared to more expert-driven modelling
approaches of case law, e.g. by extracting the key features to represent a case. On top of that,
to investigate bias mitigation in this use case, we want to inspect the effect that changing the
gender representation of the data used for training the model has on the bias measured in the use of
the AI. This fits closely with our aim for future work in FATE: to further our understanding of
how to mitigate the impact of unwanted bias of AI in many different scenarios, where bias may
be mitigated in the data, the model or the use of the AI. More use cases are necessary to show
the multiple facets of bias and to give direction as to when and which approaches for the mitigation of
unwanted bias are suitable.

4.4. Secure Learning
The problem that the area of Secure Learning aims to solve is that of privacy violation in
AI applications. The solutions created in the areas of explainable AI, co-learning and fair AI
explicitly work on centralized data that is freely available to be processed, but this assumption
is often unrealistic in today's world due to two main issues:

   1. Data containing privacy-sensitive information cannot simply be used for AI purposes,
      due to privacy concerns and laws such as the GDPR. This is the case in, for example, the
      healthcare and financial domains and any other domain where personal data is present.
   2. Data is often distributed over different entities. This data cannot always simply be sent
      to a central entity first, either because the data volume is too large (resulting in issues
      concerning bandwidth, battery life, etc.) or because it is not desired for the reason
      provided above.

   In order to tackle both issues, the area of Secure Learning offers techniques called Privacy-
Enhancing Technologies (PETs) to create privacy-by-design solutions for AI algorithms. Privacy-
by-design is a design principle that ensures that negligible information about sensitive data is
disclosed throughout a certain process. In our case, we are interested in algorithms that enable
certain AI functionalities on sensitive data without leaking information about this sensitive data.
Even though some research has been done on such algorithms for general machine learning
applications [22, 23, 24], the combination with explainable AI, fair AI and co-learning has so far
remained unexplored.
   Nowadays, there is a variety of PETs that can be used to create privacy-by-design solutions.
However, the FATE system provides personalized decision support, which brings about new
privacy-related challenges that are not trivially captured by privacy-by-design solutions. For
example, some explainable AI algorithms, such as foil trees [25], output a data point from
the training set of the model to be explained. Such an algorithm can be converted into a
privacy-by-design solution using PETs, but the output would still violate privacy.
   In the juridical use case, all information is public and preservation of privacy plays a minor
role, so our work continued to focus on the Diabetes type 2 use case from 2020. Nevertheless,
the developed algorithms and insights gained are applicable to many other contexts.
   We set out to design a privacy-by-design algorithm that provides the same functionality as
the foil tree algorithm [25] but offers the strongest privacy guarantees possible, by keeping
the sensitive training data and the model encrypted throughout the entire algorithm, since
attacks are known that can reconstruct training data from a model [26].
   Our first contribution is a synthetic data generation subroutine that generates, from the
sensitive training data, synthetic data that contains negligible information about the sensitive data.
This synthetic data then completely replaces the sensitive data during the rest of the algorithm.
   Our second contribution is a cryptographic procedure for training a decision tree on data with
encrypted target variable values. After the synthetic data is generated, the encrypted model is
securely applied in a black-box manner to obtain encrypted target values. The synthetic data
and corresponding encrypted labels are then used to train a decision tree called the foil tree.
There are algorithms that can train decision trees on encrypted data [27, 28]. However, these
algorithms generally assume that all the data is encrypted, whereas in our situation only the target
variable is encrypted. Our scenario therefore lends itself to a more efficient solution.
   Our third contribution is an algorithm that can extract a contrastive explanation from the
encrypted foil tree trained before. As the input to the foil tree training algorithm was purely
synthetic data, we can provide data points from that set as part of the explanation without
revealing any information about the sensitive training data or the model.
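   For intuition, the sketch below shows a plaintext analogue of this pipeline: synthetic data is generated, labelled by the black-box model, used to train a surrogate decision tree, and a synthetic foil-class point is returned as part of the explanation. The actual contribution performs these steps under encryption with PETs, which is not reproduced here; the Gaussian synthetic-data generator and the tree parameters are deliberately simple stand-ins.

```python
# Plaintext sketch of the foil-tree explanation pipeline (the secure version keeps
# the training data and model encrypted throughout; that part is not shown here).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def synthetic_data(X_sensitive, n_samples, seed=0):
    """Generate synthetic samples from the mean/covariance of the sensitive data
    (deliberately simple; the real subroutine must leak negligible information)."""
    rng = np.random.default_rng(seed)
    mean = X_sensitive.mean(axis=0)
    cov = np.cov(X_sensitive, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)

def contrastive_explanation(black_box, X_sensitive, instance, foil_class):
    """Train a surrogate 'foil tree' on synthetic data labelled by the black box and
    return a synthetic point of the foil class closest to the instance."""
    X_syn = synthetic_data(X_sensitive, n_samples=2000)
    y_syn = black_box.predict(X_syn)            # secure version: encrypted labels
    foil_tree = DecisionTreeClassifier(max_depth=4).fit(X_syn, y_syn)

    # Synthetic points can be revealed without exposing sensitive training data.
    mask = foil_tree.predict(X_syn) == foil_class
    if not mask.any():
        return None
    candidates = X_syn[mask]
    return candidates[np.argmin(np.linalg.norm(candidates - instance, axis=1))]
```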
   Our experiments show that the algorithm can run with only seconds of delay, which is critical
to interactive personalized decision support. The solution works for numerical and categorical
data, but it is still an open problem to extend this to textual data for use cases such as the juridical
one. Additionally, improvements could be made to the synthetic data algorithm to make the
data more realistic using domain knowledge. This would ensure that the user experience is
affected minimally by adopting this algorithm.
   Our work only touched upon the surface of an intriguing pool of problems where personalized
AI and privacy-preservation coexist. We believe that this seemingly contradictory combination
has much potential and hopefully our work motivates others to investigate privacy-preserving
explainable AI.


5. Conclusion and Future Work
In this paper we have described the next iteration of the FATE system, which is being
developed as a generic decision support system, taking into account aspects of bias and fair AI,
Explainable AI, co-learning and secure learning. In particular, we described how the system
can aid different types of users in a juridical use case, in which the system is monitored for
certain aspects of fairness by making possible biases explicit, presenting the results in a user-
understandable manner (XAI) and exploring related case law by identifying relevant clusters
through interaction with a user. Additionally, we described how a privacy-by-design approach
(deemed a necessity in many cases due to restrictions on data use from multiple sources)
impacts the algorithms used, and how possible mitigation strategies, e.g. through the use of
synthetic data, might remedy potential violations of privacy.
   By taking up a new (juridical) use case, we explored per research area (bias, XAI,
co-learning and secure learning) whether modules previously developed for the Diabetes use
case would generalize to this new context. This required (amongst others) an adaptation of low-level
data and ML components, as in the Diabetes use case we dealt with (tabular) patient data
and in the juridical use case with textual data. Additionally, various improvements were made
within the different modules to be able to support the new user roles in their decisions, such as
the topic tree and clustering in XAI and co-learning.
   Future work will focus on refining the FATE system around the research areas
of XAI, co-learning, fairness and bias, and secure learning, through the adoption of the next
use case in yet another domain. By adopting a series of use cases from different domains, the
general applicability of the system is exemplified.


Acknowledgments
The FATE project is funded by the TNO Appl.AI program (internal AI program). We would like
to thank the AI4J project for their use case and data and the other members of the FATE project
for their valuable feedback, especially Klamer Schutte and Thijs Veugen.


References
 [1] J. de Greeff, M. H. de Boer, F. H. Hillerström, F. Bomhof, W. Jorritsma, M. A. Neerincx,
     The fate system: Fair, transparent and explainable decision making., in: AAAI Spring
     Symposium: Combining Machine Learning with Knowledge Engineering, 2021.
 [2] K. H. Tae, Y. Roh, Y. H. Oh, H. Kim, S. E. Whang, Data cleaning for accurate, fair, and
     robust models: A big data-ai integration approach, in: Proceedings of the 3rd International
     Workshop on Data Management for End-to-End Machine Learning, 2019, pp. 1–4.
 [3] G. Ras, M. van Gerven, P. Haselager, Explanation methods in deep learning: Users, values,
     concerns and challenges, in: Explainable and interpretable models in computer vision and
     machine learning, Springer, 2018, pp. 19–36.
 [4] S. Hepenstal, D. McNeish, Explainable artificial intelligence: What do you need to know?,
     in: International Conference on Human-Computer Interaction, Springer, 2020, pp. 266–275.
 [5] R. K. Mothilal, A. Sharma, C. Tan, Explaining machine learning classifiers through di-
     verse counterfactual explanations, in: Proceedings of the 2020 Conference on Fairness,
     Accountability, and Transparency, 2020, pp. 607–617.
 [6] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the
     black box: Automated decisions and the gdpr, Harv. JL & Tech. 31 (2017) 841.
 [7] People v. Satchell, https://law.justia.com/cases/california/supreme-court/3d/6/28.html,
     2021.
 [8] Y. Ramon, D. Martens, F. Provost, T. Evgeniou, A comparison of instance-level counter-
     factual explanation algorithms for behavioral and textual data: Sedc, lime-c and shap-c,
     Advances in Data Analysis and Classification 14 (2020) 801–819.
 [9] R. Poyiadzi, K. Sokol, R. Santos-Rodriguez, T. De Bie, P. Flach, Face: Feasible and actionable
     counterfactual explanations, in: Proceedings of the AAAI/ACM Conference on AI, Ethics,
     and Society, 2020, pp. 344–350.
[10] Y. Ramon, D. Martens, F. Provost, T. Evgeniou, Counterfactual explanation algorithms for
     behavioral and textual data, arXiv preprint arXiv:1912.01819 (2019).
[11] K. van den Bosch, T. Schoonderwoerd, R. Blankendaal, M. Neerincx, Six challenges for
     human-ai co-learning, in: International Conference on Human-Computer Interaction,
     Springer, 2019, pp. 572–589.
[12] E. M. Van Zoelen, K. Van Den Bosch, M. Neerincx, Becoming team members: Identifying
     interaction patterns of mutual adaptation for human-robot co-learning, Frontiers in
     Robotics and AI 8 (2021).
[13] A. Dubey, I. Bhattacharya, S. Godbole, A cluster-level semi-supervision model for inter-
     active clustering, in: Joint European Conference on Machine Learning and Knowledge
     Discovery in Databases, Springer, 2010, pp. 409–424.
[14] CBS, Jaarrapport integratie 2020 - criminaliteit, https://longreads.cbs.nl/integratie-2020/
     criminaliteit/, 2020.
[15] R. Sarre, et al., Men are more likely to commit violent crimes. Why is this so and how do
     we change it?, https://theconversation.com/men-are-more-likely-to-commit-violent-crimes-why-is-this-so-and-how-do-we-change-it-157331,
     2021.
[16] APA, Harmful masculinity and violence, https://www.apa.org/pi/about/newsletter/2018/
     09/harmful-masculinity, 2018.
[17] R. S. Centrum, Dutch case law search engine: "uitspraken, een deel van alle rechterlijke
     uitspraken wordt gepubliceerd op rechtspraak.nl. dit gebeurt geanonimiseerd.", 2020. URL:
     https://uitspraken.rechtspraak.nl/.
[18] T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, A. T. Kalai, Man is to computer
     programmer as woman is to homemaker? debiasing word embeddings, Advances in neural
     information processing systems 29 (2016) 4349–4357.
[19] M. Tonry, The social, psychological, and political causes of racial disparities in the american
     criminal justice system, Crime and justice 39 (2010) 273–312.
[20] A. S. Hartry, Gendering crimmigration: The intersection of gender, immigration, and the
     criminal justice system, Berkeley J. Gender L. & Just. 27 (2012) 1.
[21] H. Gonen, Y. Goldberg, Lipstick on a pig: Debiasing methods cover up systematic gender
     biases in word embeddings but do not remove them, arXiv preprint arXiv:1903.03862
     (2019).
[22] P. Mohassel, Y. Zhang, Secureml: A system for scalable privacy-preserving machine
     learning, in: 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 19–38. doi:10.
     1109/SP.2017.12.
[23] S. de Hoogh, B. Schoenmakers, P. Chen, H. op den Akker, Practical secure decision tree
     learning in a teletreatment application, in: Proceedings of the 18th International Confer-
     ence on Financial Cryptography, Lecture Notes in Computer Science, Springer, Netherlands,
     2014, pp. 179–194. URL: https://ifca.ai/fc14/. doi:10.1007/978-3-662-45472-5_12.
[24] R. Shokri, V. Shmatikov, Privacy-preserving deep learning, in: 2015 53rd Annual Allerton
     Conference on Communication, Control, and Computing (Allerton), 2015, pp. 909–910.
     doi:10.1109/ALLERTON.2015.7447103.
[25] J. van der Waa, M. Robeer, J. van Diggelen, M. Brinkhuis, M. A. Neerincx, Contrastive
     explanations with local foil trees, CoRR abs/1806.07470 (2018).
[26] Q. Wang, D. Kurz, Reconstructing training data from diverse ML models by en-
     semble inversion, CoRR abs/2111.03702 (2021). URL: https://arxiv.org/abs/2111.03702.
     arXiv:2111.03702.
[27] S. de Hoogh, B. Schoenmakers, P. Chen, H. op den Akker, Practical secure decision tree
     learning in a teletreatment application, in: N. Christin, R. Safavi-Naini (Eds.), Financial
     Cryptography and Data Security - 18th International Conference, FC 2014, Christ Church,
     Barbados, March 3-7, 2014, Revised Selected Papers, volume 8437 of Lecture Notes in Com-
     puter Science, Springer, 2014, pp. 179–194. URL: https://doi.org/10.1007/978-3-662-45472-5_
     12. doi:10.1007/978-3-662-45472-5\_12.
[28] M. Abspoel, D. Escudero, N. Volgushev, Secure training of decision trees with continuous
     attributes, Proc. Priv. Enhancing Technol. 2021 (2021) 167–187. URL: https://doi.org/10.
     2478/popets-2021-0010. doi:10.2478/popets-2021-0010.