                                        Workshop "From Objects to Agents" (WOA 2019)


            Interpretable Narrative Explanation for ML
            Predictors with LP: A Case Study for XAI
                    Roberta Calegari            Giovanni Ciatto           Jason Dellaluce          Andrea Omicini
                                 Dipartimento di Informatica – Scienza e Ingegneria (DISI)
                                 Alma Mater Studiorum – Università di Bologna, Italy
     Email: roberta.calegari@unibo.it, giovanni.ciatto@unibo.it, jason.dellaluce@studio.unibo.it, andrea.omicini@unibo.it


   Abstract—In the era of digital revolution, individual lives are going to cross and interconnect ubiquitous online domains and offline reality based on smart technologies—discovering, storing, processing, learning, analysing, and predicting from huge amounts of environment-collected data. Sub-symbolic techniques, such as deep learning, play a key role there, yet they are often built as black boxes, which are not inspectable, interpretable, explainable. New research efforts towards explainable artificial intelligence (XAI) are trying to address those issues, with the final purpose of building understandable, accountable, and trustable AI systems—still, seemingly with a long way to go.
   Generally speaking, while we fully understand and appreciate the power of sub-symbolic approaches, we believe that symbolic approaches to machine intelligence, once properly combined with sub-symbolic ones, have a critical role to play in order to achieve key properties of XAI such as observability, interpretability, explainability, accountability, and trustability. In this paper we describe an example of integration of symbolic and sub-symbolic techniques. First, we sketch a general framework where symbolic and sub-symbolic approaches could fruitfully combine to produce intelligent behaviour in AI applications. Then, we focus in particular on the goal of building a narrative explanation for ML predictors: to this end, we exploit the logical knowledge obtained by translating decision tree predictors into logic programs.
   Index Terms—XAI, logic programming, machine learning, symbolic vs. sub-symbolic

                       I. INTRODUCTION

   Artificial intelligence (AI), machine learning (ML), and deep learning (DL) are nowadays intertwined with a growing number of aspects of people's everyday life [1], [2]. In fact, more and more decisions are delegated by humans to software agents whose intelligent behaviour is not the result of some skilled developer endowing them with some clever code, but rather the consequence of the agents' capability of learning, planning, or inferring what to do from data—or, roughly speaking, of their artificial intelligence.

   For instance, banks and insurance companies have adopted ML and statistical methods for decades, in order to decide whether or not to grant a loan to a given customer, or to estimate the most profitable insurance plan for her. Similarly, ML has been employed in order to help doctors with their diagnoses, provided that a set of symptoms has been properly identified for a given patient; whereas statistical and probabilistic inference have been employed to test drugs, in order to prove them effective or safe. Furthermore, virtually any person, as a consumer of services and goods, lets a number of ML-trained agents decide or suggest what to buy, like, or read—as any consumer is likely to be profiled by most of the companies and organisations he/she has interacted with.

   In spite of the large adoption, intelligent agents whose behaviour is the result of automatic synthesis / learning procedures are difficult to trust for most people—in particular when people are not expert in the fields of computer or data sciences, AI, or statistics. This is especially true for agents leveraging on machine or deep learning based techniques, often producing models whose internal behaviour is opaque and hard to explain for their developers, too.

   There, agents often tend to accumulate their knowledge into black-box predictive models which are trained through ML or DL. Broadly speaking, the "black-box" expression is used to refer to models where knowledge is not explicitly represented – such as in neural networks, support vector machines, or hidden Markov chains –, and it is therefore difficult, for humans, to understand what a black-box actually knows, or what leads to a particular decision.

   Such difficulty in understanding black-boxes' content and functioning is what prevents people from fully trusting – and thus accepting – them. In several contexts, such as the medical or financial ones, it is not sufficient for intelligent agents to output bare decisions, since, for instance, ethical and legal issues may arise. An explanation for each decision is therefore often desirable, preferable, or even required. For instance, applications dealing with personal data need to face the challenges of achieving valid consent for data use and protecting confidentiality, and of addressing threats to privacy, data protection, and copyright. Those issues are particularly challenging in critical application scenarios such as healthcare, often involving the use of image (i.e., identifiable) data from children. While issues of data ownership, data security, and data access are important, other ethical issues may arise: since the diagnostic accuracy and value of the result is determined by the amount and quality of data used in model training, the first potential concern is to avoid algorithmic bias, which may lead to social discrimination and result in inequitable access to healthcare, merely related to the provenance of the collected data [1], [3].

   Furthermore, it may happen that black-boxes silently learn something wrong (e.g., Google image recognition software that classified black people as gorillas [4], [5]), or something right, but in a biased way (like the "background bias" problem, causing for instance husky images to be recognised only
because of their snowy background [6]). In such situations, explanations are expected to provide useful insights for black-box developers.

   To tackle such trust issues, the eXplainable Artificial Intelligence (XAI) research field has recently emerged, and a comprehensive research road map has been proposed by DARPA [7], targeting the themes of explainability and interpretability in AI – and in particular ML – as a challenge of paramount importance in a world where AI is becoming more and more pervasively adopted. There, DARPA reviews the main approaches to make AI either more interpretable or a posteriori explainable, categorises the many currently available techniques aimed at building meaningful interpretations or explanations for black-box models, summarises the open problems and challenges, and provides a successful reference framework for the researchers interested in the field.

   The main idea behind XAI is to employ explanators [8] to provide easy-to-understand insights for a given black-box and its particular decisions. An explanator is any procedure producing a meaningful explanation for some human observer, by leveraging on any combination of (i) the black-box, (ii) its input data, or (iii) its decisions or predictions. To this end, we believe that symbolic approaches to machine intelligence – properly integrated with sub-symbolic approaches – may have a role to play in order to achieve key properties such as interpretability, observability, explainability, accountability, and trustability.

   In this paper we focus on the specific problem of building a narrative explanation of ML techniques—thus positioning our contribution into the specific Narrative Generation DARPA category [7]. In particular, we first show a general framework where symbolic and sub-symbolic techniques are fruitfully combined to produce intelligent behaviour in AI applications. Then, we focus on the translation of ML predictors into logical knowledge with the aim to (i) infer new knowledge, (ii) reason and act accordingly, and (iii) build the narrative explanation of a decision output (or prediction).

   To this end, we propose an automatic procedure aimed at translating a ML predictor – here, in particular, we consider the case of decision trees (DT) – into logical knowledge. We argue that, when the source DT has been trained over a set of real data in order to produce a predictor, the corresponding logic program may be employed to produce a narrative explanation for any given prediction.

   Despite being mostly focused on DT, our proposal represents a first step towards a more general approach. In fact, DT have been proposed as a general means for explaining the behaviour of virtually any black-box model [9], [10].

   Accordingly, the remainder of this paper is organised as follows. Section II briefly recalls the ML concepts and terminology used in the paper as well as the main research efforts in the field. Then Section III introduces our vision of a framework for the integration of symbolic and sub-symbolic techniques. Finally, Section IV discusses early experiments alongside the prototype implementation.

                       II. CONTEXT

   Machine learning often produces black-box predictors based on opaque models, thus hiding their internal logic from the user. This hinders explainability, and represents both a practical and an ethical issue for ML. As a result, many research approaches in the XAI field aim at overcoming that crucial weakness, sometimes at the cost of trading off accuracy against interpretability. So, we first (Subsection II-A) introduce some background notions to define the terminology adopted, and then (Subsection II-B) summarise the state of the art as well as the goal of XAI.

A. Background

   Since several practical AI problems – such as image recognition, or financial and medical decision support systems – can be reduced to supervised ML – which can be further grouped in terms of either classification or regression problems [11], [12] –, in the remainder of this paper we focus on this set of ML problems.

   In those cases, a learning algorithm is commonly exploited to estimate the specific nature and shape of an unknown prediction function (or predictor) p* : 𝒳 → 𝒴, mapping each input vector x from a given input space 𝒳 into a prediction from a given output space 𝒴. To do so, the learning algorithm takes into account a number N of examples in the form (xᵢ, yᵢ) such that xᵢ ∈ X ⊂ 𝒳, yᵢ ∈ Y ⊂ 𝒴, and |X| ≡ |Y| ≡ N. There, each xᵢ represents an instance of the input data for which the expected output value yᵢ is known or has already been estimated. Such sorts of ML problems are said to be "supervised" because the expected targets Y are available, whereas they are said to be "regression" problems if 𝒴 consists of continuous or numerable values, or "classification" problems if 𝒴 consists of categorical values.

   The learning algorithm usually assumes p* ∈ P, for a given family P of predictors—meaning that the unknown prediction function exists, and it is from P. The algorithm then trains a predictor p̂ ∈ P such that the value of a given loss function λ : 𝒴 × 𝒴 → ℝ – computing the discrepancy among predicted and expected outputs – is minimal or reasonably low—i.e.:

      p̂ = argmin_{p ∈ P} Σᵢ₌₁..ᴺ λ(yᵢ, p(xᵢ)).

   Depending on the predictor family P of choice, the nature of the learning algorithm and the admissible shapes of p̂ may vary dramatically, as well as their interpretability. Even if the interpretability of predictor families is not a well-defined feature, most authors agree on the fact that some predictor families are more interpretable than others [13]—in the sense that it is easier for humans to understand the functioning and the predictions of the former ones. For instance, it is widely acknowledged that generalized linear models (GLM) are more interpretable than neural networks (NN), whereas decision trees (DT) [14] are among the most interpretable families [8]. DT can be considered more interpretable due to their construction: that is, recursively partitioning the input space 𝒳 through a number of splits or decisions based on the input data X, in such a way that the prediction in each partition




is constant, and the loss w.r.t. Y is low, while keeping the amount of partitions low as well. Without affecting generality, we focus on the case of mono-dimensional classification – thus writing a scalar y instead of a vector y –, since other cases can be easily reduced to this one. We further assume the input space 𝒳 is N-dimensional, and let nⱼ be the meta-variable representing the name of the j-th dimension of 𝒳.

   Under such hypotheses, a DT predictor p_T ∈ P_DT assumes a binary tree T exists such that each node is either
   • a leaf, carrying and representing a prediction, i.e., an assignment for y,
   • an internal node, carrying and representing a decision, i.e., a formula in the form (nⱼ ≤ c)—where c is a constant threshold chosen by the learning algorithm.

Each node ν inherits a partition Xν ⊆ X of the original input data from its parent. Since the root node ν₀ has no parent, it is assigned the whole set of input data—i.e., Xν₀ ≡ X. The decision carried by each internal node splits its Xν into two disjoint parts – XνL and XνR – along the j-th dimension of 𝒳. In particular, XνL contains all the residual xᵢ ∈ Xν whose j-th component does not exceed cν – which are inherited by ν's left child –, whereas XνR contains all the residual xᵢ ∈ Xν whose j-th component exceeds cν—which are inherited by ν's right child. A leaf node l is created whenever a sequence of splits (i.e., a path from the tree root to the leaf's parent) leads to a partition Xl which is (almost) pure—roughly, meaning that Xl (mostly) contains input data xᵢ for which the expected output is the same yl. In this case, we say that the prediction carried by l is yl. Assuming such a tree T exists, in order to classify some input data x ∈ 𝒳, the predictor p_T simply navigates the path P = (ν₀, ν₁, ν₂, ..., l) of T such that all decisions νₖ are matched by x, then it outputs yl.
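   To make the above structure concrete, the following minimal Prolog sketch – ours, for illustration only, with hypothetical node and attribute names – represents such a binary tree as a term whose internal nodes carry a dimension name and a threshold, and whose leaves carry a prediction; classification then simply follows the decisions matched by the input from the root down to a leaf.

   % A binary decision tree as a Prolog term: internal nodes test one
   % input dimension against a constant threshold, leaves carry the
   % predicted value. Thresholds and attributes below are hypothetical,
   % in the spirit of the case study of Section IV (yes/no attributes
   % are encoded as 1/0 for simplicity).
   example_tree(node(temp, 37.95,
                     leaf(no),              % temp =< 37.95
                     node(lumbar, 0.5,
                          leaf(no),         % no lumbar pain
                          leaf(yes)))).     % lumbar pain -> nephritis

   % classify(+Tree, +Input, -Prediction): navigate the (unique) path of
   % the tree whose decisions are matched by the input, then output the
   % prediction carried by the reached leaf.
   classify(leaf(Y), _, Y).
   classify(node(Attr, Threshold, Left, _), Input, Y) :-
       member(Attr = V, Input),
       V =< Threshold, !,
       classify(Left, Input, Y).
   classify(node(_, _, _, Right), Input, Y) :-
       classify(Right, Input, Y).

   % ?- example_tree(T), classify(T, [temp = 38.4, lumbar = 1], Y).
   % Y = yes.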
B. XAI: The need for explanation and interpretable models

   Since the adoption of interpretable predictors usually comes at the cost of a lower potential in terms of predictive performance, explanations are the newly preferred way for providing understandable predictions without necessarily sacrificing accuracy. The idea, and the main goal of XAI, is to create intelligible and understandable explanations for uninterpretable predictors without replacing or modifying them. Thus explanations are built through a number of heterogeneous techniques, broadly referred to as explanators [8]—just to cite some, decision rules [15], feature importance [16], saliency masks [17], sensitivity analysis [18], etc.

   The state of the art for explainability currently recognises two main sorts of explanators, namely, either local or global. While local explanators attempt to provide an explanation for each particular prediction of a given predictor p, the global ones attempt to provide an explanation for the predictor p as a whole. In other words, local explanators provide an answer to the question "why does p predict y for the input x?" – such as the LIME technique presented in [6] –, whereas global explanators provide an answer to the question "how does p build its predictions?"—such as decision rules.

   In spite of the many approaches proposed to explain black boxes, some important scientific questions still remain unanswered. One of the most important open problems is that, until now, there is no agreement on what an explanation is. Indeed, some approaches adopt as explanation a set of rules, others a decision tree, others rely on visualisation techniques [8]. Moreover, recent works highlight the importance for an explanation to guarantee some properties, e.g., soundness, completeness, and compactness [8].

   This is why our proposal aims at integrating sub-symbolic approaches with symbolic ones. To this end, DT can be exploited as an effective bridge between the symbolic and sub-symbolic realms. In fact, DT can be easily (i) built from an existing sub-symbolic predictor, and (ii) translated into symbolic knowledge – as shown in the remainder of this paper – thanks to their rule-based nature.

   Decision trees are an interpretable family of predictors that have been proposed as a global means for explaining other, less interpretable, sorts of black-box predictors [9], [10]—such as neural networks [19]. The main idea behind such an approach is to build a DT approximating the behaviour of a given predictor, possibly by only considering its inputs and its outputs. Such an approximation essentially trades off predictive performance with interpretability. In fact, the structure of such a DT would then be used to provide useful insights concerning the original predictor's inner functioning.

   Describing the particular means for extracting DT from black-boxes is outside the scope of this paper. Given the vast literature on the topic – e.g., consider reading [8], [20] for an overview or [19], [21], [22] for practical examples – we simply assume an extracted DT is available and has a high fidelity—meaning that the loss in terms of predictive performance is low w.r.t. the original black-box. In fact, whereas there exist several works focussing on how to synthesise DT out of black-box predictors, no attention is paid to merging them with symbolic approaches, which can play a key role in enhancing the interpretability and explainability of the system. In this paper we focus on such a matter.

   We believe that a logical representation of DT may be interesting and enabling for further research directions. For instance, as far as explainability is concerned, we show how logic-translated DT can be used both to navigate the knowledge stored within the corresponding predictors – thus acting as global explanators – and to produce narrative explanations for their predictions—thus acting as local explanators. Note that the restriction to the DT representation makes it easy to map DT onto logical clauses, since DT are finite and with a limited expressivity (if / else conditions).

                       III. VISION

   Many approaches to ML nowadays are increasingly focussing on sub-symbolic approaches – such as deep learning with neural networks [23] – and on how to make them work on the large scale. As promising as this may look – with the promise of potentially minimising the engineering efforts needed – it is increasingly acknowledged that those








                                              Fig. 1. ML to LP and back: framework architecture.



approaches do not cope well with the socio-technical nature of the systems they are exploited in, which often demand a degree of interpretability, observability, explainability, accountability, and trustability they just cannot deliver.

   To this end, since logic-based approaches already have a well-understood role in building intelligent (multi-agent) systems [24], declarative, logic-based approaches have the potential to represent an alternative way of delivering symbolic intelligence, complementary to the one pursued by sub-symbolic approaches. In fact, declarative and logic-based technologies address the aforementioned socio-technical issues much better, in particular when exploiting their inferential capabilities—e.g., [25].

   The potential of logic-based models and their extensions is first of all related to their declarativeness as well as to explicit knowledge representation, enabling knowledge sharing at the most adequate level of abstraction, while supporting modularity and separation of concerns [26]—which are especially valuable in open and dynamic distributed systems. As a further element, the sound and complete semantics of LP straightforwardly enables intelligent agents to reason and infer new information in a sound and complete way.

   Another relevant point is that LP has already been proven to work well both as a knowledge representation language and as an inference platform for rational agents [27], [28]. The latter usually may interact with an external environment by means of a suitably defined observe–think–act cycle.

   According to this vision, here we propose an integrated framework of hybrid reasoning – where symbolic and sub-symbolic techniques fruitfully combine to produce intelligent behaviour.

   Indeed, looking in depth at pervasive socio-technical systems, it turns out that agents (either human or software) effortlessly undertake a complex decision-making process in almost all situations, which seamlessly integrates perceptions (and actions) at two different scales—the macro and the micro:
   • at the macro scale, we decide by considering the knowledge of the global system, rules of general validity and concerning the most likely situation;
   • at the micro scale, we modulate such a decision by considering all the contingencies arising during the precise situation – such as, for instance, a last-minute inconvenience, etc. As a consequence, we adapt the original plan to the local perceptions we gather while enacting it.

   In order to better illustrate the above remarks, one may consider as a concrete example the case of a disease diagnosis in a hospital, where the notions of micro and macro scale w.r.t. the nature of algorithms and techniques can be declined as follows:
   • at the macro level, the main concerns regard a mid/long term horizon and focus on the issue of analysing high-dimensional and multimodal biomedical data to train algorithms to recognise cancerous tissue at a level comparable to trained physicians—there including, for instance, representation and recognition of patterns and sequences in the input data. With such a sort of goals to pursue, it is not surprising that most IT tools supporting decision making are based on sub-symbolic approaches such as deep learning, Bayesian networks, machine vision, latent Dirichlet analysis, and in general any kind of statistical approach to ML [29], [30], [31];
   • at the micro level, the main concerns regard instead the short term horizon, and mostly focus on the specific problem of the patient, there including a few highly-intertwined sub-problems—e.g., a specific symptom or situation, or an ongoing epidemic in that hospital or place that carries the same symptoms. Although sub-symbolic approaches can still be used, symbolic ones such as fuzzy logic, specialised (white-box) learning instead of higher-level learning, and symbolic time series are most common [29], [32], [33].

Generally speaking, we believe that computational intelligence accounts for these two kinds of rules: general rules whose validity is essentially unconstrained (speed limits, right of way,




etc.), which represent the commonsense knowledge necessary to inhabit the environment, and specific rules, with a validity bound in space and time (school hours and days, open-air market hours and days, unpredictable events such as an incoming emergency vehicle or the need to gather at an evacuation assembly point), which represent the contextual or expert knowledge necessary to deal with transient, unforeseen, and unpredictable situations.

   That is why in the framework envisioned here we plan to combine sub-symbolic techniques with symbolic ones (LP in particular): sub-symbolic techniques are exploited to train the system and learn new rules (commonsense knowledge), rules are translated into logical knowledge (contextual / expert knowledge), and the two approaches interact and interleave to share knowledge and learn from each other in a coherent framework.

   The framework architecture, depicted in Fig. 1, shows the embodiment of the vision discussed above: sensor data and datasets are translated into the logic knowledge base. In particular, the Machine Learning Interface allows for the interaction of different kinds of ML algorithms with the framework: a standard interface is proposed in order to combine the specific features of each algorithm in a coherent manner. ML to Prolog is the core of the translation into logical knowledge, while Prolog to ML returns insights of the logical KB to the ML predictor—for instance, new inferred rules, or rules learned in a specific situation. The blocks on the left (Knowledge Base, Demonstration) reflect the standard architecture of a Prolog engine. Overall, the framework looks general enough to account for the variety of ML techniques and algorithms, and also to ensure the consistency between symbolic and sub-symbolic approaches. Finally, the block Prolog to ML currently expresses our vision, and is obviously subject of future research.

                       IV. EARLY EXPERIMENTS

   The first prototype we design and implement enables the construction of a narrative explanation of the prediction generated by exploiting the ML technique, thus achieving interpretability and making a step towards explainability.

   With respect to Fig. 1, we experiment with the predictor translation into logical rules, provided by the ML to Prolog block. The experimental results refer to the case in which the predictor corresponds to a decision tree or to the corresponding crisp rules [34]. The conversion generates a Prolog predicate for each decision taken by the predictor: inside the predicate, a term for each input/output attribute is instantiated with the values of the leaf of the decision tree. A rule is generated for each leaf in the tree: among the other advantages, this allows for a very compact representation, easy to handle and interoperate with.

   For a concrete example, let us consider the "Acute inflammations data set"¹ [35], supplying data to perform the presumptive diagnosis of two diseases of the urinary system: acute inflammation of the urinary bladder and acute nephritis of renal pelvis origin. The input parameters collect all the patient symptoms; each instance represents a potential patient. The data was created by a medical expert as a data set to test an expert system performing the presumptive diagnosis of the two diseases. The dataset considered is summarised in TABLE I and TABLE II.

   ¹ http://archive.ics.uci.edu/ml/datasets/acute+inflammations

                       TABLE I
          ACUTE INFLAMMATIONS DATA SET ATTRIBUTES

   Attribute                          Short name      Values
   Temperature of patient             temp            35°C – 42°C
   Occurrence of nausea               nausea          {yes, no}
   Lumbar pain                        lumbar          {yes, no}
   Urine pushing                      urine           {yes, no}
   Micturition pains                  micturition     {yes, no}
   Burning of urethra                 urethra         {yes, no}

   Output attributes
   Inflammation of urinary bladder    inflammation    {yes, no}
   Nephritis of renal pelvis origin   nephritis       {yes, no}

   Alternative output
   Diagnosis                          diagnosis       {healthy, inflammation, nephritis, both}

                       TABLE II
          ACUTE INFLAMMATIONS DATA SET DESCRIPTION

   Dataset size                                              120
   Num. of input attributes                                    6
   Num. of output attributes                                   2
   Num. of output classes                                      4
   Num. of healthy patients                                   30 (25%)
   Num. of patients with inflammation of urinary bladder      59 (49.17%)
   Num. of patients with nephritis of renal pelvis origin     50 (41.67%)
   Num. of patients with both diseases                        19 (15.83%)

   Starting from the general form Head ← Body for a logical clause, a predicate in the Head is generated for the decision of the predictor—in the example, the diagnosis predicate. Inside the predicate, a term for each input/output attribute is instantiated with the value of the decision tree (leaf). In our example, the following predicate is generated:

   diagnosis(temperatureOfPatient(T), occurrenceOfNausea(N),
       lumbarPain(L), urinePushing(U), micturitionPains(M),
       burningOfUrethra(BU), nephritisOfRenalPelvisOrigin(Decision),
       confidence(C)) :- Body.

where the Body consists of checks and computations on the variables of the Head terms. For instance, considering the tree of Fig. 2, the first generated rule is

   diagnosis(temperatureOfPatient(T), occurrenceOfNausea(N),
       lumbarPain(L), urinePushing(U), micturitionPains(M),
       burningOfUrethra(BU), nephritis(no), confidence(1.00)) :-
       T =< 37.95.




representing the fact that if the temperature of the patient is less than or equal to 37.95, the patient is unlikely to present nephritis of renal pelvis origin; the answer contains a degree of confidence based on the cases of the dataset that confirm the rule—here 1.00 means that all the patients in the dataset whose temperature does not exceed 37.95 do not present the disease.

   To improve readability, the rule above could be written as

   diagnosis(temperatureOfPatient(T), _, _, _, _, _,
       nephritis(no), confidence(1.00)) :- T =< 37.95.

by replacing the irrelevant variables with anonymous ones, i.e., highlighting only the input attributes that effectively act as influencers.
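   As a hint of how the ML to Prolog step can be realised, the following sketch is our own illustration, not the actual prototype: the tree representation, the attribute/2 lookup predicate, and the simplified two-argument head are all assumptions. It generates one clause per leaf, whose body is the conjunction of the threshold checks collected along the root-to-leaf path; leaves are assumed to carry their class together with a precomputed confidence.

   % tree_to_clauses(+Tree, -Clauses): one clause per leaf of the tree;
   % the body conjoins the decisions met along the root-to-leaf path.
   tree_to_clauses(Tree, Clauses) :-
       findall(Clause, path_clause(Tree, true, Clause), Clauses).

   % path_clause(+Node, +BodySoFar, -Clause)
   path_clause(leaf(Class, Conf), Body,
               (diagnosis(result(Class), confidence(Conf)) :- Body)).
   path_clause(node(Attr, Thr, Left, Right), Body, Clause) :-
       (   path_clause(Left,  (Body, attribute(Attr, V), V =< Thr), Clause)
       ;   path_clause(Right, (Body, attribute(Attr, V), V >  Thr), Clause)
       ).

   % ?- tree_to_clauses(node(temp, 37.95,
   %                         leaf(no, 1.00),
   %                         node(lumbar, 0.5, leaf(no, 1.00), leaf(yes, 1.00))),
   %                    Cs),
   %    maplist(portray_clause, Cs).

   The generated clauses can then be asserted into the knowledge base (e.g., via assertz/1), provided a hypothetical attribute/2 predicate describes the patient at hand; producing the full diagnosis/8 heads shown above is then only a matter of enumerating the input attributes in the head instead of looking them up in the body.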
   Fig. 2 (left) depicts the whole picture: the decision trees generated as output from the example dataset when we run the basic classification tree algorithm², and the corresponding translation into LP rules. With respect to Fig. 1, the decision trees are the output of the Machine Learning Interface block and become the input for the ML to Prolog block.

   ² We exploit two different implementations: C4.5 [36] (Weka J48) for the Java translator and SciKit-Learn CART [14] for the Python one.

   Fig. 2 represents experiments of running the ML algorithm with no manipulation of the dataset: so, since the ML algorithm allows only one decision output to be considered for producing the corresponding decision tree, the information and the related knowledge are fragmented into two different trees – the first obtained by running the algorithm with decision output nephritis and the second with decision output inflammation of urinary bladder. By running the ML to Prolog block of Fig. 1 we translate the two DT into LP rules as depicted in Fig. 2 (right).

      a) Interpretability: The LP program provides an interpretable explanation of virtually any predictor. At a glance, the user can identify which attributes are meaningful and considered for the response and which are not. In the case of nephritis, the only significant input attributes are the temperature of the patient and the presence or absence of lumbar pain. The same holds for inflammation of the urinary bladder, where the only discriminative attributes are the presence of urine pushing, micturition pains and lumbar pain.

      b) Interoperability: The adoption of a standard AI language (LP), in spite of the plethora of different specific ML toolkits, paves the way towards an interoperable explanation where LP is exploited as a sort of lingua franca that goes beyond the technical implementation of each ML framework.

      c) Relations between outputs: As emphasised by Fig. 2, relations between outputs are lost, and possible links between the diseases are not clearly highlighted, having two different decision trees. Instead, once a LP representation is obtained, it is easy to run simple queries on it in order to get much more information with respect to the two separate decision trees. For instance, we can learn that in case of fever (temperature of the patient > 37.95) without nephritis (i.e., no lumbar pain detected), the only case in which inflammation of the urinary bladder is present is when urine pushing is detected in absence of symptoms of micturition pains. With the logical representation, relations between outputs can be recovered by inferring hidden knowledge in the rules. It is worth noticing that similar results (emphasising the relations between decision outputs) can be obtained by manipulating the dataset a priori—i.e., before the ML algorithm training (a common operation, but not always applicable). The manipulation of the above dataset, for instance, can build a unique decision output Result that combines the two different diseases and their symptoms. In such a case the dataset is enriched with the Result attribute containing the complete diagnosis, i.e., it can assume the values Healthy, Inflammation, Nephritis, Both. The corresponding decision tree and LP knowledge are depicted in Fig. 3.
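   To give a flavour of such queries, the following sketch is ours and assumes the two clause sets of Fig. 2 have been loaded into the same Prolog knowledge base (e.g., via assertz/1). It asks under which symptoms a feverish patient without lumbar pain is predicted to present an inflammation of the urinary bladder while not presenting nephritis:

   % Which symptom combinations yield inflammation(yes) but nephritis(no)
   % for a feverish patient without lumbar pain? The temperature must be
   % bound, since the generated rule bodies perform arithmetic comparisons.
   ?- Temp = 39.0,
      diagnosis(temperatureOfPatient(Temp), _, lumbarPain(no), _, _, _,
                nephritis(no), _),
      diagnosis(_, _, lumbarPain(no), urinePushing(U), micturitionPains(M), _,
                inflammation(yes), _).
   % e.g. U = yes, M = no.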
      d) Interpretable narrative explanation: LP makes it possible to generate a narration for each answer of the predictor. The Prolog inference tree becomes inspectable, tracking the path followed to obtain the answer. For instance, w.r.t. the KB of Fig. 3 – including all diseases –, the diagnosis in the case of the following symptoms:

   diagnosis(
       temperatureOfPatient(36.5), occurrenceOfNausea(yes),
       lumbarPain(yes), urinePushing(no),
       micturitionPains(yes), burningOfUrethra(yes), _, _).

would produce the corresponding narration:

   The diagnosis is healthy, with full confidence, because
   the patient has no fever.

   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   In particular the solution has been built across the
   following path:
   Solution: result(healthy) with confidence(1.00).
   For the proof, the following clauses are considered:
   [1] diagnosis(temperatureOfPatient(T), _, _, urinePushing(no),
       _, _, result(healthy), confidence(1.00)) :- T =< 37.95.
   [2] X =< Y that is verified if
       'expression_less_or_equal_than'(X, Y)

   In the query the temperature T is 36.5,
     because of rule [1] 36.5 =< 37.95 has to be verified,
     and because of [2] 'expression_less_or_equal_than'(36.5, 37.95)
     has to be verified,
   so rules [1] and [2] are verified.
   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

   Despite its simplicity, the narration allows for a reconstruction of the decision track, showing the path to the decision. With a large amount of nested rules this could prove very effective.
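   One simple way to obtain such a decision track – a sketch of ours, not the prototype's actual implementation, assuming the translated clauses are loaded as a plain (dynamic) Prolog program – is a vanilla meta-interpreter that solves the diagnosis goal while recording every knowledge-base clause and built-in check it uses; the resulting list is the raw material from which a narration like the one above can be phrased:

   % prove(+Goal, -Trace): solve Goal against the translated KB, collecting
   % the clauses and built-in checks used, so the proof path can be narrated.
   prove(true, []) :- !.
   prove((A, B), Trace) :- !,
       prove(A, TraceA),
       prove(B, TraceB),
       append(TraceA, TraceB, Trace).
   prove(Goal, [builtin(Goal)]) :-        % e.g. arithmetic checks such as =< / >
       predicate_property(Goal, built_in), !,
       call(Goal).
   prove(Goal, [rule(Goal, Body) | Trace]) :-
       clause(Goal, Body),
       prove(Body, Trace).

   % ?- prove(diagnosis(temperatureOfPatient(36.5), occurrenceOfNausea(yes),
   %                    lumbarPain(yes), urinePushing(no),
   %                    micturitionPains(yes), burningOfUrethra(yes), R, C),
   %          Trace).
   % R = result(healthy), C = confidence(1.00),
   % Trace = [rule(diagnosis(...), 36.5 =< 37.95), builtin(36.5 =< 37.95)].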
      e) Exploitation of LP extensions / abduction on the KB: Moreover, we believe that by exploiting abduction techniques we could pave the way to hypothetical reasoning with incomplete knowledge, i.e., learning new possible hypotheses that can be assumed to hold, provided that they are consistent with the given knowledge base. The idea, to be explored in future research, is to provide the most likely solution given a set of evidence. The conclusion would leave a degree of uncertainty while highlighting a plausible answer based on the collected information. In the healthcare field, for instance, this could amount to having a (possibly incomplete) collection of symptoms and finding the most likely disease for them.




   Output decision: Nephritis of renal pelvis origin {yes, no}

   diagnosis(temperatureOfPatient(T), _, _, _, _, _,
       nephritis(no), confidence(1.00)) :- T =< 37.95.

   diagnosis(temperatureOfPatient(T), _, lumbarPain(yes), _, _, _,
       nephritis(yes), confidence(1.00)) :- T > 37.95.

   diagnosis(temperatureOfPatient(T), _, lumbarPain(no), _, _, _,
       nephritis(no), confidence(1.00)) :- T > 37.95.

   Output decision: Inflammation of urinary bladder {yes, no}

   diagnosis(_, _, _, urinePushing(no), _, _, inflammation(no),
       confidence(1.00)).

   diagnosis(_, _, lumbarPain(yes), urinePushing(yes),
       micturitionPains(no), _, inflammation(no), confidence(1.00)).

   diagnosis(_, _, lumbarPain(no), urinePushing(yes),
       micturitionPains(no), _, inflammation(yes), confidence(1.00)).

   diagnosis(_, _, _, urinePushing(yes), micturitionPains(yes), _,
       inflammation(yes), confidence(1.00)).

Fig. 2. Experimental results obtained by running the framework on the Acute Inflammations dataset [35]: the left side shows the decision trees generated by the supervised ML algorithm (Weka J48 – SciKit-Learn CART), while the right side shows the corresponding LP rules output by the ML to Prolog block. Since two DT are generated in order to deal with the two different, overlapping outputs, the resulting pieces of knowledge are not connected.

   Output decision: Result {Healthy, Inflammation, Nephritis, Both}

   diagnosis(temperatureOfPatient(T), _, _, urinePushing(no), _, _,
       result(healthy), confidence(1.00)) :- T =< 37.95.

   diagnosis(temperatureOfPatient(T), _, _, urinePushing(yes), _, _,
       result(inflammation), confidence(1.00)) :- T =< 37.95.

   diagnosis(temperatureOfPatient(T), _, lumbarPain(no), _, _, _,
       result(healthy), confidence(1.00)) :- T > 37.95.

   diagnosis(temperatureOfPatient(T), _, lumbarPain(yes), _,
       micturitionPains(no), _, result(nephritis), confidence(1.00)) :-
       T > 37.95.

   diagnosis(temperatureOfPatient(T), _, lumbarPain(yes), _,
       micturitionPains(yes), _, result(both), confidence(0.66)) :-
       T > 37.95.

Fig. 3. Decision tree (left) and corresponding "ML to Prolog core" output (right) after the aforementioned manipulation of the dataset. In particular, the two different output decisions (nephritis and inflammation of urinary bladder) have been combined in order to generate a comprehensive output decision: the new diagnosis considers the case of a healthy patient (none of the previous diseases), the cases in which only one of the two diseases is present (inflammation or nephritis), and finally the case in which both are present.



                       V. CONCLUSION

   AI systems nowadays synthesise large amounts of data, learning from experience and making predictions with the goal of taking autonomous decisions—applications range from clinical decision support to autonomous driving and predictive policing. Nevertheless, concerns about the intentional and unintentional negative consequences of AI systems are legitimate, as are ethical and legal concerns, mostly related to the darkness and opaqueness of AI decision algorithms. For that reason, recent work on interpretability in machine learning and




AI has focused on simplified models that approximate the true criteria used to make decisions.

   In this paper we focus on building a narrative explanation of machine learning techniques: we first translate a ML predictor into logical knowledge, then inspect the proof tree leading to a solution. The narration is built by tracking the path (i.e., the rules) that leads from the query to the answer.

   Along this line, we foresee a broader vision that involves the design of a consistent framework where symbolic and sub-symbolic techniques are fruitfully combined to produce intelligent behaviour in AI applications while exploiting the benefits of each approach—like, in the case of symbolic ones, interpretability, observability, explainability, and accountability.

   The results presented here represent just a preliminary exploration of the potential benefits of merging symbolic and sub-symbolic approaches—where, of course, many critical issues are still unexplored and will be the subject of future work. However, despite its simplicity, the case study already allows us to point out the feasibility and the potential benefits of the exploitation of symbolic techniques towards XAI.

                       REFERENCES

 [1] D. Helbing, "Societal, economic, ethical and legal challenges of the digital revolution: From big data to deep learning, artificial intelligence, and manipulative technologies," in Towards Digital Enlightenment. Springer, 2019, pp. 47–72.
 [2] A. Elliott, The Culture of AI: Everyday Life and the Digital Revolution. Routledge, 2019.
 [3] S. Bird, K. Kenthapadi, E. Kiciman, and M. Mitchell, "Fairness-aware machine learning: Practical challenges and lessons learned," in 12th ACM International Conference on Web Search and Data Mining (WSDM'19). ACM, 2019, pp. 834–835.
 [4] M. Fourcade and K. Healy, "Categories all the way down," Historical Social Research / Historische Sozialforschung, pp. 286–296, 2017.
 [5] K. Crawford, "Artificial intelligence's white guy problem," The New York Times, vol. 25, 2016.
 [6] M. T. Ribeiro, S. Singh, and C. Guestrin, "Why should I trust you? Explaining the predictions of any classifier," CoRR, vol. abs/1602.04938, 2016.
 [7] D. Gunning, "Explainable artificial intelligence (XAI)," DARPA, Funding Program DARPA-BAA-16-53, 2016. [Online]. Available: http://www.darpa.mil/program/explainable-artificial-intelligence
 [8] R. Guidotti, A. Monreale, F. Turini, D. Pedreschi, and F. Giannotti, "A survey of methods for explaining black box models," CoRR, vol. abs/1802.01933, 2018.
 [9] F. Di Castro and E. Bertini, "Surrogate decision tree visualization," in Joint Proceedings of the ACM IUI 2019 Workshops (ACMIUI-WS 2019), ser. CEUR Workshop Proceedings, vol. 2327, Mar. 2019.
[10] O. Bastani, C. Kim, and H. Bastani, "Interpreting blackbox models via model extraction," CoRR, vol. abs/1705.08504, 2017.
[11] B. Twala, "Multiple classifier application to credit risk assessment," Expert Systems with Applications, vol. 37, no. 4, pp. 3326–3336, 2010.
[12] S. Kotsiantis, "Supervised machine learning: A review of classification techniques," in Emerging Artificial Intelligence Applications in Computer Engineering, ser. Frontiers in Artificial Intelligence and Applications. IOS Press, Oct. 2007, vol. 160, pp. 3–24.
[13] Z. C. Lipton, "The mythos of model interpretability," CoRR, vol. abs/1606.03490, 2016.
[14] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Chapman & Hall/CRC, 1984.
[15] M. G. Augasta and T. Kathirvalavakumar, "Reverse engineering the neural networks for rule extraction in classification problems," Neural Processing Letters, vol. 35, no. 2, pp. 131–150, Apr. 2012.
[16] G. Tolomei, F. Silvestri, A. Haines, and M. Lalmas, "Interpretable predictions of tree-based ensembles via actionable feature tweaking," in 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 465–474. [Online]. Available: http://dl.acm.org/citation.cfm?id=3098039
[17] R. Fong and A. Vedaldi, "Interpretable explanations of black boxes by meaningful perturbation," CoRR, vol. abs/1704.03296, 2017.
[18] M. Sundararajan, A. Taly, and Q. Yan, "Axiomatic attribution for deep networks," CoRR, vol. abs/1703.01365, 2017.
[19] M. W. Craven and J. W. Shavlik, "Extracting tree-structured representations of trained networks," in 8th International Conference on Neural Information Processing Systems (NIPS'95). MIT Press, 1995, pp. 24–30.
[20] R. Andrews, J. Diederich, and A. B. Tickle, "Survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge-Based Systems, vol. 8, no. 6, pp. 373–389, Dec. 1995.
[21] U. Johansson and L. Niklasson, "Evolving decision trees using oracle guides," in 2009 IEEE Symposium on Computational Intelligence and Data Mining, Mar. 2009, pp. 238–244.
[22] N. Frosst and G. E. Hinton, "Distilling a neural network into a soft decision tree," in Comprehensibility and Explanation in AI and ML 2017 (CEX 2017), ser. CEUR Workshop Proceedings, vol. 2071, Nov. 2017.
[23] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, pp. 484–489, Jan. 2016.
[24] A. Omicini and F. Zambonelli, "MAS as complex systems: A view on the role of declarative approaches," in Declarative Agent Languages and Technologies, ser. Lecture Notes in Computer Science. Springer, May 2004, vol. 2990, pp. 1–17.
[25] F. Idelberger, G. Governatori, R. Riveret, and G. Sartor, "Evaluation of logic-based smart contracts for blockchain systems," in Rule Technologies. Research, Tools, and Applications, ser. Lecture Notes in Computer Science, vol. 9718. Springer, 2016, pp. 167–183.
[26] M. Oliya and H. K. Pung, "Towards incremental reasoning for context aware systems," in Advances in Computing and Communications, ser. Communications in Computer and Information Science. Springer, 2011, vol. 190, pp. 232–241.
[27] G. Sotnik, "The SOSIEL platform: Knowledge-based, cognitive, and multi-agent," Biologically Inspired Cognitive Architectures, vol. 26, pp. 103–117, Oct. 2018.
[28] R. Kowalski and F. Sadri, "From logic programming towards multi-agent systems," Annals of Mathematics and Artificial Intelligence, vol. 25, no. 3, pp. 391–419, Nov. 1999.
[29] M. D. Pandya, P. D. Shah, and S. Jardosh, "Medical image diagnosis for disease detection: A deep learning approach," in U-Healthcare Monitoring Systems, ser. Advances in Ubiquitous Sensing Applications for Healthcare. Academic Press, 2019, vol. 1: Design and Applications, ch. 3, pp. 37–60.
[30] S. Kuwayama, Y. Ayatsuka, D. Yanagisono, T. Uta, H. Usui, A. Kato, N. Takase, Y. Ogura, and T. Yasukawa, "Automated detection of macular diseases by optical coherence tomography and artificial intelligence machine learning of optical coherence tomography images," Journal of Ophthalmology, vol. 2019, p. 7, 2019.
[31] P. Sajda, "Machine learning for detection and diagnosis of disease," Annual Review of Biomedical Engineering, vol. 8, pp. 537–565, Aug. 2006.
[32] C. Zhang, Y. Chen, A. Yin, and X. Wang, "Anomaly detection in ECG based on trend symbolic aggregate approximation," Mathematical Biosciences and Engineering, vol. 16, no. 4, pp. 2154–2167, 2019.
[33] A. Rastogi, R. Arora, and S. Sharma, "Leaf disease detection and grading using computer vision technology & fuzzy logic," in 2nd International Conference on Signal Processing and Integrated Networks (SPIN 2015). IEEE, 2015, pp. 500–505.
[34] A. Lozowski, T. J. Cholewo, and J. M. Zurada, "Crisp rule extraction from perceptron network classifiers," in IEEE International Conference on Neural Networks (ICNN 1996), vol. Plenary, Panel and Special Sessions, Jun. 1996, pp. 94–99.
[35] J. Czerniak and H. Zarzycki, "Application of rough sets in the presumptive diagnosis of urinary system diseases," in Artificial Intelligence and Security in Computing Systems, ser. The Springer International Series in Engineering and Computer Science. Springer, 2003, vol. 752, pp. 41–51.
[36] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.


