            Interpretable Narrative Explanation for ML
            Predictors with LP: A Case Study for XAI
                    Roberta Calegari            Giovanni Ciatto           Jason Dellaluce          Andrea Omicini
                                 Dipartimento di Informatica – Scienza e Ingegneria (DISI)
                                 Alma Mater Studiorum–Università di Bologna, Italy
     Email: roberta.calegari@unibo.it, giovanni.ciatto@unibo.it, jason.dellaluce@studio.unibo.it, andrea.omicini@unibo.it

   Abstract: In the era of digital revolution, individual lives are going to cross and interconnect ubiquitous online domains and offline reality based on smart technologies: discovering, storing, processing, learning, analysing, and predicting from huge amounts of environment-collected data. Sub-symbolic techniques, such as deep learning, play a key role there, yet they are often built as black boxes, which are not inspectable, interpretable, or explainable. New research efforts towards explainable artificial intelligence (XAI) are trying to address those issues, with the final purpose of building understandable, accountable, and trustable AI systems, although there still seems to be a long way to go.

   Generally speaking, while we fully understand and appreciate the power of sub-symbolic approaches, we believe that symbolic approaches to machine intelligence, once properly combined with sub-symbolic ones, have a critical role to play in order to achieve key properties of XAI such as observability, interpretability, explainability, accountability, and trustability. In this paper we describe an example of integration of symbolic and sub-symbolic techniques. First, we sketch a general framework where symbolic and sub-symbolic approaches could fruitfully combine to produce intelligent behaviour in AI applications. Then, we focus in particular on the goal of building a narrative explanation for ML predictors: to this end, we exploit the logical knowledge obtained by translating decision tree predictors into logic programs.

   Index Terms: XAI, logic programming, machine learning, symbolic vs. sub-symbolic

                       I. INTRODUCTION

   Artificial intelligence (AI), machine learning (ML), and deep learning (DL) are nowadays intertwined with a growing number of aspects of people’s everyday life [1], [2]. In fact, more and more decisions are delegated by humans to software agents whose intelligent behaviour is not the result of some skilled developer endowing them with some clever code, but rather the consequence of the agents’ capability of learning, planning, or inferring what to do from data or, roughly speaking, of their artificial intelligence.

   For instance, banks and insurance companies have adopted ML and statistical methods for decades, in order to decide whether or not to grant a loan to a given customer, or to estimate the most profitable insurance plan for her. Similarly, ML has been employed in order to help doctors with their diagnoses, provided that a set of symptoms has been properly identified for a given patient, whereas statistical and probabilistic inference have been employed to test drugs, in order to prove them effective or safe. Furthermore, virtually any person, as a consumer of services and goods, lets a number of ML-trained agents decide or suggest what to buy, like, or read, as any consumer is likely to be profiled by most of the companies and organisations he/she has interacted with.

   In spite of the large adoption, intelligent agents whose behaviour is the result of automatic synthesis / learning procedures are difficult to trust for most people, in particular when people are not experts in the fields of computer or data sciences, AI, or statistics. This is especially true for agents leveraging machine or deep learning based techniques, which often produce models whose internal behaviour is opaque and hard to explain for their developers too.

   There, agents often tend to accumulate their knowledge into black-box predictive models which are trained through ML or DL. Broadly speaking, the “black-box” expression is used to refer to models where knowledge is not explicitly represented – such as in neural networks, support vector machines, or Hidden Markov Chains – and it is therefore difficult for humans to understand what a black-box actually knows, or what leads to a particular decision.

   Such difficulty in understanding the content and functioning of black-boxes is what prevents people from fully trusting – and thus accepting – them. In several contexts, such as the medical or financial ones, it is not sufficient for intelligent agents to output bare decisions, since, for instance, ethical and legal issues may arise. An explanation for each decision is therefore often desirable, preferable, or even required. For instance, applications dealing with personal data need to face the challenges of achieving valid consent for data use, protecting confidentiality, and addressing threats to privacy, data protection, and copyright. Those issues are particularly challenging in critical application scenarios such as healthcare, often involving the use of image (i.e., identifiable) data from children. While issues of data ownership, data security, and data access are important, other ethical issues may arise: since the diagnostic accuracy and value of the result are determined by the amount and quality of data used in model training, the first potential concern is to avoid algorithmic bias, which may lead to social discrimination and result in inequitable access to healthcare, merely because of the provenance of the collected data [1], [3].

   Furthermore, it may happen that black-boxes silently learn something wrong (e.g., Google image recognition software that classified black people as gorillas [4], [5]), or something right, but in a biased way (like the “background bias” problem, causing for instance husky images to be recognised only

because of their snowy background [6]). In such situations, explanations are expected to provide useful insights for black-box developers.

   To tackle such trust issues, the eXplainable Artificial Intelligence (XAI) research field has recently emerged, and a comprehensive research road map has been proposed by DARPA [7], targeting the themes of explainability and interpretability in AI – and in particular ML – as a challenge of paramount importance in a world where AI is becoming more and more pervasively adopted. There, DARPA reviews the main approaches to make AI either more interpretable or a posteriori explainable, categorises the many currently available techniques aimed at building meaningful interpretations or explanations for black-box models, summarises the open problems and challenges, and provides a successful reference framework for the researchers interested in the field.

   The main idea behind XAI is to employ explanators [8] to provide easy-to-understand insights for a given black-box and its particular decisions. An explanator is any procedure producing a meaningful explanation for some human observer, by leveraging any combination of (i) the black-box, (ii) its input data, or (iii) its decisions or predictions. To this end, we believe that symbolic approaches to machine intelligence – properly integrated with sub-symbolic approaches – may have a role to play in order to achieve key properties such as interpretability, observability, explainability, accountability, and trustability.

   In this paper we focus on the specific problem of building a narrative explanation of ML techniques, thus positioning our contribution into the specific Narrative Generation DARPA category [7]. In particular, we first show a general framework where symbolic and sub-symbolic techniques are fruitfully combined to produce intelligent behaviour in AI applications. Then, we focus on the translation of ML predictors into logical knowledge with the aim to (i) infer new knowledge, (ii) reason and act accordingly, and (iii) build the narrative explanation of a decision output (or prediction).

   To this end, we propose an automatic procedure aimed at translating an ML predictor – here in particular we consider the case of decision trees (DT) – into logical knowledge. We argue that, when the source DT has been trained over a set of real data in order to produce a predictor, the corresponding logic program may be employed to produce a narrative explanation for any given prediction.

   Despite being mostly focused on DT, our proposal represents a first step towards a more general approach. In fact, DT have been proposed as a general means for explaining the behaviour of virtually any black-box model [9], [10].

   Accordingly, the remainder of this paper is organised as follows. Section II briefly recalls the ML concepts and terminology used in the paper as well as the main research efforts in the field. Then Section III introduces our vision of a framework for the integration of symbolic and sub-symbolic techniques. Finally, Section IV discusses early experiments alongside the prototype implementation.

                       II. CONTEXT

   Machine learning often produces black-box predictors based on opaque models, thus hiding their internal logic from the user. This hinders explainability, and represents both a practical and an ethical issue for ML. As a result, many research approaches in the XAI field aim at overcoming that crucial weakness, sometimes at the cost of trading off accuracy against interpretability. So, we first (Subsection II-A) introduce some background notions to define the terminology adopted, then (Subsection II-B) summarise the state of the art as well as the goal of XAI.

A. Background

   Since several practical AI problems – such as image recognition, financial and medical decision support systems – can be reduced to supervised ML – which can be further grouped in terms of either classification or regression problems [11], [12] – in the remainder of this paper we focus on this set of ML problems.

   In those cases, a learning algorithm is commonly exploited to estimate the specific nature and shape of an unknown prediction function (or predictor) $p^* : \mathcal{X} \to \mathcal{Y}$, mapping each input vector $\mathbf{x}$ from a given input space $\mathcal{X}$ into a prediction from a given output space $\mathcal{Y}$. To do so, the learning algorithm takes into account a number $N$ of examples in the form $(\mathbf{x}_i, \mathbf{y}_i)$ such that $\mathbf{x}_i \in X \subset \mathcal{X}$, $\mathbf{y}_i \in Y \subset \mathcal{Y}$, and $|X| \equiv |Y| \equiv N$. There, each $\mathbf{x}_i$ represents an instance of the input data for which the expected output value $\mathbf{y}_i$ is known or has already been estimated. Such sorts of ML problems are said to be “supervised” because the expected targets $Y$ are available, whereas they are said to be “regression” problems if $\mathcal{Y}$ consists of continuous or numerable values, or “classification” problems if $\mathcal{Y}$ consists of categorical values.

   The learning algorithm usually assumes $p^* \in \mathcal{P}$, for a given family $\mathcal{P}$ of predictors, meaning that the unknown prediction function exists, and it is from $\mathcal{P}$. The algorithm then trains a predictor $\hat{p} \in \mathcal{P}$ such that the value of a given loss function $\lambda : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ – computing the discrepancy between predicted and expected outputs – is minimal or reasonably low, i.e.:

   $$\hat{p} = \operatorname*{argmin}_{p \in \mathcal{P}} \left\{ \sum_{i=1}^{N} \lambda(\mathbf{y}_i, p(\mathbf{x}_i)) \right\}.$$

   Depending on the predictor family $\mathcal{P}$ of choice, the nature of the learning algorithm and the admissible shapes of $\hat{p}$ may vary dramatically, as well as their interpretability. Even if the interpretability of predictor families is not a well-defined feature, most authors agree on the fact that some predictor families are more interpretable than others [13], in the sense that it is easier for humans to understand the functioning and the predictions of the former ones. For instance, it is widely acknowledged that generalized linear models (GLM) are more interpretable than neural networks (NN), whereas decision trees (DT) [14] are among the most interpretable families [8]. DT can be considered more interpretable due to their construction: that is, recursively partitioning the input space $\mathcal{X}$ through a number of splits or decisions based on the input data $X$, in such a way that the prediction in each partition
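
The loss-minimisation view of supervised learning recalled in Section II-A can be illustrated with a small sketch. It assumes a deliberately tiny predictor family (single-threshold decisions of the form x[j] <= c), a 0-1 loss, and an exhaustive search in place of a real learning algorithm; the names zero_one_loss, make_stump, empirical_loss, and train, as well as the toy data, are hypothetical and do not come from the prototype.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]          # one pair (x_i, y_i)
Predictor = Callable[[Sequence[float]], str]   # a predictor p : X -> Y


def zero_one_loss(expected: str, predicted: str) -> float:
    """Loss lambda(y, p(x)): 0 when the prediction is right, 1 otherwise."""
    return 0.0 if expected == predicted else 1.0


def make_stump(j: int, c: float, left: str, right: str) -> Predictor:
    """Predictor testing the single decision (x[j] <= c)."""
    return lambda x: left if x[j] <= c else right


def empirical_loss(p: Predictor, data: List[Example]) -> float:
    """Sum over all examples of lambda(y_i, p(x_i))."""
    return sum(zero_one_loss(y, p(x)) for x, y in data)


def train(data: List[Example], labels: Sequence[str]):
    """Exhaustive argmin of the empirical loss over a finite family of stumps."""
    n_features = len(data[0][0])
    best_params, best_loss = None, float("inf")
    for j in range(n_features):
        # Candidate thresholds: the values observed for feature j.
        for c in sorted({x[j] for x, _ in data}):
            for left in labels:
                for right in labels:
                    loss = empirical_loss(make_stump(j, c, left, right), data)
                    if loss < best_loss:  # keep the first minimiser found
                        best_params, best_loss = (j, c, left, right), loss
    return best_params, best_loss


# Made-up toy data: one numeric feature and two classes.
toy = [([36.5], "no"), ([37.0], "no"), ([38.5], "yes"), ([40.0], "yes")]
params, loss = train(toy, ["no", "yes"])
p_hat = make_stump(*params)
```

On this toy data the search selects the threshold 37.0 on the only feature, with zero empirical loss. A real decision tree learner applies the same idea recursively and greedily rather than by exhaustive enumeration.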

is constant, and the loss w.r.t. $Y$ is low, while keeping the amount of partitions low as well. Without affecting generality, we focus on the case of mono-dimensional classification – thus we write $y$ instead of $\mathbf{y}$ – since other cases can be easily reduced to this one. We further assume the input space $\mathcal{X}$ is $N$-dimensional, and let $n_j$ be the meta-variable representing the name of the $j$-th dimension of $\mathcal{X}$.

   Under such hypotheses, a DT predictor $p_T \in \mathcal{P}_{dt}$ assumes a binary tree $T$ exists such that each node is either
  • a leaf, carrying and representing a prediction, i.e. an assignment for $y$,
  • an internal node, carrying and representing a decision, i.e. a formula in the form $(n_j \le c)$, where $c$ is a constant threshold chosen by the learning algorithm.

Each node $\nu$ inherits a partition $X_\nu \subseteq X$ of the original input data from its parent. Since the root node $\nu_0$ has no parent, it is assigned the whole set of input data, i.e. $X_{\nu_0} \equiv X$. The decision carried by each internal node splits its $X_\nu$ into two disjoint parts – $X_\nu^L$ and $X_\nu^R$ – along the $j$-th dimension of $\mathcal{X}$. In particular, $X_\nu^L$ contains all the residual $\mathbf{x}_i \in X_\nu$ such that $x_i^j \le c_\nu$ – which are inherited by the left child of $\nu$ – whereas $X_\nu^R$ contains all the residual $\mathbf{x}_i \in X_\nu$ such that $x_i^j > c_\nu$, which are inherited by the right child of $\nu$. A leaf node $l$ is created whenever a sequence of splits (i.e., a path from the tree root to the leaf parent) leads to a partition $X_l$ which is (almost) pure, roughly meaning that $X_l$ (mostly) contains input data $\mathbf{x}_i$ for which the expected output is the same $y_l$. In this case, we say that the prediction carried by $l$ is $y_l$. Assuming such a tree $T$ exists, in order to classify some input data $\mathbf{x} \in \mathcal{X}$, the predictor $p_T$ simply navigates the path $P = (\nu_0, \nu_1, \nu_2, \ldots, l)$ of $T$ such that all decisions $\nu_k$ are matched by $\mathbf{x}$, then it outputs $y_l$.

B. XAI: The need for explanation and interpretable models

   Since the adoption of interpretable predictors usually comes at the cost of a lower potential in terms of predictive performance, explanations are the newly preferred way for providing understandable predictions without necessarily sacrificing accuracy. The idea, and the main goal, of XAI is to create intelligible and understandable explanations for uninterpretable predictors without replacing or modifying them. Thus explanations are built through a number of heterogeneous techniques, broadly referred to as explanators [8]: just to cite some, decision rules [15], feature importance [16], saliency masks [17], sensitivity analysis [18], etc.

   The state of the art for explainability currently recognises two main sorts of explanators, namely, either local or global. While local explanators attempt to provide an explanation for each particular prediction of a given predictor $p$, the global ones attempt to provide an explanation for the predictor $p$ as a whole. In other words, local explanators provide an answer to the question “why does $p$ predict $y$ for the input $\mathbf{x}$?” – such as the LIME technique presented in [6] – whereas global explanators, such as decision rules, provide an answer to the question “how does $p$ build its predictions?”.

   In spite of the many approaches proposed to explain black boxes, some important scientific questions still remain unanswered. One of the most important open problems is that, until now, there is no agreement on what an explanation is. Indeed, some approaches adopt as explanation a set of rules, others a decision tree, others rely on visualisation techniques [8]. Moreover, recent works highlight the importance for an explanation to guarantee some properties, e.g., soundness, completeness, and compactness [8].

   This is why our proposal aims at integrating sub-symbolic approaches with symbolic ones. To this end, DT can be exploited as an effective bridge between the symbolic and sub-symbolic realms. In fact, DT can be easily (i) built from an existing sub-symbolic predictor, and (ii) translated into symbolic knowledge – as shown in the remainder of this paper – thanks to their rule-based nature.

   Decision trees are an interpretable family of predictors that have been proposed as a global means for explaining other, less interpretable, sorts of black-box predictors [9], [10], such as neural networks [19]. The main idea behind such an approach is to build a DT approximating the behaviour of a given predictor, possibly by only considering its inputs and its outputs. Such an approximation essentially trades off predictive performance for interpretability. In fact, the structure of such a DT would then be used to provide useful insights concerning the inner functioning of the original predictor.

   Describing the particular means for extracting DT from black-boxes is outside the scope of this paper. Given the vast literature on the topic – e.g., consider reading [8], [20] for an overview or [19], [21], [22] for practical examples – we simply assume an extracted DT is available and that it has a high fidelity, meaning that the loss in terms of predictive performance is low w.r.t. the original black-box. In fact, whereas there exist several works focussing on how to synthesise DT out of black-box predictors, no attention is paid to merging them with symbolic approaches, which can play a key role in enhancing the interpretability and explainability of the system. In this paper we focus on such a matter.

   We believe that a logical representation of DT may be interesting and enabling for further research directions. For instance, as far as explainability is concerned, we show how logic-translated DT can be used both to navigate the knowledge stored within the corresponding predictors – thus acting as global explanators – and to produce narrative explanations for their predictions, thus acting as local explanators. Note that the restriction on the DT representation makes it easy to map DT onto logical clauses, since DT are finite and have a limited expressivity (if / else conditions).

                       III. VISION

   Many approaches to ML nowadays are increasingly focussing on sub-symbolic approaches – such as deep learning with neural networks [23] – and on how to make them work on the large scale. As promising as this may look – with the promise of potentially minimizing the engineering efforts needed – it is increasingly acknowledged that those

                                              Fig. 1. ML to LP and back: framework architecture.
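
As a rough illustration of what the ML to Prolog block of Fig. 1 is meant to do, the sketch below navigates a hand-written binary decision tree and emits one Prolog-style clause per leaf. The Leaf and Node classes, the predict and to_clauses functions, the head name predict, the feature names, and the thresholds of the toy tree are all assumptions made for the example; they are not the prototype's code, nor values learned from any data set.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    prediction: str       # the assignment for y carried by the leaf


@dataclass
class Node:
    feature: str          # name n_j of the tested dimension
    threshold: float      # constant c in the decision (n_j <= c)
    left: "Tree"          # followed when the decision holds
    right: "Tree"         # followed otherwise


Tree = Union[Leaf, Node]


def predict(tree: Tree, x: Dict[str, float]) -> str:
    """Navigate from the root to a leaf, following the decisions matched by x."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.prediction


def to_clauses(tree: Tree, features: List[str], head: str = "predict") -> List[str]:
    """Emit one clause per leaf; the body lists the decisions along its path."""
    # Prolog variables start with an upper-case letter.
    variables = {name: name.capitalize() for name in features}
    clauses: List[str] = []

    def walk(node: Tree, conditions: List[str]) -> None:
        if isinstance(node, Leaf):
            args = ", ".join(f"{n}({variables[n]})" for n in features)
            body = ", ".join(conditions) if conditions else "true"
            clauses.append(f"{head}({args}, {node.prediction}) :- {body}.")
            return
        var = variables[node.feature]
        walk(node.left, conditions + [f"{var} =< {node.threshold}"])
        walk(node.right, conditions + [f"{var} > {node.threshold}"])

    walk(tree, [])
    return clauses


# Made-up tree: thresholds are illustrative, boolean features are encoded as 0/1.
toy_tree = Node("temp", 37.9,
                Leaf("no"),
                Node("lumbar", 0.5, Leaf("no"), Leaf("yes")))
clauses = to_clauses(toy_tree, ["temp", "lumbar"])
```

Since every leaf corresponds to exactly one root-to-leaf path, the clause bodies are mutually exclusive, and the body of the clause that succeeds for a given input can be read back as the list of decisions that led to the prediction.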

approaches do not cope well with the socio-technical nature of the systems they are exploited in, which often demand a degree of interpretability, observability, explainability, accountability, and trustability they just cannot deliver.

   To this end, since logic-based approaches already have a well-understood role in building intelligent (multi-agent) systems [24], declarative, logic-based approaches have the potential to represent an alternative way of delivering symbolic intelligence, complementary to the one pursued by sub-symbolic approaches. In fact, declarative and logic-based technologies address the aforementioned socio-technical issues much better, in particular when exploiting their inferential capabilities (e.g., [25]).

   The potential of logic-based models and their extensions is first of all related to their declarativeness as well as to explicit knowledge representation, enabling knowledge sharing at the most adequate level of abstraction, while supporting modularity and separation of concerns [26], which are especially valuable in open and dynamic distributed systems. As a further element, the sound and complete semantics of logic programming (LP) straightforwardly enables intelligent agents to reason and infer new information in a sound and complete way.

   Another relevant point is that LP has already been proven to work well both as a knowledge representation language and as an inference platform for rational agents [27], [28]. The latter usually interact with an external environment by means of a suitably defined observe–think–act cycle.

   In accordance with this vision, here we propose an integrated framework of hybrid reasoning, where symbolic and sub-symbolic techniques fruitfully combine to produce intelligent behaviour.

   Indeed, looking in depth at pervasive socio-technical systems, it turns out that agents (either human or software) effortlessly undertake a complex decision making process in almost all situations, which seamlessly integrates perceptions (and actions) at two different scales, the macro and the micro:
  • at the macro scale, we take a decision by considering the knowledge of the global system, that is, rules of general validity concerning the most likely situation;
  • at the micro scale, we modulate such a decision by considering all the contingencies arising during the precise situation, such as, for instance, a last-minute inconvenience. As a consequence, we adapt the original plan to the local perceptions we gather while enacting it.

   In order to better illustrate the above remarks, one may consider as a concrete example the case of a disease diagnosis in a hospital, where the notions of micro and macro scale w.r.t. the nature of algorithms and techniques can be articulated as follows:
  • at the macro level, the main concerns regard a mid/long term horizon and focus on the analysis of high-dimensional and multimodal biomedical data, e.g. to train algorithms to recognize cancerous tissue at a level comparable to trained physicians, there including, for instance, representation and recognition of patterns and sequences in the input data. With such a sort of goals to pursue, it is not surprising that most IT tools supporting decision making are based on sub-symbolic approaches such as deep learning, Bayesian networks, machine vision, latent Dirichlet analysis, and in general any kind of statistical approach to ML [29], [30], [31];
  • at the micro level, the main concerns regard instead the short term horizon, and mostly focus on the specific problem of the patient, there including a few highly-intertwined sub-problems, e.g. a specific symptom or situation, or an ongoing epidemic in that hospital or place that carries the same symptoms. Although sub-symbolic approaches can still be used, symbolic ones such as fuzzy logic, specialized level (white box) learning instead of higher-level learning, and symbolic time series are most common [29], [32], [33].

   Generally speaking, we believe that computational intelligence accounts for these two kinds of rules: general rules whose validity is essentially unconstrained (speed limits, right of way,

etc.) which represent the commonsense knowledge necessary to inhabit the environment, and specific rules, with a validity bound in space and time (school hours and days, open-air market hours and days, unpredictable events such as incoming emergency vehicles or the need to gather at an evacuation assembly point), which represent the contextual or expert knowledge necessary to deal with transient, unforeseen, and unpredictable situations.

   That is why in the framework envisioned here we plan to combine sub-symbolic techniques with symbolic ones (LP in particular): sub-symbolic techniques are exploited for training the system and learning new rules (commonsense knowledge), rules are translated into logical knowledge (contextual / expert knowledge), and the two approaches interact and interleave to share knowledge and learn from each other in a coherent framework.

   The framework architecture, depicted in Fig. 1, shows the embodiment of the vision discussed above: sensor data and dataset are translated into the logic knowledge base. In particular the Machine Learning Interface allows for the interaction of different kinds of ML algorithms with the framework: a standard interface is proposed in order to combine the specific features of each algorithm in a coherent manner. ML to Prolog is the core of the translation into logical knowledge, while the Prolog to ML block returns insights of the logical KB to the ML predictor, for instance new inferred rules, or rules learned from a specific situation. The blocks on the left (Knowledge Base, Demonstration) reflect the standard architecture of a Prolog engine. Overall, the framework looks general enough to account for the variety of ML techniques and algorithms, and also to ensure the consistency between symbolic and sub-symbolic approaches. Finally, the block Prolog to ML currently expresses our vision, and is a subject of future research.

                     IV. EARLY EXPERIMENTS

   The first prototype we designed and implemented enables the construction of a narrative explanation of the prediction generated by exploiting the ML technique, thus achieving interpretability and making a step towards explainability.

   With respect to Fig. 1, we experiment with the predictor translation into logical rules, provided by the ML to Prolog block. The experimental results refer to the case in which the predictor corresponds to a decision tree or to the corresponding crisp rules [34]. The conversion generates a Prolog predicate for each decision taken by the predictor: inside the predicate, a term for each input/output attribute is instantiated with the values of the leaf of the decision tree. A rule is generated for each leaf in the tree: among other advantages, this allows for a very compact representation, easy to handle and interoperate with.

                                 TABLE I
                 ACUTE INFLAMMATIONS DATA SET ATTRIBUTES

   Attribute                           Short name      Values
   Input attributes
     Temperature of patient            temp            35 °C to 42 °C
     Occurrence of nausea              nausea          {yes, no}
     Lumbar pain                       lumbar          {yes, no}
     Urine pushing                     urine           {yes, no}
     Micturition pains                 micturition     {yes, no}
     Burning of urethra                urethra         {yes, no}
   Output attributes
     Inflammation of urinary bladder   inflammation    {yes, no}
     Nephritis of renal pelvis origin  nephritis       {yes, no}
   Alternative output
     Diagnosis                         diagnosis       {healthy, inflammation, nephritis, both}

                                 TABLE II
                 ACUTE INFLAMMATIONS DATA SET DESCRIPTION

   Dataset size                                              120
   Num. of input attributes                                    6
   Num. of output attributes                                   2
   Num. of output classes                                      4
   Num. of healthy patients                                   30   (25%)
   Num. of patients with inflammation of urinary bladder      59   (49.17%)
   Num. of patients with nephritis of renal pelvis origin     50   (41.67%)
   Num. of patients with both diseases                        19   (15.83%)

acute inflammations of urinary bladder and acute nephritises. Input parameters collect all the patient symptoms, and each instance represents a potential patient. The data was created by a medical expert as a data set to test the expert system, which performs the presumptive diagnosis of two diseases of the urinary system. The dataset considered is summarised in Table I and Table II.

   Starting from the general form Head ← Body for a logical clause, a predicate in the Head is generated for the decision of the predictor (in the example, the diagnosis predicate). Inside the predicate, a term for each input/output attribute is instantiated with the value of the decision tree (leaf).

   In our example, the following predicate is generated:

     % Clause schema: Body stands for the checks on the variables of the Head
     diagnosis(temperatureOfPatient(T), occurrenceOfNausea(N),
               lumbarPain(L), urinePushing(U), micturitionPains(M),
               burningOfUrethra(BU),
               nephritisOfRenalPelvisOrigin(Decision), confidence(C)) :- Body.

where the Body consists of checks and computations on the variables of the Head terms. For instance, considering the above tree of Fig. 2, the first generated rule is
   For a concrete example, let us consider the “Acute in- ✞
                                                                     diagnosis(temperatureOfPatient(T), occurrenceOfNausea(N),
flammations data set”1 [35] supplying data to perform the                 lumbarPain(L), urinePushing(U), micturitionPains(M),
presumptive diagnosis of two diseases of urinary system: the              burningOfUrethra(BU), nephritis(no), confidence(1.00))
                                                                          :- T =< 37.95.
                                                                    ✡                                                                         ✆
  1 http://archive.ics.uci.edu/ml/datasets/acute+inflammations

representing the fact that, if the temperature of the patient is less than or
equal to 37.95, the patient is unlikely to present nephritis of renal pelvis
origin. The answer carries a degree of confidence based on the cases in the
dataset that confirm the rule: here, 1.00 means that none of the patients in
the dataset with a temperature of at most 37.95 presents the disease.
   To improve readability, the rule above could be written as

    diagnosis(temperatureOfPatient(T), _, _, _, _, _,
        nephritis(no), confidence(1.00)) :- T =< 37.95.

by omitting the unconstrained variables, i.e., highlighting only the input
attributes that actually influence the decision.
   Fig. 2 (left) depicts the whole picture: the decision trees generated from
the example dataset when we run the basic classification tree algorithm (we
exploit two different implementations: C4.5 [36] as Weka J48 for the Java
translator, and SciKit-Learn CART [14] for the Python one) and the
corresponding translation into LP rules. With respect to Fig. 1, the decision
trees are the output of the Machine Learning Interface block and become the
input for the ML to Prolog block.
   Fig. 2 reports the experiments run with no manipulation of the dataset.
Since the ML algorithm allows only one decision output to be considered when
producing a decision tree, the information and the related knowledge are
fragmented into two different trees: the first obtained by running the
algorithm with decision output nephritis, the second with decision output
inflammation of urinary bladder. By running the ML to Prolog block of Fig. 1
we translate the two DTs into LP rules, as depicted in Fig. 2 (right).
      a) Interpretability: The LP program provides an interpretable
explanation of virtually any predictor. At a glance, the user can identify
which attributes are meaningful and considered for the response and which
are not. In the case of nephritis, the only significant input attributes are
the temperature of the patient and the presence or absence of lumbar pain.
The same holds for inflammation of urinary bladder, where the only
discriminative attributes are the presence of urine pushing, micturition
pains and lumbar pain.
      b) Interoperability: The adoption of a standard AI language (LP), in
spite of the plethora of different specific ML toolkits, paves the way
towards an interoperable explanation where LP is exploited as a sort of
lingua franca that goes beyond the technical implementation of each ML
framework.
      c) Relations between outputs: As emphasised by Fig. 2, relations
between outputs are lost, and possible links between the diseases are not
clearly highlighted when there are two different decision trees. Instead,
once an LP representation is obtained, it is easy to run simple queries on it
and get much more information than the two separate decision trees provide.
For instance, we can learn that in case of fever (temperature of patient >
37.95) without nephritis (i.e., no lumbar pain detected), the only case in
which inflammation of urinary bladder is present is when urine pushing is
detected in the absence of micturition pains. With the logical
representation, relations between outputs can be recovered by inferring
hidden knowledge in the rules. It is worth noticing that similar results
(emphasising the relations between decision outputs) can be obtained by
manipulating the dataset a priori, i.e., before training the ML algorithm (a
common operation, but not always applicable). The manipulation of the above
dataset, for instance, can build a unique decision output Result that
combines the two different diseases and their symptoms. In such a case the
dataset is enriched with the Result attribute containing the complete
diagnosis, i.e., it can assume the values Healthy, Inflammation, Nephritis,
Both. The corresponding decision tree and LP knowledge are depicted in
Fig. 3.
      d) Interpretable narrative explanation: LP makes it possible to
generate a narration for each answer of the predictor. The Prolog inference
tree becomes inspectable, tracking the path followed to obtain the answer.
For instance, w.r.t. the KB of Fig. 3 (which includes all diseases), the
diagnosis in the case of the following symptoms:

    diagnosis(
        temperatureOfPatient(36.5), occurrenceOfNausea(yes),
        lumbarPain(yes), urinePushing(no),
        micturitionPains(yes), burningOfUrethra(yes), _, _).

would produce the corresponding narration:

    The diagnosis is healthy, with a full confidence because
    the patient has no fever.

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    In particular the solution has been built across the
    following path:
    Solution: result(healthy) with confidence(1.00).
    For the proof, the following clauses are considered:
    [1] diagnosis(temperatureOfPatient(T), _, _, urinePushing(no),
          _, _, result(healthy), confidence(1.00)) :- T =< 37.95.
    [2] X =< Y that is verified if
          'expression_less_or_equal_than'(X, Y)

    In the query the temperature T is of 36.5.
       because of rule [1] 36.5 =< 36.9 has to be verified
       and because of [2] 'expression_less_or_equal_than'(36.5,
          36.9) has to be verified
    so rules [1] and [2] are verified.
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

   Despite its simplicity, the narration allows for a reconstruction of the
decision track, showing the path to the decision. With a large amount of
nested rules this could result very

      e) Exploitation of LP extension / abduction on the KB: Moreover, we
believe that by exploiting abduction techniques we could pave the way to
hypothetical reasoning with incomplete knowledge, i.e., learning new possible
hypotheses that can be assumed to hold, provided that they are consistent
with the given knowledge base. The idea, to be explored in future research,
is to provide the most likely solution given a set of evidence. The
conclusion would leave a degree of uncertainty while highlighting a plausible
answer based on the collected information. In the healthcare field, for
instance, this could mean taking a collection of symptoms (although
incomplete) and finding the most likely disease for them.
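The narration step of paragraph d) can be illustrated with a small sketch. It is not the authors' implementation and does not run a Prolog engine: it assumes a hypothetical in-memory copy (`RULES`) of the two clauses of Fig. 3 whose body is `T =< 37.95`, and a hypothetical function `explain` that records which clause applies and which checks succeed, then renders that trace as text.

```python
# Hypothetical in-memory copy of two clauses from the KB of Fig. 3.
# Each entry keeps the required urinePushing value, the temperature
# bound of the clause body (T =< bound), and the clause conclusion.
RULES = [
    {"urinePushing": "no", "bound": 37.95,
     "result": "healthy", "confidence": 1.00},
    {"urinePushing": "yes", "bound": 37.95,
     "result": "inflammation", "confidence": 1.00},
]


def explain(query):
    """Return (result, narration) for the first clause the query satisfies."""
    temp = query["temperatureOfPatient"]
    for number, rule in enumerate(RULES, start=1):
        if query["urinePushing"] != rule["urinePushing"]:
            continue  # the clause head does not match the query
        if temp > rule["bound"]:
            continue  # the body check T =< bound fails
        # Both checks succeeded: keep them as the trace to narrate.
        trace = [
            f"urinePushing({rule['urinePushing']}) matches the query",
            f"{temp} =< {rule['bound']} is verified",
        ]
        narration = (
            f"Solution: result({rule['result']}) with "
            f"confidence({rule['confidence']:.2f}).\n"
            f"Clause [{number}] applies because "
            + " and ".join(trace) + "."
        )
        return rule["result"], narration
    return None, "No clause applies to the query."


result, text = explain({"temperatureOfPatient": 36.5, "urinePushing": "no"})
print(text)
```

A real narration would be derived from the proof tree built by the Prolog engine; the sketch only shows the idea of turning the list of verified checks into a readable sentence.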

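The degree of confidence attached to each rule is described above as based on the cases of the dataset that confirm the rule. One plausible reading, sketched here as an assumption rather than as the authors' formula, is the fraction of records covered by the rule whose label agrees with the rule's decision. The function name `rule_confidence` and the `PATIENTS` records are hypothetical and are not taken from the dataset.

```python
def rule_confidence(records, covers, predicted, target):
    """Fraction of the records covered by a rule that carry its decision."""
    covered = [r for r in records if covers(r)]
    if not covered:
        return None  # the rule covers no record at all
    confirming = sum(1 for r in covered if r[target] == predicted)
    return confirming / len(covered)


# Hypothetical records, for illustration only.
PATIENTS = [
    {"temperature": 36.5, "nephritis": "no"},
    {"temperature": 37.2, "nephritis": "no"},
    {"temperature": 40.1, "nephritis": "yes"},
    {"temperature": 38.4, "nephritis": "yes"},
    {"temperature": 39.0, "nephritis": "no"},
]


def no_fever(record):
    """Coverage test of a rule whose body is T =< 37.95."""
    return record["temperature"] <= 37.95


def fever(record):
    """Coverage test of a rule whose body is T > 37.95."""
    return record["temperature"] > 37.95


low = rule_confidence(PATIENTS, no_fever, "no", "nephritis")
high = rule_confidence(PATIENTS, fever, "yes", "nephritis")
print(f"confidence({low:.2f}), confidence({high:.2f})")
```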
    Output Decision:
    Nephritis of renal pelvis origin {yes, no}

    diagnosis(temperatureOfPatient(T), _, _, _, _, _,
        nephritis(no), confidence(1.00)) :- T =< 37.95.

    diagnosis(temperatureOfPatient(T), _, lumbarPain(yes), _, _, _,
        nephritis(yes), confidence(1.00)) :- T > 37.95.

    diagnosis(temperatureOfPatient(T), _, lumbarPain(no), _, _,
        _, nephritis(no), confidence(1.00)) :- T > 37.95.

    Output Decision:
    Inflammation of urinary bladder {yes, no}

    diagnosis(_, _, _, urinePushing(no), _, _, inflammation(no),

    diagnosis(_, _, lumbarPain(yes), urinePushing(yes),
        micturitionPains(no), _, inflammation(no), confidence(1.00)).

    diagnosis(_, _, lumbarPain(no), urinePushing(yes),
        micturitionPains(no), _, inflammation(yes), confidence(1.00)).

    diagnosis(_, _, _, urinePushing(yes), micturitionPains(yes), _,
        inflammation, confidence(1.00)).
Fig. 2. Experimental results obtained running the framework on the Acute Inflammations dataset [35]: the left side shows the decision trees
generated by the supervised ML algorithm (Weka J48, SciKit-Learn CART), the right side the corresponding LP rules produced by the ML to Prolog
block. Since the two outputs overlap, two DTs are generated, so the information and the related knowledge are fragmented across them.
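The leaf-by-leaf translation that yields the nephritis rules of Fig. 2 can be sketched as follows. This is a minimal illustration, not the authors' translator. The nested-dict tree encoding, the hand-written `NEPHRITIS_TREE`, the functions `leaf_to_clause` and `tree_to_rules`, and the choice of naming a variable after the first letter of its attribute are all assumptions. Only the attribute names, the threshold and the shape of the clauses come from the figure.

```python
# Input attributes, in the order used by the head of diagnosis/8.
ATTRIBUTES = [
    "temperatureOfPatient", "occurrenceOfNausea", "lumbarPain",
    "urinePushing", "micturitionPains", "burningOfUrethra",
]

# Hypothetical encoding of the nephritis tree: an inner node tests one
# attribute, a leaf carries the decision and its confidence.
NEPHRITIS_TREE = {
    "attr": "temperatureOfPatient", "threshold": 37.95,
    "le": {"leaf": "no", "confidence": 1.00},
    "gt": {
        "attr": "lumbarPain",
        "values": {
            "yes": {"leaf": "yes", "confidence": 1.00},
            "no": {"leaf": "no", "confidence": 1.00},
        },
    },
}


def leaf_to_clause(path, leaf, output="nephritis"):
    """Render one root-to-leaf path as the text of a Prolog clause."""
    args, body = [], []
    for attr in ATTRIBUTES:
        tests = [(kind, value) for a, kind, value in path if a == attr]
        if not tests:
            args.append("_")  # attribute never tested on this path
        elif tests[-1][0] == "eq":
            args.append(f"{attr}({tests[-1][1]})")
        else:
            var = attr[0].upper()  # assumed naming: T for temperature
            args.append(f"{attr}({var})")
            for kind, value in tests:
                op = "=<" if kind == "le" else ">"
                body.append(f"{var} {op} {value}")
    args.append(f"{output}({leaf['leaf']})")
    args.append(f"confidence({leaf['confidence']:.2f})")
    head = f"diagnosis({', '.join(args)})"
    return f"{head} :- {', '.join(body)}." if body else f"{head}."


def tree_to_rules(node, path=()):
    """Walk the tree depth-first and emit one clause per leaf."""
    if "leaf" in node:
        return [leaf_to_clause(path, node)]
    attr = node["attr"]
    if "threshold" in node:
        thr = node["threshold"]
        return (tree_to_rules(node["le"], path + ((attr, "le", thr),))
                + tree_to_rules(node["gt"], path + ((attr, "gt", thr),)))
    rules = []
    for value, child in node["values"].items():
        rules += tree_to_rules(child, path + ((attr, "eq", value),))
    return rules


for clause in tree_to_rules(NEPHRITIS_TREE):
    print(clause)
```

Categorical tests end up in the clause head, numeric tests in the body. Untested attributes become anonymous variables, which gives the compact form used in the figure.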

    Output Decision:
    Result {Healthy, Inflammation, Nephritis, Both}

    diagnosis(temperatureOfPatient(T), _, _, urinePushing(no), _,
        _, result(healthy), confidence(1.00)) :-
        T =< 37.95.

    diagnosis(temperatureOfPatient(T), _, _, urinePushing(yes), _,
        _, result(inflammation), confidence(1.00)) :-
        T =< 37.95.

    diagnosis(temperatureOfPatient(T), _, lumbarPain(no), _,
        _, _, result(healthy), confidence(1.00)) :- T > 37.95.

    diagnosis(temperatureOfPatient(T), _, lumbarPain(yes), _,
        micturitionPains(no), _, result(nephritis),
        confidence(1.00)) :- T > 37.95.

    diagnosis(temperatureOfPatient(T), _, lumbarPain(yes), _,
        micturitionPains(yes), _, result(both),
        confidence(0.66)) :- T > 37.95.
Fig. 3. Decision tree (left) and corresponding “ML to Prolog core” output (right) after manipulating the dataset as described in the text. In particular, the two different
output decisions (nephritis and inflammation of urinary bladder) have been combined in order to generate a comprehensive output decision: the new diagnosis
considers the case of a healthy patient (none of the previous diseases), the case in which only one of the two diseases is present (inflammation or nephritis),
and finally the case in which both are present.
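The a priori manipulation of the dataset behind Fig. 3 amounts to deriving a single Result attribute from the two original yes/no decision outputs. A minimal sketch, assuming records stored as Python dicts with hypothetical keys `inflammation` and `nephritis` (the sample records below are invented for illustration and do not come from the dataset):

```python
def combine_outputs(inflammation, nephritis):
    """Map the two yes/no decisions to one of the four Result values."""
    if inflammation == "yes" and nephritis == "yes":
        return "Both"
    if inflammation == "yes":
        return "Inflammation"
    if nephritis == "yes":
        return "Nephritis"
    return "Healthy"


def enrich(records):
    """Return copies of the records with an added 'Result' attribute."""
    return [
        {**r, "Result": combine_outputs(r["inflammation"], r["nephritis"])}
        for r in records
    ]


# Hypothetical records, for illustration only.
SAMPLE = [
    {"temperature": 36.5, "inflammation": "no", "nephritis": "no"},
    {"temperature": 37.0, "inflammation": "yes", "nephritis": "no"},
    {"temperature": 40.0, "inflammation": "no", "nephritis": "yes"},
    {"temperature": 40.5, "inflammation": "yes", "nephritis": "yes"},
]

for row in enrich(SAMPLE):
    print(row["temperature"], row["Result"])
```

Training a single tree on the enriched records, with Result as the only decision output, is what keeps the relations between the two diseases in one model.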

                            V. CONCLUSION

   AI systems nowadays synthesise large amounts of data, learning from
experience and making predictions with the goal of taking autonomous
decisions. Applications range from clinical decision support to autonomous
driving and predictive policing. Nevertheless, concerns about the intentional
and unintentional negative consequences of AI systems are legitimate, as well
as ethical and legal concerns, mostly related to the obscurity and opaqueness
of AI decision algorithms. For that reason, recent work on interpretability
in machine learning and AI has focused on simplified models that approximate
the true criteria used to make decisions.
   In this paper we focus on building a narrative explanation for machine
learning techniques: we first translate an ML predictor into logical
knowledge, then inspect the proof tree leading to a solution. The narration
is built by tracking the path (i.e., the rules) that leads from the query to
the answer.
   Along this line, we foresee a broader vision that involves the design of a
consistent framework where symbolic and sub-symbolic techniques are
fruitfully combined to produce intelligent behaviour in AI applications while
exploiting the benefits of each approach: in the case of symbolic ones,
interpretability, observability, explainability, and accountability.
   The results presented here represent just a preliminary exploration of the
potential benefits of merging symbolic and sub-symbolic approaches, where, of
course, many critical issues are still unexplored and will be the subject of
future work. However, despite its simplicity, the case study already allows
us to point out the feasibility and the potential benefits of the
exploitation of symbolic techniques towards XAI.

[17] R. Fong and A. Vedaldi, “Interpretable explanations of black boxes by
     meaningful perturbation,” CoRR, vol. abs/1704.03296, 2017.
[18] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep
     networks,” CoRR, vol. abs/1703.01365, 2017.
[19] M. W. Craven and J. W. Shavlik, “Extracting tree-structured
     representations of trained networks,” in 8th International Conference on
     Neural Information Processing Systems (NIPS’95). MIT Press, 1995,
     pp. 24–30.
[20] R. Andrews, J. Diederich, and A. B. Tickle, “Survey and critique of
     techniques for extracting rules from trained artificial neural
     networks,” Knowledge-Based Systems, vol. 8, no. 6, pp. 373–389,
     Dec. 1995.
[21] U. Johansson and L. Niklasson, “Evolving decision trees using oracle
     guides,” in 2009 IEEE Symposium on Computational Intelligence and Data
     Mining, Mar. 2009, pp. 238–244.
[22] N. Frosst and G. E. Hinton, “Distilling a neural network into a soft
     decision tree,” in CEX 2017 Comprehensibility and Explanation in AI and
     ML 2017 (CEX 2017), ser. CEUR Workshop Proceedings, vol. 2071,
     Nov. 2017.
[23] D. Silver et al., “Mastering the game of Go with deep neural networks
     and tree search,” Nature, vol. 529, pp. 484–489, Jan. 2016.
[24] A. Omicini and F. Zambonelli, “MAS as complex systems: A view on the
     role of declarative approaches,” in Declarative Agent Languages and
     Technologies, ser. Lecture Notes in Computer Science. Springer, May
     2004, vol. 2990, pp. 1–17.
[25] F. Idelberger, G. Governatori, R. Riveret, and G. Sartor, “Evaluation of
     logic-based smart contracts for blockchain systems,” in Rule Technolo-
