Workshop "From Objects to Agents" (WOA 2019) Interpretable Narrative Explanation for ML Predictors with LP: A Case Study for XAI Roberta Calegari Giovanni Ciatto Jason Dellaluce Andrea Omicini Dipartimento di Informatica – Scienza e Ingegneria (DISI) A LMA M ATER S TUDIORUM–Università di Bologna, Italy Email: roberta.calegari@unibo.it, giovanni.ciatto@unibo.it, jason.dellaluce@studio.unibo.it, andrea.omicini@unibo.it Abstract—In the era of digital revolution, individual lives read—as any consumer is likely to be profiled by most of the are going to cross and interconnect ubiquitous online domains companies and organisations he/she has interacted. and offline reality based on smart technologies—discovering, In spite of the large adoption, intelligent agents whose storing, processing, learning, analysing, and predicting from huge amounts of environment-collected data. Sub-symbolic techniques, behaviour is the result of automatic synthesis / learning proce- such as deep learning, play a key role there, yet they are often dures are difficult to trust for most people—in particular when built as black boxes, which are not inspectable, interpretable, people are not expert in the fields of computer or data sciences, explainable. New research efforts towards explainable artificial AI, statistics. This is especially true for agents leveraging on intelligence (XAI) are trying to address those issues, with the final machine or deep learning based techniques, often producing purpose of building understandable, accountable, and trustable AI systems—still, seemingly with a long way to go. models whose internal behaviour is opaque and hard to explain Generally speaking, while we fully understand and appreciate for their developers too. the power of sub-symbolic approaches, we believe that symbolic There, agents often tend to accumulate their knowledge into approaches to machine intelligence, once properly combined with black-box predictive models which are trained through ML or sub-symbolic ones, have a critical role to play in order to achieve DL. Broadly speaking, the “black-box” expression is used to key properties of XAI such as observability, interpretability, explainability, accountability, and trustability. In this paper we refer to models where knowledge is not explicitly represented describe an example of integration of symbolic and sub-symbolic – such as in neural networks, support vector machines, or techniques. First, we sketch a general framework where symbolic Hidden Markov Chains –, and it is therefore difficult, for and sub-symbolic approaches could fruitfully combine to produce humans, to understand what a black-box actually knows, or intelligent behaviour in AI applications. Then, we focus in what leads to a particular decision. particular on the goal of building a narrative explanation for ML predictors: to this end, we exploit the logical knowledge obtained Such difficulty in understanding black-boxes content and translating decision tree predictors into logical programs. functioning is what prevents people from fully trusting – Index Terms—XAI, logic programming, machine learning, and thus accepting – them. In several contexts, such as the symbolic vs. sub-symbolic medical or financial ones, it is not sufficient for intelligent agents to output bare decisions, since, for instance, ethical I. I NTRODUCTION and legal issues may arise. An explanation for each decision Artificial intelligence (AI), machine learning (ML), and is therefore often desirable, preferable, or even required. 
Such difficulty in understanding black-boxes' content and functioning is what prevents people from fully trusting – and thus accepting – them. In several contexts, such as the medical or financial ones, it is not sufficient for intelligent agents to output bare decisions, since, for instance, ethical and legal issues may arise. An explanation for each decision is therefore often desirable, preferable, or even required. For instance, applications dealing with personal data need to face the challenges of achieving valid consent for data use and protecting confidentiality, and addressing threats to privacy, data protection, and copyright. Those issues are particularly challenging in critical application scenarios such as healthcare, often involving the use of image (i.e., identifiable) data from children. While issues of data ownership, data security, and data access are important, other ethical issues may arise: since the diagnostic accuracy and value of the result is determined by the amount and quality of data used in model training, the first potential concern is to avoid algorithmic bias, which may lead to social discrimination and result in inequitable access to healthcare, just related to the provenience of the collected data [1], [3].

Furthermore, it may happen that black-boxes silently learn something wrong (e.g., Google image recognition software that classified black people as gorillas [4], [5]), or something right, but in a biased way (like the "background bias" problem, causing for instance husky images to be recognised only because of their snowy background [6]). In such situations, explanations are expected to provide useful insights for black-box developers.

To tackle such trust issues, the eXplainable Artificial Intelligence (XAI) research field has recently emerged, and a comprehensive research road map has been proposed by DARPA [7], targeting the themes of explainability and interpretability in AI – and in particular ML – as a challenge of paramount importance in a world where AI is becoming more and more pervasively adopted. There, DARPA reviews the main approaches to make AI either more interpretable or a posteriori explainable, categorises the many currently available techniques aimed at building meaningful interpretations or explanations for black-box models, summarises the open problems and challenges, and provides a successful reference framework for the researchers interested in the field.
The main idea behind XAI is to employ explanators [8] to provide easy-to-understand insights for a given black-box and its particular decisions. An explanator is any procedure producing a meaningful explanation for some human observer, by leveraging on any combination of (i) the black-box, (ii) its input data, or (iii) its decisions or predictions. To this end, we believe that symbolic approaches to machine intelligence – properly integrated with sub-symbolic approaches – may have a role to play in order to achieve key properties such as interpretability, observability, explainability, accountability, and trustability.

In this paper we focus on the specific problem of building a narrative explanation of ML techniques—thus positioning our contribution into the specific Narrative Generation DARPA category [7]. In particular, we first show a general framework where symbolic and sub-symbolic techniques are fruitfully combined to produce intelligent behaviour in AI applications. Then, we focus on the translation of ML predictors into logical knowledge with the aim to (i) infer new knowledge, (ii) reason and act accordingly, and (iii) build the narrative explanation of a decision output (or prediction).

To this end, we propose an automatic procedure aimed at translating a ML predictor – here in particular we consider the case of decision trees (DT) – into logical knowledge. We argue that, when the source DT has been trained over a set of real data in order to produce a predictor, the corresponding logic program may be employed to produce a narrative explanation for any given prediction.

Despite being mostly focused on DT, our proposal represents a first step towards a more general approach. In fact, DT have been proposed as a general means for explaining the behaviour of virtually any black-box model [9], [10].

Accordingly, the remainder of this paper is organised as follows. Section II briefly recalls the ML concepts and terminology used in the paper as well as the main research efforts in the field. Then Section III introduces our vision of a framework for the integration of symbolic and sub-symbolic techniques. Finally, Section IV discusses early experiments alongside the prototype implementation.

II. CONTEXT

Machine learning often produces black-box predictors based on opaque models, thus hiding their internal logic to the user. This hinders explainability, and represents both a practical and an ethical issue for ML. As a result, many research approaches in the XAI field aim at overcoming that crucial weakness, sometimes at the cost of trading off accuracy against interpretability. So, we first (Subsection II-A) introduce some background notions to define the terminology adopted, then (Subsection II-B) summarise the state of the art as well as the goal of XAI.

A. Background

Since several practical AI problems – such as image recognition, financial and medical decision support systems – can be reduced to supervised ML – which can be further grouped in terms of either classification or regression problems [11], [12] –, in the remainder of this paper we focus on this set of ML problems.

In those cases, a learning algorithm is commonly exploited to estimate the specific nature and shape of an unknown prediction function (or predictor) p* : 𝒳 → 𝒴, mapping each input vector x from a given input space 𝒳 into a prediction from a given output space 𝒴. To do so, the learning algorithm takes into account a number N of examples in the form (x_i, y_i) such that x_i ∈ X ⊂ 𝒳, y_i ∈ Y ⊂ 𝒴, and |X| ≡ |Y| ≡ N. There, each x_i represents an instance of the input data for which the expected output value y_i is known or has already been estimated. Such sorts of ML problems are said to be "supervised" because the expected targets Y are available, whereas they are said to be "regression" problems if 𝒴 consists of continuous or numerable values, or "classification" problems if 𝒴 consists of categorical values.

The learning algorithm usually assumes p* ∈ 𝒫, for a given family 𝒫 of predictors—meaning that the unknown prediction function exists, and it is from 𝒫. The algorithm then trains a predictor p̂ ∈ 𝒫 such that the value of a given loss function λ : 𝒴 × 𝒴 → ℝ – computing the discrepancy among predicted and expected outputs – is minimal or reasonably low, i.e.:

  p̂ = argmin_{p ∈ 𝒫} Σ_{i=1}^{N} λ(y_i, p(x_i))
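For readers who prefer a concrete, runnable illustration of the empirical-loss minimisation above, the following Python sketch (ours, not part of the paper's prototype; the data, labels, and threshold family are hypothetical) selects, from a small family of threshold predictors, the one minimising the zero-one loss over the N examples.

  # Minimal sketch (not the paper's prototype): pick, from a small family of
  # threshold predictors, the one minimising the empirical zero-one loss.
  X = [36.2, 37.1, 38.4, 39.0, 40.2]     # hypothetical temperatures (input data)
  Y = ["no", "no", "yes", "yes", "yes"]  # hypothetical nephritis labels (targets)

  def zero_one_loss(y_true, y_pred):
      """lambda(y, p(x)): 1 if the prediction is wrong, 0 otherwise."""
      return 0 if y_true == y_pred else 1

  # The predictor family P: "predict 'yes' iff temperature > threshold".
  family = [lambda x, t=t: "yes" if x > t else "no" for t in (36.0, 37.95, 39.5)]

  # p_hat = argmin over P of the summed loss on the N examples.
  p_hat = min(family, key=lambda p: sum(zero_one_loss(y, p(x)) for x, y in zip(X, Y)))
  print(p_hat(37.0), p_hat(38.5))  # -> no yes (the 37.95 threshold wins)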
Depending on the predictor family 𝒫 of choice, the nature of the learning algorithm and the admissible shapes of p̂ may vary dramatically, as well as their interpretability. Even if the interpretability of predictor families is not a well-defined feature, most authors agree on the fact that some predictor families are more interpretable than others [13]—in the sense that it is easier for humans to understand the functioning and the predictions of the former ones. For instance, it is widely acknowledged that generalized linear models (GLM) are more interpretable than neural networks (NN), whereas decision trees (DT) [14] are among the most interpretable families [8]. DT can be considered more interpretable due to their construction: that is, recursively partitioning the input space 𝒳 through a number of splits or decisions based on the input data X, in such a way that the prediction in each partition is constant, and the loss w.r.t. Y is low, while keeping the amount of partitions low as well. Without affecting generality, we focus on the case of mono-dimensional classification – thus we write y instead of the vector y –, since other cases can be easily reduced to this one. We further assume the input space 𝒳 is N-dimensional, and let n_j be the meta-variable representing the name of the j-th dimension of 𝒳.

Under such hypotheses, a DT predictor p_T ∈ 𝒫_DT assumes a binary tree T exists such that each node is either
• a leaf, carrying and representing a prediction, i.e. an assignment for y,
• an internal node, carrying and representing a decision, i.e. a formula in the form (n_j ≤ c)—where c is a constant threshold chosen by the learning algorithm.

Each node ν inherits a partition X_ν ⊆ X of the original input data from its parent. Since the root node ν_0 has no parent, it is assigned the whole set of input data—i.e. X_ν0 ≡ X. The decision carried by each internal node splits its X_ν into two disjoint parts – X_νL and X_νR – along the j-th dimension of 𝒳. In particular, X_νL contains all the residual x_i ∈ X_ν such that x_i^j ≤ c_ν – which are inherited by ν's left child –, whereas X_νR contains all the residual x_i ∈ X_ν such that x_i^j > c_ν—which are inherited by ν's right child. A leaf node l is created whenever a sequence of splits (i.e., a path from the tree root to the leaf's parent) leads to a partition X_l which is (almost) pure—roughly, meaning that X_l (mostly) contains input data x_i for which the expected output is the same y_l. In this case, we say that the prediction carried by l is y_l. Assuming such a tree T exists, in order to classify some input data x ∈ 𝒳, the predictor p_T simply navigates the path P = (ν_0, ν_1, ν_2, ..., l) of T such that all decisions ν_k are matched by x, then it outputs y_l.
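The following minimal Python sketch (again ours, with hypothetical feature names and thresholds only loosely inspired by the nephritis example of Section IV) makes the node-and-split description concrete: a DT predictor classifies an input by following the path of matched decisions down to a leaf.

  # Minimal sketch of DT prediction: each internal node tests (n_j <= c) and
  # routes the input to its left or right child; each leaf carries a prediction.
  # The tree below is hypothetical, not the one learned in Section IV.
  tree = ("temp", 37.95,                 # decision: temp <= 37.95 ?
          "no",                          # left child: leaf -> nephritis = no
          ("lumbar", 0.5,                # right child: lumbar pain <= 0.5 (absent)?
           "no",                         #   leaf -> nephritis = no
           "yes"))                       #   leaf -> nephritis = yes

  def predict(node, x):
      """Navigate the path of decisions matched by x and return the leaf label."""
      while isinstance(node, tuple):     # internal node: (feature, threshold, left, right)
          feature, threshold, left, right = node
          node = left if x[feature] <= threshold else right
      return node                        # leaf: the prediction y_l

  print(predict(tree, {"temp": 38.4, "lumbar": 1}))  # -> yes
  print(predict(tree, {"temp": 36.5, "lumbar": 1}))  # -> no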
B. XAI: The need for explanation

Since the adoption of interpretable predictors usually comes at the cost of a lower potential in terms of predictive performance, explanations are the newly preferred way for providing understandable predictions without necessarily sacrificing accuracy. The idea, and the main goal of XAI, is to create intelligible and understandable explanations for uninterpretable predictors without replacing or modifying them. Thus explanations are built through a number of heterogeneous techniques, broadly referred to as explanators [8]—just to cite some, decision rules [15], feature importance [16], saliency masks [17], sensitivity analysis [18], etc.

The state of the art for explainability currently recognises two main sorts of explanators, namely, either local or global. While local explanators attempt to provide an explanation for each particular prediction of a given predictor p, the global ones attempt to provide an explanation for the predictor p as a whole. In other words, local explanators provide an answer to the question "why does p predict y for the input x?" – such as the LIME technique presented in [6] –, whereas global explanators provide an answer to the question "how does p build its predictions?"—such as decision rules.

In spite of the many approaches proposed to explain black boxes, some important scientific questions still remain unanswered. One of the most important open problems is that, until now, there is no agreement on what an explanation is. Indeed, some approaches adopt as explanation a set of rules, others a decision tree, others rely on visualisation techniques [8]. Moreover, recent works highlight the importance for an explanation to guarantee some properties, e.g., soundness, completeness, and compactness [8].

This is why our proposal aims at integrating sub-symbolic approaches with symbolic ones. To this end, DT can be exploited as an effective bridge between the symbolic and sub-symbolic realms. In fact, DT can be easily (i) built from an existing sub-symbolic predictor, and (ii) translated into symbolic knowledge – as it is shown in the remainder of this paper – thanks to their rule-based nature.

Decision trees are an interpretable family of predictors that have been proposed as a global means for explaining other, less interpretable, sorts of black-box predictors [9], [10]—such as neural networks [19]. The main idea behind such an approach is to build a DT approximating the behaviour of a given predictor, possibly by only considering its inputs and its outputs. Such approximation essentially trades off predictive performance with interpretability. In fact, the structure of such a DT would then be used to provide useful insights concerning the original predictor's inner functioning.

Describing the particular means for extracting DT from black-boxes is outside the scope of this paper. Given the vast literature on the topic – e.g., consider reading [8], [20] for an overview or [19], [21], [22] for practical examples – we simply assume an extracted DT is available and it has a high fidelity—meaning that the loss in terms of predictive performance is low, w.r.t. the original black-box. In fact, whereas there exist several works focussing on how to synthesise DT and interpretable models out of black-box predictors, no attention is paid to merging them with symbolic approaches, which can play a key role in enhancing the interpretability and explainability of the system. In this paper we focus on such a matter.

We believe that a logical representation of DT may be interesting and enabling for further research directions. For instance, as far as explainability is concerned, we show how logic-translated DT can be used to both navigate the knowledge stored within the corresponding predictors – thus acting as global explanators – and produce narrative explanations for their predictions—thus acting as local explanators. Note that the restriction on the DT representation makes it easy to map DT onto logical clauses, since DT are finite and with a limited expressivity (if / else conditions).
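As an illustration of the surrogate-DT idea recalled in this subsection (and not of any specific extraction technique from [19], [21], [22]), the following scikit-learn sketch fits a DT on the predictions of an opaque model and measures its fidelity as the agreement between the two; the data and model choices are arbitrary.

  # Sketch of surrogate-DT extraction: approximate an opaque predictor with a DT
  # trained on the opaque model's own outputs, then measure fidelity (agreement).
  from sklearn.datasets import make_classification
  from sklearn.neural_network import MLPClassifier
  from sklearn.tree import DecisionTreeClassifier
  from sklearn.metrics import accuracy_score

  X, y = make_classification(n_samples=500, n_features=6, random_state=0)

  black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                            random_state=0).fit(X, y)

  # Train the surrogate on the black-box predictions, not on the true labels.
  surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
  surrogate.fit(X, black_box.predict(X))

  # Fidelity: how often the interpretable surrogate agrees with the black box.
  fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
  print(f"fidelity of the surrogate DT: {fidelity:.2f}")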
III. VISION

Many approaches to ML nowadays are increasingly focussing on sub-symbolic approaches – such as deep learning with neural networks [23] – and on how to make them work on the large scale. As promising as this may look – with the premise of potentially minimizing the engineering efforts needed – it is increasingly acknowledged that those approaches do not cope well with the socio-technical nature of the systems they are exploited in, which often demand a degree of interpretability, observability, explainability, accountability, and trustability they just cannot deliver.

To this end, since logic-based approaches already have a well-understood role in building intelligent (multi-agent) systems [24], declarative, logic-based approaches have the potential to represent an alternative way of delivering symbolic intelligence, complementary to the one pursued by sub-symbolic approaches. In fact, declarative and logic-based technologies much better address the aforementioned socio-technical issues, in particular when exploiting their inferential capabilities—e.g., [25].

The potential of logic-based models and their extensions is first of all related to their declarativeness as well as to explicit knowledge representation, enabling knowledge sharing at the most adequate level of abstraction, while supporting modularity and separation of concerns [26]—which are especially valuable in open and dynamic distributed systems. As a further element, LP sound and complete semantics straightforwardly enables intelligent agents to reason and infer new information in a sound and complete way.

Another relevant point is that LP has been already proven to work well both as a knowledge representation language and as an inference platform for rational agents [27], [28]. The latter usually may interact with an external environment by means of a suitably defined observe–think–act cycle.

According to this vision, here we propose an integrated framework of hybrid reasoning, where symbolic and sub-symbolic techniques fruitfully combine to produce intelligent behaviour.

Indeed, looking in depth at pervasive socio-technical systems, it turns out that agents (either human or software) effortlessly undertake a complex decision making process in almost all situations, which seamlessly integrates perceptions (and actions) at two different scales—the macro and the micro:
• at the macro scale, we decide by considering the knowledge of the global system—rules of general validity, concerning the most likely situation;
• at the micro scale, we modulate such a decision by considering all the contingencies arising during the precise situation – such as, for instance, a last-minute inconvenience. As a consequence, we adapt the original plan to the local perceptions we gather while enacting it.
In order to better illustrate the above remarks, one may consider as a concrete example the case of a disease diagnosis in a hospital, where the notions of micro and macro scale w.r.t. the nature of algorithms and techniques can be declined as follows:
• at the macro level, the main concerns regard a mid/long term horizon, and focus on the analysis of high-dimensional and multimodal biomedical data to train algorithms to recognize cancerous tissue at a level comparable to trained physicians—there including, for instance, representation and recognition of patterns and sequences in the input data. With such a sort of goals to pursue, it is not surprising that most IT tools supporting decision making are based on sub-symbolic approaches such as deep learning, Bayesian networks, machine vision, latent Dirichlet analysis, and in general any kind of statistical approach to ML [29], [30], [31];
• at the micro level, the main concerns regard instead the short term horizon, and mostly focus on the specific problem of the patient, there including a few highly-intertwined sub-problems—e.g. a specific symptom or situation, or an ongoing epidemic in that hospital or place that carries the same symptoms. Although sub-symbolic approaches can still be used, symbolic ones – such as fuzzy logic, specialized (white box) learning instead of higher-level learning, and symbolic time series – are most common [29], [32], [33].

Generally speaking, we believe that computational intelligence accounts for these two kinds of rules: general rules whose validity is essentially unconstrained (speed limits, right of way, etc.), which represent the commonsense knowledge necessary to inhabit the environment, and specific rules, with a validity bound in space and time (school hours and days, open-air market hours and days, unpredictable events such as incoming emergency vehicles or the need to gather at an evacuation assembly point), which represent the contextual or expert knowledge necessary to deal with transient, unforeseen, and unpredictable situations.

That is why in the framework envisioned here we plan to combine sub-symbolic techniques with symbolic ones (LP in particular): sub-symbolic techniques are exploited for training the system and learning new rules (commonsense knowledge), rules are translated into logical knowledge (contextual / expert knowledge), and the two approaches interact and interleave to share knowledge and learn from each other in a coherent framework.
The framework architecture, depicted in Fig. 1, shows the embodiment of the vision discussed above: sensor data and datasets are translated into the logic knowledge base. In particular, the Machine Learning Interface allows for the interaction of different kinds of ML algorithms with the framework: a standard interface is proposed in order to combine the specific features of each algorithm in a coherent manner. ML to Prolog is the core of the translation into logical knowledge, while Prolog to ML returns insights of the logical KB to the ML predictor—for instance, new inferred rules, or rules learned from a specific situation. The blocks on the left (Knowledge Base, Demonstration) reflect the standard architecture of a Prolog engine. Overall, the framework looks general enough to account for the variety of ML techniques and algorithms, and also to ensure the consistency between symbolic and sub-symbolic approaches. Finally, the block Prolog to ML currently expresses our vision, and is obviously the subject of future research.

Fig. 1. ML to LP and back: framework architecture.
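Purely as a sketch of how the blocks of Fig. 1 could be exposed programmatically, the contracts might look as follows in Python; the interface names below mirror the figure but are our own hypothetical rendering, not the prototype's actual API.

  # Hypothetical sketch of the framework's blocks as Python interfaces;
  # names follow Fig. 1 but do not reflect the prototype's real code.
  from typing import Protocol, List, Any

  class MachineLearningInterface(Protocol):
      def train(self, dataset: Any) -> Any: ...              # returns a predictor (e.g. a DT)

  class MLToProlog(Protocol):
      def translate(self, predictor: Any) -> List[str]: ...  # predictor -> Prolog clauses

  class PrologToML(Protocol):
      def feedback(self, clauses: List[str]) -> Any: ...     # inferred rules back to the ML side

  class KnowledgeBase:
      """Collects the translated clauses to be handed to a Prolog engine."""
      def __init__(self) -> None:
          self.clauses: List[str] = []
      def tell(self, clauses: List[str]) -> None:
          self.clauses.extend(clauses)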
IV. EARLY EXPERIMENTS

The first prototype we design and implement enables the construction of a narrative explanation of the prediction generated exploiting the ML technique, thus achieving interpretability and making a step towards explainability.

With respect to Fig. 1, we experiment with the predictor translation into logical rules, provided by the ML to Prolog block. The experimental results refer to the case in which the predictor corresponds to a decision tree or to the corresponding crisp rules [34]. The conversion generates a Prolog predicate for each decision taken by the predictor: inside the predicate, a term for each input/output attribute is instantiated with the values of the leaf of the decision tree. A rule is generated for each leaf in the tree: among the other advantages, this allows for a very compact representation, easy to handle and interoperate with.

For a concrete example, let us consider the "Acute inflammations data set" [35] (http://archive.ics.uci.edu/ml/datasets/acute+inflammations), supplying data to perform the presumptive diagnosis of two diseases of the urinary system: acute inflammation of the urinary bladder and acute nephritis. Input parameters collect all the patient symptoms; each instance represents a potential patient. The data was created by a medical expert as a data set to test an expert system performing the presumptive diagnosis of the two diseases. The dataset considered is summarised in TABLE I and TABLE II.

TABLE I
ACUTE INFLAMMATIONS DATA SET ATTRIBUTES

  Attribute                          Short name     Values
  Temperature of patient             temp           35°C ÷ 42°C
  Occurrence of nausea               nausea         {yes, no}
  Lumbar pain                        lumbar         {yes, no}
  Urine pushing                      urine          {yes, no}
  Micturition pains                  micturition    {yes, no}
  Burning of urethra                 urethra        {yes, no}
  Output attributes
  Inflammation of urinary bladder    inflammation   {yes, no}
  Nephritis of renal pelvis origin   nephritis      {yes, no}
  Alternative output
  Diagnosis                          diagnosis      {healthy, inflammation, nephritis, both}

TABLE II
ACUTE INFLAMMATIONS DATA SET DESCRIPTION

  Dataset size                                             120
  Num. of input attributes                                 6
  Num. of output attributes                                2
  Num. of output classes                                   4
  Num. of healthy patients                                 30 (25%)
  Num. of patients with inflammation of urinary bladder    59 (49.17%)
  Num. of patients with nephritis of renal pelvis origin   50 (41.67%)
  Num. of patients with both diseases                      19 (15.83%)

Starting from the general form Head ← Body for a logical clause, a predicate in the Head is generated for the decision of the predictor—in the example, the diagnosis predicate. Inside the predicate, a term for each input/output attribute is instantiated with the value of the decision tree (leaf). In our example, the following predicate is generated:

  diagnosis(temperatureOfPatient(T), occurrenceOfNausea(N), lumbarPain(L),
            urinePushing(U), micturitionPains(M), burningOfUrethra(BU),
            nephritisOfRenalPelvisOrigin(Decision), confidence(C)) :- Body.

where the Body consists of checks and computations on the variables of the Head terms. For instance, considering the tree of Fig. 2, the first generated rule is

  diagnosis(temperatureOfPatient(T), occurrenceOfNausea(N), lumbarPain(L),
            urinePushing(U), micturitionPains(M), burningOfUrethra(BU),
            nephritis(no), confidence(1.00)) :- T =< 37.95.

representing the fact that if the temperature of the patient is less than or equal to 37.95, it is unlikely that the patient presents nephritis of renal pelvis origin; the answer contains a degree of confidence based on the cases of the dataset that confirm the rule—here, 1.00 means that all the patients in the dataset with a temperature not higher than 37.95 do not present the disease.

To improve readability, the rule above could be written as

  diagnosis(temperatureOfPatient(T), _, _, _, _, _, nephritis(no), confidence(1.00)) :-
      T =< 37.95.

by omitting the undefined variables, i.e., highlighting the input attributes that effectively have to be considered as influencers.
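To give an idea of the kind of processing the ML to Prolog block performs, here is a minimal Python sketch (ours, not the authors' translator) that walks a scikit-learn CART and emits one Prolog clause per leaf, in the same shape as the rules above; the attribute names follow TABLE I, and the confidence is taken to be the purity of the corresponding leaf, which is our reading of the confidence values reported in the paper.

  # Minimal sketch of a "ML to Prolog"-style translation for a scikit-learn CART
  # (not the authors' translator): one clause per leaf, one guard per split,
  # confidence taken as the purity of the leaf (our interpretation).
  from sklearn.tree import DecisionTreeClassifier

  ATTRS = ["temperatureOfPatient", "occurrenceOfNausea", "lumbarPain",
           "urinePushing", "micturitionPains", "burningOfUrethra"]
  VARS = ["T", "N", "L", "U", "M", "BU"]   # assumes features are in TABLE I order

  def tree_to_clauses(clf: DecisionTreeClassifier, target: str) -> list:
      tree, clauses = clf.tree_, []

      def visit(node, guards):
          if tree.children_left[node] == -1:                  # leaf node
              counts = tree.value[node][0]
              label = clf.classes_[counts.argmax()]
              confidence = counts.max() / counts.sum()        # leaf purity
              head_args = [f"{a}({v})" for a, v in zip(ATTRS, VARS)]
              head = (f"diagnosis({', '.join(head_args)}, "
                      f"{target}({label}), confidence({confidence:.2f}))")
              body = ", ".join(guards) if guards else "true"
              clauses.append(f"{head} :- {body}.")
          else:                                               # internal node: n_j <= c
              var, c = VARS[tree.feature[node]], tree.threshold[node]
              visit(tree.children_left[node], guards + [f"{var} =< {c:.2f}"])
              visit(tree.children_right[node], guards + [f"{var} > {c:.2f}"])

      visit(0, [])
      return clauses

  # Usage (hypothetical data): clf = DecisionTreeClassifier().fit(X, y)
  # for clause in tree_to_clauses(clf, "nephritis"): print(clause)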
Fig. 2 (left) depicts the whole picture: the decision trees generated as output of the example dataset when we run the basic classification tree algorithm (we exploit two different implementations: C4.5 [36] Weka J48 for the Java translator and SciKit-Learn CART [14] for the Python one), and the corresponding translation into LP rules. With respect to Fig. 1, the decision trees are the output of the Machine Learning Interface block and become the input for the ML to Prolog block.

Fig. 2 represents experiments of running the ML algorithm with no manipulation of the dataset: so, since the ML algorithm allows only one decision output to be considered for producing the corresponding decision tree, the information and the related knowledge is fragmented into two different trees – the first obtained running the algorithm with decision output nephritis, and the second with decision output inflammation of urinary bladder. By running the ML to Prolog block of Fig. 1 we translate the two DT into LP rules as depicted in Fig. 2 (right).

a) Interpretability: The LP program provides an interpretable explanation of virtually any predictor. At a glance, the user can identify which attributes are meaningful and considered for the response and which are not. In the case of nephritis, the only significant input attributes are the temperature of the patient and the presence or absence of lumbar pain. The same holds for inflammation of the urinary bladder, where the only discriminative attributes are the presence of urine pushing, micturition pains, and lumbar pain.

b) Interoperability: The adoption of a standard AI language (LP), in spite of the plethora of different specific ML toolkits, paves the way towards an interoperable explanation where LP is exploited as a sort of lingua franca that goes beyond the technical implementation of each ML framework.

c) Relations between outputs: As emphasised by Fig. 2, relations between outputs are lost, and possible links between the diseases are not clearly highlighted, having two different decision trees. Instead, once an LP representation is obtained, it is easy to run simple queries on it in order to get much more information with respect to the two different decision trees. For instance, we can learn that in case of fever (temperature of patient > 37.95) not presenting nephritis (i.e. no lumbar pain detected), the only case in which inflammation of the urinary bladder is present is when urine pushing is detected in absence of symptoms of micturition pains. With the logical representation, relations between outputs can be recovered by inferring hidden knowledge in the rules. It is worth noticing that similar results (emphasising the relations between decision outputs) can be obtained by manipulating the dataset a priori—i.e. before the ML algorithm training (a common operation, but not always applicable). The manipulation of the above dataset, for instance, can build a unique decision output Result that combines the two different diseases and their symptoms. In such a case the dataset is enriched with the Result attribute containing the complete diagnosis, i.e., it can assume the values Healthy, Inflammation, Nephritis, Both. The corresponding decision tree and LP knowledge is depicted in Fig. 3.

d) Interpretable narrative explanation: LP makes it possible to generate a narration for each answer of the predictor. The Prolog inference tree becomes inspectable, tracking the path for obtaining the answer. For instance, w.r.t. the KB of Fig. 3 – including all diseases –, the diagnosis in the case of the following symptoms:

  diagnosis(temperatureOfPatient(36.5), occurrenceOfNausea(yes),
            lumbarPain(yes), urinePushing(no),
            micturitionPains(yes), burningOfUrethra(yes), _, _).

would produce the corresponding narration:

  The diagnosis is healthy, with a full confidence because the patient has no fever.
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  In particular the solution has been built across the following path:
  Solution: result(healthy) with confidence(1.00).
  For the proof, the following clauses are considered:
  [1] diagnosis(temperatureOfPatient(T), _, _, urinePushing(no), _, _,
      result(healthy), confidence(1.00)) :- T =< 37.95.
  [2] X =< Y that is verified if 'expression_less_or_equal_than'(X, Y)
  In the query the temperature T is of 36.5.
  Because of rule [1], 36.5 =< 37.95 has to be verified,
  and because of [2], 'expression_less_or_equal_than'(36.5, 37.95) has to be verified,
  so rules [1] and [2] are verified.
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Despite its simplicity, the narration allows for a reconstruction of the decision track, showing the path to the decision. With a large amount of nested rules this could result very effective.
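A narration of this kind can be assembled by replaying the translated rules on a query and recording which guards were checked. The following Python sketch (a hypothetical helper of ours, only loosely mirroring the prototype's output, and covering only a subset of the Fig. 3 rules) illustrates the idea.

  # Hypothetical sketch of building a narration: replay the translated rules on
  # a query and report the first clause whose guards are satisfied (proof path).
  RULES = [
      # (conclusion, confidence, guards) mirroring part of the KB of Fig. 3
      ("healthy",      1.00, [("temp", "=<", 37.95), ("urine", "==", "no")]),
      ("inflammation", 1.00, [("temp", "=<", 37.95), ("urine", "==", "yes")]),
      ("healthy",      1.00, [("temp", ">",  37.95), ("lumbar", "==", "no")]),
  ]

  def holds(guard, query):
      attr, op, value = guard
      actual = query[attr]
      if op == "=<":
          return actual <= value
      if op == ">":
          return actual > value
      return actual == value

  def narrate(query):
      for result, conf, guards in RULES:
          checked = [f"{a} {op} {v} (actual: {query[a]})" for a, op, v in guards]
          if all(holds(g, query) for g in guards):
              return (f"The diagnosis is {result} with confidence {conf:.2f}, because "
                      + " and ".join(checked) + ".")
      return "No rule applies to the given symptoms."

  print(narrate({"temp": 36.5, "urine": "no", "lumbar": "yes"}))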
e) Exploitation of LP extension / abduction on the KB: Moreover, we believe that by exploiting abduction techniques we could pave the way to hypothetical reasoning with incomplete knowledge, i.e., learning new possible hypotheses that can be assumed to hold, provided that they are consistent with the given knowledge base. The idea, to be explored in future research, is to provide the most likely solution given a set of evidence. The conclusion would leave a degree of uncertainty while highlighting a plausible answer based on the collected information. In the healthcare field, for instance, it could be represented by having the collection of symptoms (although incomplete) and finding the most likely disease for them.

Output Decision: Nephritis of renal pelvis origin {yes, no}

  diagnosis(temperatureOfPatient(T), _, _, _, _, _, nephritis(no), confidence(1.00)) :- T =< 37.95.
  diagnosis(temperatureOfPatient(T), _, lumbarPain(yes), _, _, _, nephritis(yes), confidence(1.00)) :- T > 37.95.
  diagnosis(temperatureOfPatient(T), _, lumbarPain(no), _, _, _, nephritis(no), confidence(1.00)) :- T > 37.95.

Output Decision: Inflammation of urinary bladder {yes, no}

  diagnosis(_, _, _, urinePushing(no), _, _, inflammation(no), confidence(1.00)).
  diagnosis(_, _, lumbarPain(yes), urinePushing(yes), micturitionPains(no), _, inflammation(no), confidence(1.00)).
  diagnosis(_, _, lumbarPain(no), urinePushing(yes), micturitionPains(no), _, inflammation(yes), confidence(1.00)).
  diagnosis(_, _, _, urinePushing(yes), micturitionPains(yes), _, inflammation(yes), confidence(1.00)).

Fig. 2. Experimental results obtained running the framework on the Acute Inflammations dataset [35]: on the left side are represented the decision trees generated by the supervised ML algorithm (Weka J48 – SciKit-Learn CART), while on the right the corresponding LP rules, output of the ML to Prolog block. In order to deal with two different overlapped outputs, two DT are generated: the information, and hence the knowledge, is not connected.

Output Decision: Result {Healthy, Inflammation, Nephritis, Both}

  diagnosis(temperatureOfPatient(T), _, _, urinePushing(no), _, _, result(healthy), confidence(1.00)) :- T =< 37.95.
  diagnosis(temperatureOfPatient(T), _, _, urinePushing(yes), _, _, result(inflammation), confidence(1.00)) :- T =< 37.95.
  diagnosis(temperatureOfPatient(T), _, lumbarPain(no), _, _, _, result(healthy), confidence(1.00)) :- T > 37.95.
  diagnosis(temperatureOfPatient(T), _, lumbarPain(yes), _, micturitionPains(no), _, result(nephritis), confidence(1.00)) :- T > 37.95.
  diagnosis(temperatureOfPatient(T), _, lumbarPain(yes), _, micturitionPains(yes), _, result(both), confidence(0.66)) :- T > 37.95.

Fig. 3. Decision tree (left) and corresponding "ML to Prolog core" output (right) after the previous manipulation of the dataset. In particular, the two different output decisions (nephritis and inflammation of urinary bladder) have been combined in order to generate a comprehensive output decision: the new diagnosis considers the case of a healthy patient (none of the previous diseases), the case in which only one of the two diseases is present (inflammation or nephritis), and finally the case in which both are present.

V. CONCLUSION

AI systems nowadays synthesise large amounts of data, learning from experience and making predictions with the goal of taking autonomous decisions—applications range from clinical decision support to autonomous driving and predictive policing. Nevertheless, concerns about the intentional and unintentional negative consequences of AI systems are legitimate, as well as ethical and legal concerns, mostly related to the darkness and opaqueness of AI decision algorithms. For that reason, recent work on interpretability in machine learning and AI has focused on simplified models that approximate the true criteria used to make decisions.

In this paper we focus on building a narrative explanation of machine learning techniques: we first translate a ML predictor into logical knowledge, then inspect the proof tree leading to a solution. The narration is built tracking the path (i.e., the rules) that leads from the query to the answer.

Along this line, we foresee a broader vision that involves the design of a consistent framework where symbolic and sub-symbolic techniques are fruitfully combined to produce intelligent behaviour in AI applications while exploiting the benefits of each approach—like, in the case of symbolic ones, interpretability, observability, explainability, and accountability.

The results presented here represent just a preliminary exploration of the potential benefits of merging symbolic and sub-symbolic approaches—where, of course, many critical issues are still unexplored and will be the subject of future work. However, despite its simplicity, the case study already allows us to point out the feasibility and the potential benefits of the exploitation of symbolic techniques towards XAI.
REFERENCES

[1] D. Helbing, "Societal, economic, ethical and legal challenges of the digital revolution: From big data to deep learning, artificial intelligence, and manipulative technologies," in Towards Digital Enlightenment. Springer, 2019, pp. 47–72.
[2] A. Elliott, The Culture of AI: Everyday Life and the Digital Revolution. Routledge, 2019.
[3] S. Bird, K. Kenthapadi, E. Kiciman, and M. Mitchell, "Fairness-aware machine learning: Practical challenges and lessons learned," in 12th ACM International Conference on Web Search and Data Mining (WSDM'19). ACM, 2019, pp. 834–835.
[4] M. Fourcade and K. Healy, "Categories all the way down," Historical Social Research/Historische Sozialforschung, pp. 286–296, 2017.
[5] K. Crawford, "Artificial intelligence's white guy problem," The New York Times, vol. 25, 2016.
[6] M. T. Ribeiro, S. Singh, and C. Guestrin, "Why should I trust you? Explaining the predictions of any classifier," CoRR, vol. abs/1602.04938, 2016.
[7] D. Gunning, "Explainable artificial intelligence (XAI)," DARPA, Funding Program DARPA-BAA-16-53, 2016. [Online]. Available: http://www.darpa.mil/program/explainable-artificial-intelligence
[8] R. Guidotti, A. Monreale, F. Turini, D. Pedreschi, and F. Giannotti, "A survey of methods for explaining black box models," CoRR, vol. abs/1802.01933, 2018.
[9] F. Di Castro and E. Bertini, "Surrogate decision tree visualization," in Joint Proceedings of the ACM IUI 2019 Workshops (ACMIUI-WS 2019), ser. CEUR Workshop Proceedings, vol. 2327, Mar. 2019.
[10] O. Bastani, C. Kim, and H. Bastani, "Interpreting blackbox models via model extraction," CoRR, vol. abs/1705.08504, 2017.
[11] B. Twala, "Multiple classifier application to credit risk assessment," Expert Systems with Applications, vol. 37, no. 4, pp. 3326–3336, 2010.
[12] S. Kotsiantis, "Supervised machine learning: A review of classification techniques," in Emerging Artificial Intelligence Applications in Computer Engineering, ser. Frontiers in Artificial Intelligence and Applications. IOS Press, Oct. 2007, vol. 160, pp. 3–24.
[13] Z. C. Lipton, "The mythos of model interpretability," CoRR, vol. abs/1606.03490, 2016.
[14] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Chapman & Hall/CRC, 1984.
[15] M. G. Augasta and T. Kathirvalavakumar, "Reverse engineering the neural networks for rule extraction in classification problems," Neural Processing Letters, vol. 35, no. 2, pp. 131–150, Apr. 2012.
[16] G. Tolomei, F. Silvestri, A. Haines, and M. Lalmas, "Interpretable predictions of tree-based ensembles via actionable feature tweaking," in 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 465–474. [Online]. Available: http://dl.acm.org/citation.cfm?id=3098039
[17] R. Fong and A. Vedaldi, "Interpretable explanations of black boxes by meaningful perturbation," CoRR, vol. abs/1704.03296, 2017.
[18] M. Sundararajan, A. Taly, and Q. Yan, "Axiomatic attribution for deep networks," CoRR, vol. abs/1703.01365, 2017.
[19] M. W. Craven and J. W. Shavlik, "Extracting tree-structured representations of trained networks," in 8th International Conference on Neural Information Processing Systems (NIPS'95). MIT Press, 1995, pp. 24–30.
[20] R. Andrews, J. Diederich, and A. B. Tickle, "Survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge-Based Systems, vol. 8, no. 6, pp. 373–389, Dec. 1995.
[21] U. Johansson and L. Niklasson, "Evolving decision trees using oracle guides," in 2009 IEEE Symposium on Computational Intelligence and Data Mining, Mar. 2009, pp. 238–244.
[22] N. Frosst and G. E. Hinton, "Distilling a neural network into a soft decision tree," in CEX 2017 Comprehensibility and Explanation in AI and ML (CEX 2017), ser. CEUR Workshop Proceedings, vol. 2071, Nov. 2017.
[23] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, pp. 484–489, Jan. 2016.
[24] A. Omicini and F. Zambonelli, "MAS as complex systems: A view on the role of declarative approaches," in Declarative Agent Languages and Technologies, ser. Lecture Notes in Computer Science. Springer, May 2004, vol. 2990, pp. 1–17.
[25] F. Idelberger, G. Governatori, R. Riveret, and G. Sartor, "Evaluation of logic-based smart contracts for blockchain systems," in Rule Technologies. Research, Tools, and Applications, ser. Lecture Notes in Computer Science, vol. 9718. Springer, 2016, pp. 167–183.
[26] M. Oliya and H. K. Pung, "Towards incremental reasoning for context aware systems," in Advances in Computing and Communications, ser. Communications in Computer and Information Science. Springer, 2011, vol. 190, pp. 232–241.
[27] G. Sotnik, "The SOSIEL platform: Knowledge-based, cognitive, and multi-agent," Biologically Inspired Cognitive Architectures, vol. 26, pp. 103–117, Oct. 2018.
[28] R. Kowalski and F. Sadri, "From logic programming towards multi-agent systems," Annals of Mathematics and Artificial Intelligence, vol. 25, no. 3, pp. 391–419, Nov. 1999.
[29] M. D. Pandya, P. D. Shah, and S. Jardosh, "Medical image diagnosis for disease detection: A deep learning approach," in U-Healthcare Monitoring Systems, ser. Advances in Ubiquitous Sensing Applications for Healthcare. Academic Press, 2019, vol. 1: Design and Applications, ch. 3, pp. 37–60.
[30] S. Kuwayama, Y. Ayatsuka, D. Yanagisono, T. Uta, H. Usui, A. Kato, N. Takase, Y. Ogura, and T. Yasukawa, "Automated detection of macular diseases by optical coherence tomography and artificial intelligence machine learning of optical coherence tomography images," Journal of Ophthalmology, vol. 2019, p. 7, 2019.
[31] P. Sajda, "Machine learning for detection and diagnosis of disease," Annual Review of Biomedical Engineering, vol. 8, pp. 537–565, Aug. 2006.
[32] C. Zhang, Y. Chen, A. Yin, and X. Wang, "Anomaly detection in ECG based on trend symbolic aggregate approximation," Mathematical Biosciences and Engineering, vol. 16, no. 4, pp. 2154–2167, 2019.
[33] A. Rastogi, R. Arora, and S. Sharma, "Leaf disease detection and grading using computer vision technology & fuzzy logic," in 2nd International Conference on Signal Processing and Integrated Networks (SPIN 2015). IEEE, 2015, pp. 500–505.
[34] A. Lozowski, T. J. Cholewo, and J. M. Zurada, "Crisp rule extraction from perceptron network classifiers," in IEEE International Conference on Neural Networks (ICNN 1996), vol. Plenary, Panel and Special Sessions, Jun. 1996, pp. 94–99.
[35] J. Czerniak and H. Zarzycki, "Application of rough sets in the presumptive diagnosis of urinary system diseases," in Artificial Intelligence and Security in Computing Systems, ser. The Springer International Series in Engineering and Computer Science. Springer, 2003, vol. 752, pp. 41–51.
[36] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.