
Ontology-based explanation of classifiers

Federico Croce, Gianluca Cima, Maurizio Lenzerini, Tiziana Catarci
Sapienza University of Rome

The increasing use of data mining and machine learning in many applications has brought new challenges related to classification. Here, we deal with the following challenge: how to interpret and understand the reasons behind a classifier's prediction. Indeed, understanding the behavior of a classifier is widely recognized as a crucial task for the broad and safe adoption of machine learning and data mining technologies, especially in high-risk domains and in dealing with bias. We present preliminary work on using the Ontology-Based Data Management paradigm to explain the behavior of a classifier in terms of the concepts and relations that are meaningful in the domain relevant to the classifier.

INTRODUCTION

One of the problems in processing information ethically is the perpetuation and amplification of unfair biases existing in training data and in the outcome of classifiers.

It is well known that many learning approaches (data analytics, data mining, machine learning (ML)) base their predictions on training data and improve them as such data grow. In a typical project, the creation and curation of training data sets is largely a human-based activity and involves several people: domain experts, data scientists, machine learning experts, etc. In other words, data-related human design decisions affect learning outcomes throughout the entire process pipeline, even if at a certain point these decisions seem to disappear in the black-box "magic" of ML algorithms. Moreover, it is increasingly recognized that humans typically suffer from conscious and unconscious biases, and that the historical data used in training sets very often incorporate such biases, thus perpetuating and amplifying existing inequalities and unfair choices. While researchers in different areas (from philosophy to computer science, through social sciences and law) have begun a rich discourse on this problem, concrete solutions for addressing it by discovering and eliminating unintended unfair biases are still missing. A critical aspect in assessing and addressing bias is the lack of transparency, accountability, and human interpretability of ML algorithms, which makes it very difficult to fully understand their expected outcomes. A well-known example is the COMPAS algorithm, used by the Department of Corrections in Wisconsin, New York, and Florida, which has been reported to lead to harsher sentencing toward African Americans [1].

In this paper we address the problem of providing explanations for supervised classification. Supervised learning is the task of learning a function that maps an input to an output based on input-output pairs provided as examples. When applied to classification, the ultimate goal of supervised learning is to construct algorithms that are able to predict the target output (i.e., the class) of the given inputs. To achieve this, the learning algorithm is provided with training examples that demonstrate the intended relation between input and output values. The learner is then expected to approximate the correct output, so as to classify instances that have not been shown during training.

The increasing use of machine learning in many applications has brought new challenges related to classification. Here, we deal with the following challenge: how to interpret and understand the reasons behind a classifier's prediction. Indeed, understanding the behavior of a classifier is recognized as a very important task for the broad and safe adoption of machine learning and data mining technologies, especially in high-risk domains and, as discussed above, in dealing with bias.

In this paper we present preliminary work on this subject, based on semantic technologies. In particular, we assume that the classification task is performed in an organization that adopts an Ontology-Based Data Management (OBDM) approach [15, 16]. OBDM is a paradigm for accessing data through a conceptual representation of the domain of interest, expressed as an ontology. The OBDM paradigm relies on a three-level architecture consisting of the data layer, the ontology, and the mapping between the two:

• The ontology is a declarative and explicit representation of the domain of interest for the organization, formulated in a Description Logic (DL) [2, 7], so as to take advantage of various reasoning capabilities in accessing data.
• The data layer consists of the existing data sources that are relevant for the organization.
• The mapping is a set of declarative assertions specifying how the sources in the data layer relate to the ontology.

Consequently, an OBDM specification is a triple $\mathcal{J} = \langle \mathcal{O}, \mathcal{S}, \mathcal{M} \rangle$ which, together with an $\mathcal{S}$-database $D$, forms a so-called OBDM system $\Sigma = \langle \mathcal{J}, D \rangle$. Given such a system $\Sigma$, suppose that $\lambda$ is the result of a classification task carried out by some actor, e.g., a human or a machine, and that the objects involved in the classification task are represented as tuples in the $\mathcal{S}$-database $D$, which we assume to be relational.

In particular, in this work we consider a binary classifier, and therefore we regard $\lambda$ as a partial function $\lambda : \mathit{dom}(D)^n \to \{+1, -1\}$, where $n \geq 1$ is an integer. We denote by $\lambda^+$ (resp., $\lambda^-$) the set of tuples that have been classified positively (resp., negatively), i.e., $\lambda^+ = \{\vec{t} \in \mathit{dom}(D)^n \mid \lambda(\vec{t}) = +1\}$ (resp., $\lambda^- = \{\vec{t} \in \mathit{dom}(D)^n \mid \lambda(\vec{t}) = -1\}$).

We observe that another view of the partial function $\lambda$ is that of a training set. In this case, $\lambda^+$ represents the tuples tagged positively during the classifier training, while $\lambda^-$ represents the tuples tagged negatively.

Intuitively, our goal is to derive an expression over $\mathcal{O}$ that semantically describes the partial function $\lambda$ in the best way w.r.t. $\Sigma$. In other words, the main task in our framework is to search for a "good" definition of $\lambda$ using the concepts and the roles of the ontology. Without loss of generality, we consider such an expression to be a query $q_\mathcal{O}$ over $\mathcal{O}$, and we formalize the notion of "semantically describing" by requiring that the certain answers to $q_\mathcal{O}$ w.r.t. $\Sigma$ include all the tuples in $\lambda^+$ (or as many tuples in $\lambda^+$ as possible) and none of the tuples in $\lambda^-$ (or as few tuples in $\lambda^-$ as possible).
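Once $\lambda$ and the certain answers of a candidate query are available as finite sets, the two requirements above reduce to simple set comparisons. The following Python sketch is only an illustration under that assumption: it encodes $\lambda$ as a dictionary from tuples to $+1$/$-1$, takes the certain answers as already computed, and reports which fraction of $\lambda^+$ is covered and which fraction of $\lambda^-$ leaks into the answers. The function names `split_labels` and `describes` are hypothetical.

```python
from typing import Dict, Set, Tuple

Tup = Tuple[str, ...]  # a tuple of constants, e.g. ("A10",)


def split_labels(lam: Dict[Tup, int]) -> Tuple[Set[Tup], Set[Tup]]:
    """Return (lambda_plus, lambda_minus) for a partial labelling lam."""
    pos = {t for t, label in lam.items() if label == +1}
    neg = {t for t, label in lam.items() if label == -1}
    return pos, neg


def describes(answers: Set[Tup], lam: Dict[Tup, int]) -> Dict[str, float]:
    """Compare the certain answers of a candidate query with lam.

    'covered' is the fraction of lambda_plus contained in the answers;
    'leaked' is the fraction of lambda_minus contained in the answers.
    The query describes lam exactly iff covered == 1.0 and leaked == 0.0.
    """
    pos, neg = split_labels(lam)
    covered = len(pos & answers) / len(pos) if pos else 1.0
    leaked = len(neg & answers) / len(neg) if neg else 0.0
    return {"covered": covered, "leaked": leaked}
```

For instance, with the toy labelling `{("A10",): 1, ("B80",): 1, ("E25",): -1}` and the answer set `{("A10",), ("E25",)}`, `describes` reports a coverage of 0.5 and a leakage of 1.0: the query misses one positive tuple and admits the only negative one.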

Following the terminology of some recent papers, the goal of our framework can be described as the reverse engineering task of finding a describing query from a set of examples in a database. The roots of this task can be found in the Query By Example (QBE) approach for classical relational databases [3, 4, 18, 19]. In a nutshell, this approach allows a user to explore the database by providing the system with a set of positive and negative examples, implicitly referring to the query whose answers include all the positive examples and none of the negative ones. This idea has also been studied by the Description Logics (DLs) community, particularly in the line of research on so-called concept learning. In particular, [13] provides an interesting characterization of the complexity of learning an ontology concept, formulated in expressive DLs, from positive and negative examples. We also mention the concept learning tools in [5, 12, 17], which include several learning algorithms and support an extensive range of DLs, even expressive ones such as ALC and ALCQ. Finally, closely related to our work is [14], whose authors study the problem of deriving (unions of) conjunctive queries over ontologies formulated in Horn-ALCI, providing algorithms and tight complexity bounds.

Our work is focused on the Ontology-Based Data Management (OBDM) paradigm [6, 11]. The presence of the mapping layer, which links the data to the ontology, makes our setting a non-trivial extension of the problem, with important consequences, as we show later in this paper. The goal of this paper is to present a general framework for explaining a classifier by means of an ontology that can be adapted to several different contexts. For this reason, an important aspect of our framework is the possibility of defining a number of criteria on which the output query should be optimized. This flexibility makes it possible to derive completely different solutions, depending on the specific criteria in use. Specifically, given an OBDM system and a set of positive and negative examples, the goal of the framework could be to find a query over the ontology whose answers include all the positive examples and none of the negative ones. However, in some applications it is reasonable to relax this requirement and let the framework find a query whose answers are as similar as possible to the positive examples, include only a small fraction of the negative ones, and satisfy additional predefined criteria.

PRELIMINARIES

Given a schema $\mathcal{S}$, an $\mathcal{S}$-database $D$ is a finite set of atoms $s(\vec{c})$, where $s$ is an $n$-ary predicate symbol of $\mathcal{S}$, and $\vec{c} = (c_1, \ldots, c_n)$ is an $n$-tuple of constants.

As mentioned earlier, we distinguish between the specification of an OBDM system and the OBDM system itself (cf. Figure 1). An OBDM specification $\mathcal{J}$ determines the intensional level of the system and is expressed as a triple $\langle \mathcal{O}, \mathcal{S}, \mathcal{M} \rangle$, where $\mathcal{O}$ is an ontology, $\mathcal{S}$ is the schema of the data source, and $\mathcal{M}$ is the mapping between $\mathcal{S}$ and $\mathcal{O}$. Specifically, $\mathcal{M}$ consists of a set of mapping assertions, each one relating a query over the source schema to a query over the ontology. An OBDM system $\Sigma = \langle \mathcal{J}, D \rangle$ is obtained by adding to $\mathcal{J}$ an extensional level, given in terms of an $\mathcal{S}$-database $D$, which represents the data at the source and is structured according to the schema $\mathcal{S}$.

The formal semantics of $\langle \mathcal{J}, D \rangle$ is specified by the set $\mathit{Mod}_D(\mathcal{J})$ of its models, i.e., the set of (logical) interpretations $\mathcal{I}$ for $\mathcal{O}$ such that $\mathcal{I}$ is a model of $\mathcal{O}$ (i.e., it satisfies all axioms in $\mathcal{O}$) and $\langle D, \mathcal{I} \rangle$ satisfies all the assertions in $\mathcal{M}$. The satisfaction of a mapping assertion depends on its form, which is meant to represent semantic assumptions about the completeness of the source data with respect to the intended ontology models. Specifically, sound (resp., complete, exact) mappings capture sources containing a subset (resp., a superset, exactly the set) of the expected data.
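For the common special case of sound assertions whose source query is a single atom and whose ontology side is a single atom using only variables of the source side (the form of the mapping in Example 3.6), the ontology facts that every model of $\langle \mathcal{J}, D \rangle$ must contain as a direct consequence of the mapping can be obtained by matching each source atom against each assertion. The Python sketch below assumes atoms are tuples `(predicate, term, ...)` with variables written as strings starting with `?`; this encoding, the helper names, and the sample source tuples (including the constant `B1`) are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Atom = Tuple[str, ...]  # (predicate, term1, ..., termN); "?x" is a variable


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Atom, fact: Atom, subst: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend subst so that pattern becomes equal to fact, or return None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    result = dict(subst)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            # Bind p if unbound; fail if it is already bound to another constant.
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result


def apply_mapping(mapping: Iterable[Tuple[Atom, Atom]], db: Set[Atom]) -> Set[Atom]:
    """Ontology facts forced by sound single-atom assertions body ~> head.

    Assumes every variable of the head also occurs in the body.
    """
    facts: Set[Atom] = set()
    for body, head in mapping:
        for fact in db:
            s = match(body, fact, {})
            if s is not None:
                facts.add((head[0],) + tuple(s.get(t, t) for t in head[1:]))
    return facts
```

With the three assertions of Example 3.6 and the hypothetical source facts ENR(A10, Math, B1) and LOC(B1, Rome), `apply_mapping` yields studies(A10, Math), taughtIn(Math, B1), and locatedIn(B1, Rome). Under sound semantics these facts are a lower bound: a model may contain more.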

In OBDM, the main service provided by the system is query answering. The user poses queries by referring only to the ontology, and is therefore shielded from the implementation details and the idiosyncrasies of the data source. The fact that the semantics of $\langle \mathcal{J}, D \rangle$ is defined in terms of a set of models makes query answering more involved. Indeed, query answering cannot simply be based on evaluating the query expression over a single interpretation, as in traditional databases. Rather, it amounts to computing the so-called certain answers, i.e., the tuples that satisfy the query in all interpretations in $\mathit{Mod}_D(\mathcal{J})$, and therefore has the characteristics of a logical inference task. More formally, given an OBDM specification $\mathcal{J} = \langle \mathcal{O}, \mathcal{S}, \mathcal{M} \rangle$, a query $q_\mathcal{O}$ over $\mathcal{O}$, and an $\mathcal{S}$-database $D$, we define the certain answers of $q_\mathcal{O}$ w.r.t. $\mathcal{J}$ and $D$, denoted by $\mathit{cert}^{D}_{q_\mathcal{O},\mathcal{J}}$, as the set of tuples $\vec{t}$ of $\mathcal{S}$-constants such that $\vec{t} \in q_\mathcal{O}^{\mathcal{I}}$ for every $\mathcal{I} \in \mathit{Mod}_D(\mathcal{J})$. Obviously, the computation of certain answers must take into account the semantics of the ontology, the knowledge expressed in the mapping, and the content of the data source. Designing efficient query processing algorithms is one of the main challenges of OBDM. Indeed, an OBDM framework is characterized by three formalisms: (1) the language used to express the ontology; (2) the language used for queries; (3) the language used to specify the mapping; and the choices made for each of the three formalisms affect the semantic and computational properties of the system.

The axioms of the ontology allow one to enrich the information coming from the source with domain knowledge, and hence to infer additional answers to queries. The language used for the ontology deeply affects the computational characteristics of query answering. For this reason, instead of expressing the ontology in first-order logic (FOL), one adopts tailored languages, typically based on Description Logics (DLs), which ensure decidability and possibly efficiency of reasoning.

Also, the use of FOL (i.e., SQL) as a query language immediately leads to undecidability of query answering, even when the ontology consists only of an alphabet (i.e., it is a flat schema) and the mapping is of the simplest possible form, i.e., it specifies a one-to-one correspondence between ontology elements and database tables. The language typically adopted is that of unions of conjunctive queries (UCQs), i.e., FOL queries expressed as unions of select-project-join SQL queries.
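In general, computing certain answers requires DL-specific techniques such as query rewriting. In the very restricted case where the ontology contains only inclusions $P \sqsubseteq Q$ between predicates of the same arity (like $\mathit{studies} \sqsubseteq \mathit{likes}$ in Example 3.6) and no existential or negative axioms, closing the mapped facts under these inclusions yields a model whose facts hold in every model, so evaluating a UCQ over it, one select-project-join disjunct at a time, returns exactly the certain answers. The Python sketch below implements only that special case; the encoding (tuples, `?`-prefixed variables), the function names, and the sample facts are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, ...]                    # (predicate, term1, ..., termN)
CQ = Tuple[Tuple[str, ...], List[Atom]]   # (answer variables, body atoms)


def is_var(term: str) -> bool:
    return term.startswith("?")


def extend(pat: Atom, fact: Atom, s: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend substitution s so that pat maps onto fact, or return None."""
    if pat[0] != fact[0] or len(pat) != len(fact):
        return None
    out = dict(s)
    for p, c in zip(pat[1:], fact[1:]):
        if is_var(p):
            if out.setdefault(p, c) != c:
                return None
        elif p != c:
            return None
    return out


def homomorphisms(body: List[Atom], facts: Set[Atom], s: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Enumerate substitutions that map every body atom onto some fact."""
    if not body:
        yield s
        return
    for fact in facts:
        s2 = extend(body[0], fact, s)
        if s2 is not None:
            yield from homomorphisms(body[1:], facts, s2)


def saturate(facts: Set[Atom], inclusions: List[Tuple[str, str]]) -> Set[Atom]:
    """Close facts under inclusions P ⊑ Q (P, Q predicates of equal arity)."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for sub, sup in inclusions:
            for fact in [a for a in closed if a[0] == sub]:
                derived = (sup,) + fact[1:]
                if derived not in closed:
                    closed.add(derived)
                    changed = True
    return closed


def certain_answers_ucq(ucq: List[CQ], facts: Set[Atom],
                        inclusions: List[Tuple[str, str]]) -> Set[Tuple[str, ...]]:
    """Union of the answers of each CQ over the saturated facts.

    Only valid under the restrictions stated above; answer variables
    must occur in the body of each CQ.
    """
    model = saturate(facts, inclusions)
    answers: Set[Tuple[str, ...]] = set()
    for head, body in ucq:
        for s in homomorphisms(body, model, {}):
            answers.add(tuple(s[v] for v in head))
    return answers
```

For example, from the hypothetical facts studies(A10, Math) and studies(C12, Science) with $\mathit{studies} \sqsubseteq \mathit{likes}$, the UCQ made of $\mathit{likes}(x, \text{'Science'})$ and $\mathit{studies}(x, \text{'Math'})$ has certain answers A10 and C12, where C12 is obtained only through the inclusion.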

With respect to mapping specification, the incompleteness of the source data is correctly captured by sound mappings. Moreover, allowing sound mapping assertions to be mixed with complete or exact ones leads to undecidability of query answering, even when only CQs are used in queries and mapping assertions and the ontology is simply a flat schema. As a consequence, all proposals for OBDM frameworks so far, including the one in this paper, assume that mappings are sound. In addition, the above concern about the use of FOL also applies to the ontology queries in the mapping. Note, instead, that the source queries in the mapping are directly evaluated over the source database, and hence are typically allowed to be arbitrary (efficiently) computable queries.

THE FRAMEWORK

As we said in the introduction, we consider the result of a binary classification task, or the characterization of a training set for a classifier, as a partial function $\lambda : \mathit{dom}(D)^n \to \{+1, -1\}$, where $n \geq 1$ is an integer. Recall that we denote by $\lambda^+$ (resp., $\lambda^-$) the set of tuples that have been classified positively (resp., negatively), i.e., $\lambda^+ = \{\vec{t} \in \mathit{dom}(D)^n \mid \lambda(\vec{t}) = +1\}$ (resp., $\lambda^- = \{\vec{t} \in \mathit{dom}(D)^n \mid \lambda(\vec{t}) = -1\}$).

Before formally defining when a query over $\mathcal{O}$ semantically describes $\lambda$, we introduce some preliminary notions.

Definition 3.1. Let $W$ be a set of atoms. We say that an atom $\alpha$ is reachable from $W$ if there exists an atom $\beta \in W$ such that some constant $c \in \mathit{dom}(D)$ appears in both $\alpha$ and $\beta$. □

We now define the relevant atoms of an $\mathcal{S}$-database $D$ w.r.t. a tuple $\vec{t} \in \mathit{dom}(D)^n$. To be as general as possible, we introduce a parametric notion of border of radius $k$, where the parameter $k$ is a natural number indicating how far one is interested in going when identifying an atom as relevant.

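Definition 3.2. Let $D$ be an $\mathcal{S}$-database, and let $\vec{t}$ be a tuple in $\mathit{dom}(D)^n$. Consider the following sets of atoms:
• $W_{\vec{t},0}(D) = \{\alpha \in D \mid \alpha \text{ has a constant appearing in } \vec{t}\}$
• $W_{\vec{t},i+1}(D) = \{\alpha \in D \setminus \bigcup_{j=0}^{i} W_{\vec{t},j}(D) \mid \alpha \text{ is reachable from } W_{\vec{t},i}(D)\}$, i.e., the atoms reached for the first time at step $i+1$
Then, for a natural number $k$, the border of radius $k$ of $\vec{t}$ in $D$, denoted by $B_{\vec{t},k}(D)$, is:
$$B_{\vec{t},k}(D) = \bigcup_{0 \leq i \leq k} W_{\vec{t},i}(D). \qquad \square$$
We illustrate the notion of border of radius $k$ with an example.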

Example 3.3. Let the source database be $D = \{R(a,b), S(a,c), Z(c,d), W(d,e), W(e,h), R(f,g)\}$, and let $\vec{t} = \langle a \rangle$. We have that:
• $W_{\vec{t},0}(D) = \{R(a,b), S(a,c)\}$
• $W_{\vec{t},1}(D) = \{Z(c,d)\}$
• $W_{\vec{t},2}(D) = \{W(d,e)\}$
Finally, the border of radius 2 of $\vec{t}$ in $D$ is $B_{\vec{t},2}(D) = \{R(a,b), S(a,c), Z(c,d), W(d,e)\}$. □
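The border can be computed by a breadth-first visit over shared constants. The Python sketch below, whose names are hypothetical, keeps at each step only the atoms not already collected, which is how Example 3.3 lists the levels. The union, i.e., the border, does not depend on this choice. Running it on the database of Example 3.3 with $\vec{t} = \langle a \rangle$ and $k = 2$ yields the border of radius 2 computed there.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, ...]  # (predicate, constant1, ..., constantN)


def constants(atom: Atom) -> Set[str]:
    return set(atom[1:])


def reachable(atom: Atom, w: Set[Atom]) -> bool:
    """Definition 3.1: atom shares at least one constant with an atom of w."""
    return any(constants(atom) & constants(b) for b in w)


def border_levels(db: Set[Atom], t: Tuple[str, ...], k: int) -> List[Set[Atom]]:
    """Levels 0..k; each level keeps only atoms not collected at earlier levels."""
    levels = [{a for a in db if constants(a) & set(t)}]
    seen = set(levels[0])
    for _ in range(k):
        nxt = {a for a in db - seen if reachable(a, levels[-1])}
        levels.append(nxt)
        seen |= nxt
    return levels


def border(db: Set[Atom], t: Tuple[str, ...], k: int) -> Set[Atom]:
    """B_{t,k}(D): union of the levels up to radius k."""
    return set().union(*border_levels(db, t, k))
```

Computing only the frontier of new atoms at each step avoids rescanning atoms that are already in the border, and it is sufficient: an atom first reached at step $i+1$ must share a constant with an atom first reached at step $i$.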

With the above notion at hand, we now define when a query $q_\mathcal{O}$ over the ontology $\mathcal{O}$ matches (w.r.t. an OBDM specification $\mathcal{J}$) a border $B_{\vec{t},k}(D)$ for a radius $k$, a tuple $\vec{t}$, and a source database $D$.

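Definition 3.4. A query $q_\mathcal{O}$ $\mathcal{J}$-matches a border $B_{\vec{t},k}(D)$ of radius $k$ of a tuple $\vec{t}$ in a source database $D$ if $\vec{t} \in \mathit{cert}^{B_{\vec{t},k}(D)}_{q_\mathcal{O},\mathcal{J}}$. □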

The next proposition establishes how FOL queries behave when the radius of a border $B_{\vec{t},k}(D)$ increases.

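Proposition 3.5. Let $\mathcal{J} = \langle \mathcal{O}, \mathcal{S}, \mathcal{M} \rangle$ be an OBDM specification, $B_{\vec{t},k}(D)$ be a border of radius $k$ of a tuple $\vec{t}$ in an $\mathcal{S}$-database $D$, and $q_\mathcal{O}$ be a FOL query over $\mathcal{O}$. If $q_\mathcal{O}$ $\mathcal{J}$-matches $B_{\vec{t},k}(D)$, then $q_\mathcal{O}$ $\mathcal{J}$-matches $B_{\vec{t},k+1}(D)$.

Proof. The proof is based on the following two observations: (i) $\mathit{cert}^{D}_{q_\mathcal{O},\mathcal{J}} \subseteq \mathit{cert}^{D'}_{q_\mathcal{O},\mathcal{J}}$ for any OBDM specification $\mathcal{J} = \langle \mathcal{O}, \mathcal{S}, \mathcal{M} \rangle$, FOL query $q_\mathcal{O}$, and pair of $\mathcal{S}$-databases $D, D'$ such that $D \subseteq D'$; (ii) $B_{\vec{t},k}(D) \subseteq B_{\vec{t},k+1}(D)$ for any $k \geq 0$ and tuple $\vec{t}$ of a database $D$. □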
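To make Definition 3.4 and Proposition 3.5 concrete, the Python sketch below decides $\mathcal{J}$-matching for CQs in a deliberately restricted setting: sound mapping assertions with a single atom on each side and an ontology consisting only of inclusions between predicates of the same arity, as in Example 3.6. In that setting, the mapped facts closed under the inclusions hold in every model and themselves form a model, so CQ evaluation over them yields the certain answers. The encoding, the function names, and the source facts (including the constants B1, B2, Physics, and Milan) are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, ...]                    # (predicate, term1, ..., termN)
CQ = Tuple[Tuple[str, ...], List[Atom]]   # (answer variables, body atoms)
Mapping = List[Tuple[Atom, Atom]]         # sound assertions body ~> head
Inclusions = List[Tuple[str, str]]        # ontology axioms P ⊑ Q


def is_var(term: str) -> bool:
    return term.startswith("?")


def extend(pat: Atom, fact: Atom, s: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend substitution s so that pat maps onto fact, or return None."""
    if pat[0] != fact[0] or len(pat) != len(fact):
        return None
    out = dict(s)
    for p, c in zip(pat[1:], fact[1:]):
        if is_var(p):
            if out.setdefault(p, c) != c:
                return None
        elif p != c:
            return None
    return out


def homomorphisms(body: List[Atom], facts: Set[Atom], s: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Enumerate substitutions that map every body atom onto some fact."""
    if not body:
        yield s
        return
    for fact in facts:
        s2 = extend(body[0], fact, s)
        if s2 is not None:
            yield from homomorphisms(body[1:], facts, s2)


def border(db: Set[Atom], t: Tuple[str, ...], k: int) -> Set[Atom]:
    """B_{t,k}(D): atoms at most k reachability steps away from t."""
    result = {a for a in db if set(a[1:]) & set(t)}
    frontier = set(result)
    for _ in range(k):
        frontier = {a for a in db - result
                    if any(set(a[1:]) & set(b[1:]) for b in frontier)}
        result |= frontier
    return result


def cert(q: CQ, db: Set[Atom], mapping: Mapping, inclusions: Inclusions) -> Set[Tuple[str, ...]]:
    """Certain answers of q over db, valid only in the restricted setting."""
    facts: Set[Atom] = set()
    for body, head in mapping:              # apply sound mapping assertions
        for f in db:
            s = extend(body, f, {})
            if s is not None:
                facts.add((head[0],) + tuple(s.get(x, x) for x in head[1:]))
    changed = True
    while changed:                          # close the facts under P ⊑ Q
        changed = False
        for sub, sup in inclusions:
            for fact in [a for a in facts if a[0] == sub]:
                derived = (sup,) + fact[1:]
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    answer_vars, body_atoms = q
    return {tuple(s[v] for v in answer_vars) for s in homomorphisms(body_atoms, facts, {})}


def j_matches(q: CQ, db: Set[Atom], t: Tuple[str, ...], k: int,
              mapping: Mapping, inclusions: Inclusions) -> bool:
    """Definition 3.4: t is a certain answer of q over the border B_{t,k}(D)."""
    return t in cert(q, border(db, t, k), mapping, inclusions)
```

On the hypothetical source facts ENR(A10, Math, B1), LOC(B1, Rome), ENR(E25, Physics, B2), and LOC(B2, Milan), query $q_1$ of Example 3.6 does not match the border of radius 0 of ⟨A10⟩, because LOC(B1, Rome) is one step away. It does match at radius 1 and, as Proposition 3.5 predicts, still matches at radius 2.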

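Similarly to what is described in [3, 13, 14], one may be interested in finding a query $q_\mathcal{O}$ over $\mathcal{O}$, expressed in a certain language $\mathcal{L}_\mathcal{O}$, that perfectly separates the tuples in $\lambda^+$ from those in $\lambda^-$. That is, a query $q_\mathcal{O} \in \mathcal{L}_\mathcal{O}$ such that, for a given radius $k$, the following two conditions hold: (1) for all $\vec{t} \in \lambda^+$, $q_\mathcal{O}$ $\mathcal{J}$-matches $B_{\vec{t},k}(D)$; (2) for all $\vec{t} \in \lambda^-$, $q_\mathcal{O}$ does not $\mathcal{J}$-match $B_{\vec{t},k}(D)$.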

However, the following example shows that, even in very simple cases, such a query is not guaranteed to exist.

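Example 3.6. Consider an $\mathcal{S}$-database $D$ over the source relations ENR and LOC, and let $\lambda$ classify students (STUD) as follows: $\lambda^+ = \{\mathrm{A10}, \mathrm{B80}, \mathrm{C12}, \mathrm{D50}\}$ and $\lambda^- = \{\mathrm{E25}\}$. Moreover, let $\mathcal{O} = \{\mathit{studies} \sqsubseteq \mathit{likes}\}$, and let $\mathcal{M}$ be:

ENR(x, y, z) ⇝ studies(x, y)
ENR(x, y, z) ⇝ taughtIn(y, z)
LOC(x, y) ⇝ locatedIn(x, y)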

Let $\mathcal{L}_\mathcal{O}$ be the class of conjunctive queries (CQs). It can be shown that no CQ over the ontology perfectly separates the tuples in $\lambda^+$ from those in $\lambda^-$. Nonetheless, several CQs describe $\lambda$ reasonably well, for example:

$q_1(x) \leftarrow \mathit{studies}(x,y) \wedge \mathit{taughtIn}(y,z) \wedge \mathit{locatedIn}(z, \text{'Rome'})$
$q_2(x) \leftarrow \mathit{studies}(x, \text{'Math'})$
$q_3(x) \leftarrow \mathit{likes}(x, \text{'Science'})$

It is easy to verify that:

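• $q_1$ $\Sigma$-matches $B_{\vec{t},1}(D)$ for all $\vec{t} \in \{\mathrm{A10}, \mathrm{B80}, \mathrm{D50}\}$
• $q_2$ $\Sigma$-matches $B_{\vec{t},1}(D)$ for all $\vec{t} \in \{\mathrm{A10}, \mathrm{B80}, \mathrm{E25}\}$
• $q_3$ $\Sigma$-matches $B_{\vec{t},1}(D)$ for all $\vec{t} \in \{\mathrm{C12}, \mathrm{D50}\}$

Looking at the above queries, one could ask which query is the best. The answer to this question, however, is not trivial: $q_2$ $\Sigma$-matches $2/4$ of the borders $B_{\vec{t},1}(D)$ for $\vec{t} \in \lambda^+$ and all of those for $\vec{t} \in \lambda^-$, whilst $q_1$ $\Sigma$-matches $3/4$ of the borders for $\vec{t} \in \lambda^+$ and none of those for $\vec{t} \in \lambda^-$. Besides, $q_3$ $\Sigma$-matches $2/4$ of the borders for $\vec{t} \in \lambda^+$ and none of those for $\vec{t} \in \lambda^-$. Finally, $q_2$ and $q_3$ have fewer atoms than $q_1$. □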
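The fractions above follow directly from the stated match sets. The Python sketch below takes those sets as given rather than recomputing them from source data, uses $\lambda^+ = \{\mathrm{A10}, \mathrm{B80}, \mathrm{C12}, \mathrm{D50}\}$ and $\lambda^- = \{\mathrm{E25}\}$ (the labelling consistent with those fractions), and checks conditions (1) and (2) for each query with exact rational arithmetic. It only shows that none of $q_1$, $q_2$, $q_3$ separates $\lambda^+$ from $\lambda^-$; the stronger claim that no CQ does so is not something this code verifies. The names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Set, Tuple

# Labelling consistent with the fractions stated in Example 3.6.
LAMBDA_PLUS = {"A10", "B80", "C12", "D50"}
LAMBDA_MINUS = {"E25"}

# Students whose border of radius 1 each query Sigma-matches, as stated.
MATCHES: Dict[str, Set[str]] = {
    "q1": {"A10", "B80", "D50"},
    "q2": {"A10", "B80", "E25"},
    "q3": {"C12", "D50"},
}


def separates(matched: Set[str]) -> bool:
    """Conditions (1) and (2): all positives matched, no negative matched."""
    return LAMBDA_PLUS <= matched and not (LAMBDA_MINUS & matched)


def profile(matched: Set[str]) -> Tuple[Fraction, Fraction]:
    """Fractions of lambda+ and of lambda- whose borders are matched."""
    return (Fraction(len(LAMBDA_PLUS & matched), len(LAMBDA_PLUS)),
            Fraction(len(LAMBDA_MINUS & matched), len(LAMBDA_MINUS)))


for name, matched in MATCHES.items():
    pos, neg = profile(matched)
    print(name, "separates:", separates(matched), "lambda+:", pos, "lambda-:", neg)
```

`Fraction` reduces automatically, so the $2/4$ of the text appears as `1/2`.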

The above example suggests that searching for a query that semantically describes $\lambda$ under the sole constraint of satisfying conditions (1) and (2) may turn out to be unsatisfactory. For this reason, we propose a different approach that enriches the framework, so as to make it appealing in many different contexts.

In general, one is interested in a query $q_\mathcal{O}$ over $\mathcal{O}$, expressed in a certain language $\mathcal{L}_\mathcal{O}$, that best satisfies a set $\Delta$ of criteria. We formalize this idea by introducing a set $\mathcal{F}$ of functions $f_c$, one for each criterion $c \in \Delta$, and a mathematical expression $Z$ with one variable for each criterion $c \in \Delta$.

Specifically, for a criterion $c \in \Delta$, the value $f_c^{\mathcal{J},D,k}(q_\mathcal{O})$ represents how well the query $q_\mathcal{O}$ meets criterion $c$ w.r.t. the OBDM system $\Sigma = \langle \mathcal{J}, D \rangle$ and the considered radius $k$. Without loss of generality, we can assume that all such functions have the same codomain. Then, after instantiating each variable in $Z$ with the corresponding value $f_c^{\mathcal{J},D,k}(q_\mathcal{O})$, the value of the resulting expression, denoted by $Z_\mathcal{F}(q_\mathcal{O})$, is the $Z$-score of the query $q_\mathcal{O}$ under $\mathcal{F}$.

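Among the possible queries in a query language $\mathcal{L}_\mathcal{O}$, it is reasonable to look for those with the highest possible score. This naturally leads to the main notion of our framework: a query $q_\mathcal{O} \in \mathcal{L}_\mathcal{O}$ best describes $\lambda$ w.r.t. $\mathcal{J}$, $D$, $\Delta$, $\mathcal{F}$, and $Z$ if no query in $\mathcal{L}_\mathcal{O}$ has a higher $Z$-score.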

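Natural criteria, independent of the query language, include:
• $c_1$ = "Are there many tuples $\vec{t} \in \lambda^+$ such that $q_\mathcal{O}$ $\mathcal{J}$-matches $B_{\vec{t},k}(D)$?"
• $c_2$ = "Are there few tuples $\vec{t} \in \lambda^+$ such that $q_\mathcal{O}$ does not $\mathcal{J}$-match $B_{\vec{t},k}(D)$?"
• $c_3$ = "Are there many tuples $\vec{t} \in \lambda^-$ such that $q_\mathcal{O}$ does not $\mathcal{J}$-match $B_{\vec{t},k}(D)$?"
• $c_4$ = "Are there few tuples $\vec{t} \in \lambda^-$ such that $q_\mathcal{O}$ $\mathcal{J}$-matches $B_{\vec{t},k}(D)$?"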

Furthermore, depending on the query language $\mathcal{L}_\mathcal{O}$ considered, there may be many other meaningful criteria. For instance, when $\mathcal{L}_\mathcal{O}$ = CQ, one may be interested in $c_5$ = "Are there few atoms used by the query $q_\mathcal{O}$?", and when $\mathcal{L}_\mathcal{O}$ = UCQ, one may be further interested in $c_6$ = "Are there few disjuncts used by the query $q_\mathcal{O}$?".

We conclude this section by applying the newly introduced framework to Example 3.6.

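Example 3.8. We refer to $\mathcal{J}$, $D$, $\lambda$, and the queries $q_1$, $q_2$, $q_3$ of Example 3.6, with radius $k = 1$. Suppose one is interested in the set of criteria $\Delta = \{c_1, c_4, c_5\}$, with the following associated set of functions $\mathcal{F}$:
• $f_{c_1}(q_\mathcal{O}) = \dfrac{|\{\vec{t} \in \lambda^+ \mid q_\mathcal{O} \text{ } \Sigma\text{-matches } B_{\vec{t},k}(D)\}|}{|\lambda^+|}$
• $f_{c_4}(q_\mathcal{O}) = 1 - \dfrac{|\{\vec{t} \in \lambda^- \mid q_\mathcal{O} \text{ } \Sigma\text{-matches } B_{\vec{t},k}(D)\}|}{|\lambda^-|}$
• $f_{c_5}(q_\mathcal{O}) = \dfrac{1}{|\text{atoms appearing in } q_\mathcal{O}|}$

Now, consider the expression $Z = \dfrac{\alpha \times c_1 + \beta \times c_4 + \gamma \times c_5}{\alpha + \beta + \gamma}$, i.e., the average of the evaluations of the functions in $\mathcal{F}$, weighted by three parameters $\alpha$, $\beta$, and $\gamma$. One can verify that the following queries best describe $\lambda$ w.r.t. $\mathcal{J}$, $D$, $\Delta$, $\mathcal{F}$, and $Z$, for each instantiation of $Z$:
(1) $\alpha = \beta = \gamma = 1 \rightarrow q_3$
(2) $\alpha = 3, \beta = 1, \gamma = 1 \rightarrow q_1$
In fact, let $Z_1$ be the instantiation of the parameters of $Z$ corresponding to (1); then $Z_1(q_1) = 25/36 \approx 0.694$, $Z_1(q_2) = 0.5$, and $Z_1(q_3) = 5/6 \approx 0.833$. Similarly, let $Z_2$ be the instantiation corresponding to (2); then $Z_2(q_1) = 43/60 \approx 0.717$, $Z_2(q_2) = 0.5$, and $Z_2(q_3) = 0.7$. □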
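The $Z$-scores of this example can be recomputed exactly with rational arithmetic. The Python sketch below hard-codes the labelling, the match sets, and the atom counts of $q_1$, $q_2$, $q_3$ from Example 3.6. It implements $f_{c_1}$, $f_{c_4}$, $f_{c_5}$ and the weighted average $Z$, and picks the best query among these three candidates only. A real search would range over all of $\mathcal{L}_\mathcal{O}$. The names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Set

# Data taken from Example 3.6 (radius 1); not recomputed from source tuples.
LAMBDA_PLUS = {"A10", "B80", "C12", "D50"}
LAMBDA_MINUS = {"E25"}
MATCHES: Dict[str, Set[str]] = {
    "q1": {"A10", "B80", "D50"},
    "q2": {"A10", "B80", "E25"},
    "q3": {"C12", "D50"},
}
ATOMS = {"q1": 3, "q2": 1, "q3": 1}  # number of atoms in each query body


def f_c1(q: str) -> Fraction:
    return Fraction(len(MATCHES[q] & LAMBDA_PLUS), len(LAMBDA_PLUS))


def f_c4(q: str) -> Fraction:
    return 1 - Fraction(len(MATCHES[q] & LAMBDA_MINUS), len(LAMBDA_MINUS))


def f_c5(q: str) -> Fraction:
    return Fraction(1, ATOMS[q])


def z_score(q: str, alpha: int, beta: int, gamma: int) -> Fraction:
    """Weighted average of f_c1, f_c4, f_c5, i.e. the expression Z."""
    total = alpha * f_c1(q) + beta * f_c4(q) + gamma * f_c5(q)
    return total / (alpha + beta + gamma)


def best(alpha: int, beta: int, gamma: int) -> str:
    """Highest Z-score among the three candidate queries only."""
    return max(MATCHES, key=lambda q: z_score(q, alpha, beta, gamma))
```

The sketch returns $Z_1(q_1) = 25/36$, $Z_1(q_2) = 1/2$, $Z_1(q_3) = 5/6$, $Z_2(q_1) = 43/60$, $Z_2(q_2) = 1/2$, and $Z_2(q_3) = 7/10$. As a result, `best(1, 1, 1)` selects $q_3$ and `best(3, 1, 1)` selects $q_1$.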

CONCLUSIONS

We have presented a framework that uses the Ontology-Based Data Management paradigm to explain the behavior of a classifier. Our short-term goal in this research is to provide techniques for deriving useful explanations in terms of queries over the ontology. Interestingly, the work in [8, 9] provides a basis for the reverse engineering process described in this paper, from the data sources to the ontology. Moreover, the work in [10] offers an interesting set of techniques for explaining query answers in the context of OBDM. Our future work will also include an evaluation of both the framework and the techniques presented in this paper in real-world settings.

ACKNOWLEDGEMENTS

This work has been partially supported by Sapienza under the PRIN 2017 project "HOPE" (prot. 2017MMJJRE), and by the European Research Council under the European Union's Horizon 2020 Programme through the ERC Advanced Grant WhiteMech (No. 834228).

[1] Angwin, J., Larson, J., Mattu, S., and Kirchner, L. Machine bias. Retrieved from https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, May 23, 2016.

[2] Baader, F., Calvanese, D., McGuinness, D., Nardi, D., and Patel-Schneider, P. F., Eds. The Description Logic Handbook: Theory, Implementation and Applications, 2nd ed. Cambridge University Press, 2007.

[3] Barceló, P., and Romero, M. The complexity of reverse engineering problems for conjunctive queries. Proceedings of the 16th International Semantic Web Conference (2017), 17 pages.

[4] Bonifati, A., Ciucanu, R., and Staworko, S. Learning join queries from user examples. ACM Trans. Database Syst. 40, 4 (Jan. 2016), 24:1–24:38.

[5] Bühmann, L., Lehmann, J., Westphal, P., and Bin, S. DL-Learner: Structured machine learning on semantic web data. In Companion Proceedings of The Web Conference 2018 (Republic and Canton of Geneva, Switzerland, 2018), WWW '18, International World Wide Web Conferences Steering Committee, pp. 467–471.

[6] Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Poggi, A., and Rosati, R. Linking data to ontologies: The description logic DL-Lite. In Proceedings of the Second International Workshop on OWL: Experiences and Directions (OWLED 2006) (2006), vol. 216 of CEUR Electronic Workshop Proceedings, http://ceur-ws.org/.

[7] Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Poggi, A., and Rosati, R. Ontology-based database access. In Proceedings of the Fifteenth Italian Conference on Database Systems (SEBD 2007) (2007), pp. 324–331.

[8] Cima, G. Preliminary results on ontology-based open data publishing. In Proceedings of the Thirtieth International Workshop on Description Logics (DL 2017) (2017), vol. 1879 of CEUR Electronic Workshop Proceedings, http://ceur-ws.org/.

[9] Cima, G., Lenzerini, M., and Poggi, A. Semantic characterization of data services through ontologies. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019) (2019), pp. 1647–1653.

[10] Croce, F., and Lenzerini, M. A framework for explaining query answers in DL-Lite. In Proceedings of the Twenty-First International Conference on Knowledge Engineering and Knowledge Management (EKAW 2018) (2018).

[11] Daraio, C., Lenzerini, M., Leporelli, C., Naggar, P., Bonaccorsi, A., and Bartolucci, A. The advantages of an ontology-based data management approach: openness, interoperability and data quality. Scientometrics 108 (Mar. 2016).

[12] Fanizzi, N., Rizzo, G., d'Amato, C., and Esposito, F. DL-Foil: Class expression learning revisited. In Proceedings of the Twenty-First International Conference on Knowledge Engineering and Knowledge Management (EKAW 2018) (2018).

[13] Funk, M., Jung, J. C., Lutz, C., Pulcini, H., and Wolter, F. Learning description logic concepts: When can positive and negative examples be separated? In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19 (Jul. 2019), International Joint Conferences on Artificial Intelligence Organization, pp. 1682–1688.

[14] Gutiérrez-Basulto, V., Jung, J. C., and Sabellek, L. Reverse engineering queries in ontology-enriched systems: The case of expressive Horn description logic ontologies. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18 (Jul. 2018), International Joint Conferences on Artificial Intelligence Organization, pp. 1847–1853.

[15] Lenzerini, M. Ontology-based data management. In Proceedings of the Twentieth International Conference on Information and Knowledge Management (CIKM 2011) (2011), pp. 5–6.

[16] Lenzerini, M. Managing data through the lens of an ontology. AI Magazine 39, 2 (2018), 65–74.

[17] Straccia, U., and Mucci, M. pFOIL-DL: Learning (fuzzy) EL concept descriptions from crisp OWL data using a probabilistic ensemble estimation. In Proceedings of the 30th Annual ACM Symposium on Applied Computing (New York, NY, USA, 2015), SAC '15, ACM, pp. 345–352.

[18] Tran, Q. T., Chan, C.-Y., and Parthasarathy, S. Query reverse engineering. The VLDB Journal 23, 5 (Oct. 2014), 721–746.

[19] Zloof, M. M. Query-by-example: The invocation and definition of tables and forms. In Proceedings of the 1st International Conference on Very Large Data Bases (New York, NY, USA, 1975), VLDB '75, ACM, pp. 1–24.