Ontology-based explanation of classifiers

Federico Croce, Gianluca Cima, Maurizio Lenzerini, Tiziana Catarci
@diag.uniroma1.it
Sapienza — University of Rome

ABSTRACT
The rise of data mining and machine learning use in many applications has brought new challenges related to classification. Here, we deal with the following challenge: how to interpret and understand the reason behind a classifier's prediction. Indeed, understanding the behaviour of a classifier is widely recognized as a very important task for the wide and safe adoption of machine learning and data mining technologies, especially in high-risk domains, and in dealing with bias. We present preliminary work on a proposal of using the Ontology-Based Data Management paradigm for explaining the behavior of a classifier in terms of the concepts and the relations that are meaningful in the domain that is relevant for the classifier.

© 2020 Copyright held by the owner/author(s). Published in Workshop Proceedings of the EDBT/ICDT 2020 Joint Conference, March 30-April 2, 2020 on CEUR-WS.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC BY 4.0.

1 INTRODUCTION
One of the problems in processing information ethically is the perpetuation and amplification of unfair biases existing in training data and in the outcome of classifiers. It is well known that many learning algorithms (data analytics, data mining, machine learning, ML) base their predictions on training data and improve them as such data grow. In a typical project, the creation and curation of training data sets is largely a human-based activity and involves several people: domain experts, data scientists, machine learning experts, etc. In other words, data-related human design decisions affect learning outcomes throughout the entire process pipeline, even if at a certain point these decisions seem to disappear in the black-box "magic" approach of ML algorithms. On the other hand, it is now gaining attention that humans typically suffer from conscious and unconscious biases, and that the historical data used in training sets very often incorporate such biases, thus perpetuating and amplifying existing inequalities and unfair choices.

While researchers from different areas (from philosophy to computer science, passing through social sciences and law) have begun a rich discourse on this problem, concrete solutions on how to address it by discovering and eliminating unintended unfair biases are still missing. A critical aspect in assessing and addressing bias is the lack of transparency, accountability and human-interpretability of the ML algorithms, which makes it overly difficult to fully understand the expected outcomes. A famous example is the COMPAS algorithm used by the Department of Corrections in Wisconsin, New York and Florida, which has led to harsher sentencing toward African Americans [1].

In this paper we address the problem of providing explanations for supervised classification. Supervised learning is the task of learning a function that maps an input to an output based on input-output pairs provided as examples. When applied to classification, the ultimate goal of supervised learning is to construct algorithms that are able to predict the target output (i.e., the class) of the proposed inputs. To achieve this, the learning algorithm is provided with some training examples that demonstrate the intended relation of input and output values. Then the learner is supposed to approximate the correct output, so as to be able to classify instances that have not been shown during training.

The rise of machine learning use in many applications has brought new challenges related to classification. Here, we deal with the following challenge: how to interpret and understand the reason behind a classifier's prediction. Indeed, understanding the behaviour of a classifier is recognized as a very important task for the wide and safe adoption of machine learning and data mining technologies, especially in high-risk domains, and, as we discussed above, in dealing with bias.

In this paper we present preliminary work on this subject, based on the use of semantic technologies. In particular, we assume that the classification task is performed in an organization that adopts an Ontology-Based Data Management (OBDM) approach [15, 16]. OBDM is a paradigm for accessing data using a conceptual representation of the domain of interest expressed as an ontology. The OBDM paradigm relies on a three-level architecture, consisting of the data layer, the ontology, and the mapping between the two.
• The ontology is a declarative and explicit representation of the domain of interest for the organization, formulated in a Description Logic (DL) [2, 7], so as to take advantage of various reasoning capabilities in accessing data.
• The data layer is constituted by the existing data sources that are relevant for the organization.
• The mapping is a set of declarative assertions specifying how the sources in the data layer relate to the ontology.

Consequently, an OBDM specification is a triple J = ⟨O, S, M⟩ which, together with an S-database D, forms a so-called OBDM system Σ = ⟨J, D⟩. Given such a system Σ, suppose that λ is the result of a classification task carried out by any actor, e.g., a human or a machine, and that the objects involved in the classification task are represented as tuples in the S-database D, which we assume relational.

In particular, in this work we consider a binary classifier, and therefore we regard λ as a partial function λ : dom(D)^n → {+1, −1}, where n ≥ 1 is an integer. We denote by λ^+ (resp., λ^−) the set of tuples that have been classified positively (resp., negatively), i.e., λ^+ = {t ∈ dom(D)^n | λ(t) = +1} (resp., λ^− = {t ∈ dom(D)^n | λ(t) = −1}).

We observe that another view of the partial function λ is that of a training set. In this case, λ^+ represents the tuples tagged positively during the classifier training, while λ^− represents the tuples tagged negatively.

Intuitively, our goal is to derive an expression over O that semantically describes the partial function λ in the best way w.r.t. Σ. In other words, the main task in our framework is searching for a "good" definition of λ using the concepts and the roles of the ontology. Without loss of generality, we consider such an expression to be a query q over O, and we formalize the notion of "semantically describing" λ by requiring that the certain answers to q w.r.t. Σ include all the tuples in λ^+ (or as many tuples in λ^+ as possible), and none of the tuples in λ^− (or as few tuples in λ^− as possible).

Following the terminology of some recent papers, the goal of our framework can be generally described as the reverse engineering task of finding a describing query from a set of examples in a database. The roots of this task can be found in the Query By Example (QBE) approach for classical relational databases [3, 4, 18, 19]. In a nutshell, such an approach allows a user to explore the database by providing a set of positive and negative examples to the system, implicitly referring to the query whose answers are all the positive examples and none of the negatives. This idea has also been studied by the Description Logics (DLs) community, with particular attention to the line of research of so-called concept learning. In particular, the work in [13] provides an interesting characterization of the complexity of learning an ontology concept, formulated in expressive DLs, from positive and negative examples. We also mention the concept learning tools in [5, 12, 17], which include several learning algorithms and support an extensive range of DLs, even expressive ones such as ALC and ALCQ. Finally, we consider the work in [14] to be related to ours. The authors study the problem of deriving (unions of) conjunctive queries, with ontologies formulated in Horn-ALCI, deriving algorithms and tight complexity bounds.

Our work is focused on the Ontology-Based Data Management (OBDM) paradigm [6, 11]. Having the layer for linking the data to the ontology is a non-trivial extension of the problem, with important consequences, as we will show in a following section of this paper. The goal of this paper is to present a general framework for explaining a classifier by means of an ontology, which can be adapted to several different contexts. For this reason, an important aspect of our framework is the possibility of defining a number of criteria on which one wants the output query to be optimized. This flexibility makes it possible to derive completely different solutions, depending on the specific criteria in use. Specifically, given an OBDM system and a set of positive and negative examples, the goal of the framework could be to find a query over the ontology whose answers include all the positive examples and none of the negatives. However, we consider it reasonable for some applications that one may want to relax this requirement, and allow the framework to find a query whose answers are as similar as possible to the positive examples, include only a small fraction of the negatives, and enjoy additional predefined criteria.

2 PRELIMINARIES
Given a schema S, an S-database D is a finite set of atoms s(c), where s is an n-ary predicate symbol of S, and c = (c_1, ..., c_n) is an n-tuple of constants.

As mentioned earlier, we distinguish between the specification of an OBDM system and the OBDM system itself (cf. Figure 1). An OBDM specification J determines the intensional level of the system, and is expressed as a triple ⟨O, S, M⟩, where O is an ontology, S is the schema of the data source, and M is the mapping between S and O. Specifically, M consists of a set of mapping assertions, each one relating a query over the source schema to a query over the ontology. An OBDM system Σ = ⟨J, D⟩ is obtained by adding to J an extensional level, which is given in terms of an S-database D, which represents the data at the source, and is structured according to the schema S.

[Figure 1: OBDM Specification and System]

The formal semantics of ⟨J, D⟩ is specified by the set Mod_D(J) of its models, which is the set of (logical) interpretations I for O such that I is a model of O, i.e., it satisfies all axioms in O, and ⟨D, I⟩ satisfies all the assertions in M. The satisfaction of a mapping assertion depends on its form, which is meant to represent semantic assumptions about the completeness of the source data with respect to the intended ontology models. Specifically, sound (resp., complete, exact) mappings capture sources containing a subset (resp., a superset, exactly the set) of the expected data.

In OBDM, the main service to be provided by the system is query answering. The user poses queries by referring only to the ontology, and is therefore masked from the implementation details and the idiosyncrasies of the data source. The fact that the semantics of ⟨J, D⟩ is defined in terms of a set of models makes the task of query answering involved. Indeed, query answering cannot be simply based on evaluating the query expression over a single interpretation, as in traditional databases. Rather, it amounts to computing the so-called certain answers, i.e., the tuples that satisfy the query in all interpretations in Mod_D(J), and has therefore the characteristics of a logical inference task. More formally, given an OBDM specification J = ⟨O, S, M⟩, a query q_O over O, and an S-database D, we define the certain answers of q_O w.r.t. J and D, denoted by cert^D_{q_O,J}, as the set of tuples t of S-constants such that t ∈ q_O^B, for every B ∈ Mod_D(J). Obviously, the computation of certain answers must take into account the semantics of the ontology, the knowledge expressed in the mapping, and the content of the data source.

Designing efficient query processing algorithms is one of the main challenges of OBDM. Indeed, an OBDM framework is characterized by three formalisms:
(1) the language used to express the ontology;
(2) the language used for queries;
(3) the language used to specify the mapping;
and the choices made for each of the three formalisms affect the semantic and computational properties of the system.

The axioms of the ontology allow one to enrich the information coming from the source with domain knowledge, and hence to infer additional answers to queries. The language used for the ontology deeply affects the computational characteristics of query answering. For this reason, instead of expressing the ontology in first-order logic (FOL), one adopts tailored languages, typically based on Description Logics (DLs), which ensure decidability and possibly efficiency of reasoning.

Also, the use of FOL (i.e., SQL) as a query language immediately leads to undecidability of query answering, even when the ontology consists only of an alphabet (i.e., it is a flat schema), and when the mapping is of the simplest possible form, i.e., it specifies a one-to-one correspondence between ontology elements and database tables. The language typically adopted is that of Unions of Conjunctive Queries (UCQs), i.e., FOL queries expressed as a union of select-project-join SQL queries.

With respect to mapping specification, the incompleteness of the source data is captured correctly by mappings that are sound. Moreover, mixing sound mapping assertions with complete or exact ones leads to undecidability of query answering, even when only CQs are used in queries and mapping assertions, and the ontology is simply a flat schema. As a consequence, all proposals for OBDM frameworks so far, including the one in this paper, assume that mappings are sound. In addition, the concern above on the use of FOL applies also to the ontology queries in the mapping. Note, instead, that the source queries in the mapping are directly evaluated over the source database, and hence are typically allowed to be arbitrary (efficiently) computable queries.

3 THE FRAMEWORK
As we said in the introduction, we consider the result of a binary classification task, or the characterization of a training set for a classifier, as a partial function λ : dom(D)^n → {+1, −1}, where n ≥ 1 is an integer. We remind the reader that we denote by λ^+ (resp., λ^−) the set of tuples that have been classified positively (resp., negatively), i.e., λ^+ = {t ∈ dom(D)^n | λ(t) = +1} (resp., λ^− = {t ∈ dom(D)^n | λ(t) = −1}).

Before formally defining when a query over O semantically describes λ, we introduce some preliminary notions.

Definition 3.1. Let W be a set of atoms. We say that an atom α is reachable from W if there exists an atom β ∈ W such that there is a constant c ∈ dom(D) that appears in both α and β. □

We now define which are the relevant atoms of an S-database D w.r.t. a tuple t ∈ dom(D)^n. To be as general as possible, we introduce a parametric notion of border of radius r, where the parameter r is a natural number whose intended meaning is to indicate how far one is interested in going for identifying an atom as relevant.

Definition 3.2. Let D be an S-database, and let t be a tuple in dom(D)^n. Consider the following definition:
• W_{t,0}(D) = {α ∈ D | α has a constant c appearing in t}
• W_{t,j+1}(D) = {α ∈ D | α is reachable from W_{t,j}}
Then, for a natural number r, the border of radius r of t in D, denoted by B_{t,r}(D), is:
B_{t,r}(D) = ⋃_{0 ≤ i ≤ r} W_{t,i}(D). □

We illustrate the notion of border of radius r with an example.

Example 3.3. Let the source database be D = {R(a,b), S(a,c), Z(c,d), W(d,e), W(e,h), R(f,g)}, and let t = ⟨a⟩. We have that:
• W_{t,0}(D) = {R(a,b), S(a,c)}
• W_{t,1}(D) = {Z(c,d)}
• W_{t,2}(D) = {W(d,e)}
Finally, the border of radius 2 of t in D is B_{t,2}(D) = {R(a,b), S(a,c), Z(c,d), W(d,e)}. □

With the above notion at hand, we now define when a query q_O over the ontology O matches (w.r.t. an OBDM specification J) a border B_{t,r}(D) for a radius r, a tuple t, and a source database D.

Definition 3.4. A query q_O J-matches a border B_{t,r}(D) of radius r of a tuple t in a source database D, if t ∈ cert^{B_{t,r}(D)}_{q_O,J}. □

The next proposition establishes how FOL queries behave when the radius r of a border B_{t,r}(D) increments.

Proposition 3.5. Let J = ⟨O, S, M⟩ be an OBDM specification, B_{t,r}(D) be a border of radius r of a tuple t in an S-database D, and q_O be a FOL query over O. If q_O J-matches B_{t,r}(D), then q_O J-matches B_{t,r+1}(D).

Proof. The proof is based on the following two observations: (i) cert^D_{q_O,J} ⊆ cert^{D'}_{q_O,J}, for any OBDM specification J = ⟨O, S, M⟩, FOL query q_O, and pair of S-databases D, D' such that D ⊆ D'; (ii) B_{t,r}(D) ⊆ B_{t,r+1}(D), for any r ≥ 0 and tuple t of a database D. □

Similarly to what is described in [3, 13, 14], one may be interested in finding a query q_O over O, expressed in a certain language L_O, that perfectly separates the set of tuples in λ^+ from the set of tuples in λ^−, that is, a query q_O ∈ L_O such that, for a given radius r, the following two conditions hold:
(1) for all t ∈ λ^+, q_O J-matches B_{t,r}(D);
(2) for all t ∈ λ^−, q_O does not J-match B_{t,r}(D).
However, the following example shows that, even in very simple cases, such a query is not guaranteed to exist.

Example 3.6. Consider the following database D, where λ^+ = {A10, B80, C12, D50} and λ^− = {E25}:

STUD   λ
A10    +1
B80    +1
C12    +1
D50    +1
E25    −1

LOC
Sap   Rome
TV    Rome
Pol   Milan

ENR
A10   Math      TV
B80   Math      Sap
C12   Science   Norm
D50   Science   TV
E25   Math      Pol

The corresponding borders of radius 1, for each tuple, are:
B_{A10,1}(D) = {STUD(A10), ENR(A10, Math, TV), LOC(TV, Rome)}
B_{B80,1}(D) = {STUD(B80), ENR(B80, Math, Sap), LOC(Sap, Rome)}
B_{C12,1}(D) = {STUD(C12), ENR(C12, Science, Norm)}
B_{D50,1}(D) = {STUD(D50), ENR(D50, Science, TV), LOC(TV, Rome)}
B_{E25,1}(D) = {STUD(E25), ENR(E25, Math, Pol), LOC(Pol, Milan)}

Moreover, let O = {studies ⊑ likes}, and let M be:
ENR(x, y, z) ⇝ studies(x, y)
ENR(x, y, z) ⇝ taughtIn(y, z)
LOC(x, y) ⇝ locatedIn(x, y)

Let L_O be the class of conjunctive queries (CQs). It is possible to show that there is no CQ over the ontology that perfectly separates the set of tuples in λ^+ from the set of tuples in λ^−. Nonetheless, observe that there are several CQs that reasonably describe λ. For example:
q_1(x) ← studies(x, y) ∧ taughtIn(y, z) ∧ locatedIn(z, 'Rome')
q_2(x) ← studies(x, 'Math')
q_3(x) ← likes(x, 'Science')

It is easy to verify that:
• q_1 Σ-matches B_{t,1}(D), for all t ∈ {A10, B80, D50}
• q_2 Σ-matches B_{t,1}(D), for all t ∈ {A10, B80, E25}
• q_3 Σ-matches B_{t,1}(D), for all t ∈ {C12, D50}

Looking at the above queries, one could ask which query is the best. The answer to this question, however, is not trivial, since q_2 Σ-matches 2 out of 4 of the borders B_{t,1}(D) for t in λ^+, and all borders B_{t,1}(D) for t in λ^−, whilst q_1 Σ-matches 3 out of 4 of the borders for t in λ^+, and no border for t in λ^−. Besides, q_3 Σ-matches 2 out of 4 of the borders for t in λ^+, and no border for t in λ^−. Finally, q_2 and q_3 have fewer atoms than q_1. □

The above example suggests that searching for a query aiming at semantically describing λ with the only constraint of satisfying conditions (1) and (2) may turn out to be unsatisfactory. For this reason, we propose a different approach that enriches the framework, so as to be potentially appealing in many different contexts.

In general, one is interested in a query q_O over O, expressed in a certain language L_O, that accomplishes in the best way a set Δ of criteria. We formalize this idea by introducing a set of functions F, one for each criterion δ ∈ Δ, and a mathematical expression Z having a variable z_δ for each criterion δ ∈ Δ. Specifically, for a certain criterion δ ∈ Δ, the value of the function f^{J,r}_{δ,λ}(q_O) represents how much the query q_O meets criterion δ for λ w.r.t. the OBDM system Σ = ⟨J, D⟩ and the considered radius r. Without loss of generality, we can obviously consider all such functions to have the same range of values as their codomain. Then, after instantiating each variable z_δ in Z with the corresponding value f^{J,r}_{δ,λ}(q_O), the total value of the obtained expression, denoted by Z_F(q_O), represents the Z-score of the query q_O under F.

Among the various possible queries in a certain query language L_O, it is reasonable to look for the ones that give us the highest possible score. This naturally leads to the following main definition of our framework:

Definition 3.7. A query q_O L_O-best describes λ w.r.t. an OBDM system Σ = ⟨J, D⟩, a radius r, a set of criteria Δ, a set of functions F, and an expression Z, if q_O ∈ L_O and there exists no query q'_O ∈ L_O such that Z_F(q'_O) > Z_F(q_O). □

As for the set of criteria to be considered, here we just list some interesting ones:
δ_1 = "Are there many tuples t ∈ λ^+ such that q_O J-matches B_{t,r}(D)?"
δ_2 = "Are there few tuples t ∈ λ^+ such that q_O does not J-match B_{t,r}(D)?"
δ_3 = "Are there many tuples t ∈ λ^− such that q_O does not J-match B_{t,r}(D)?"
δ_4 = "Are there few tuples t ∈ λ^− such that q_O J-matches B_{t,r}(D)?"

Furthermore, depending on the query language L_O considered, there may be many other meaningful criteria. For instance, when L_O = CQ, one may be interested in δ_5 = "Are there few atoms used by the query q_O?", and when L_O = UCQ one may be further interested in δ_6 = "Are there few disjuncts used by the query q_O?".

We conclude this section by applying the newly introduced framework to Example 3.6.

Example 3.8. We refer to J, r, λ, and the queries q_1, q_2, q_3 as in Example 3.6. Suppose one is interested in the set of criteria Δ = {δ_1, δ_4, δ_5}, with the following associated set of functions F:
• f_{δ_1}(q_O) = |{t ∈ λ^+ | q_O Σ-matches B_{t,r}(D)}| / |λ^+|
• f_{δ_4}(q_O) = 1 − |{t ∈ λ^− | q_O Σ-matches B_{t,r}(D)}| / |λ^−|
• f_{δ_5}(q_O) = 1 / |atoms appearing in q_O|

Now, consider the expression Z = (α·z_{δ_1} + β·z_{δ_4} + γ·z_{δ_5}) / (α + β + γ), i.e., the average of the evaluations of each function of F, weighted over the three parameters α, β, and γ. One can verify that the following queries best describe λ w.r.t. J, r, Δ, F, and Z, for each instantiation of Z:
(1) (α = β = γ = 1) → q_3
(2) (α = 3, β = 1, γ = 1) → q_1
In fact, let Z_1 be the instantiation of the parameters of the expression Z corresponding to (1); then Z_1(q_1) = 0.693, Z_1(q_2) = 0.333, Z_1(q_3) = 0.833. Similarly, let Z_2 be the instantiation of the parameters of the expression Z corresponding to (2); then Z_2(q_1) = 0.716, Z_2(q_2) = 0.5, Z_2(q_3) = 0.7. □

4 CONCLUSIONS
We have presented a framework for using the Ontology-Based Data Management paradigm in order to provide an explanation of the behavior of a classifier. Our short-term goal in this research is to provide techniques for deriving useful explanations in terms of queries over the ontology. Interestingly, the work in [8, 9] provides a ground basis for the reverse engineering process described in this paper, from the data sources to the ontology. Moreover, the work in [10] offers an interesting set of techniques for explaining query answers in the context of an OBDM system. Our future work will also include an evaluation of both the framework and the techniques presented in this paper in real-world settings.

5 ACKNOWLEDGEMENTS
This work has been partially supported by Sapienza under the PRIN 2017 project "HOPE" (prot. 2017MMJJRE), and by the European Research Council under the European Union's Horizon 2020 Programme through the ERC Advanced Grant WhiteMech (No. 834228).

REFERENCES
[1] Angwin, J., Larson, J., Mattu, S., and Kirchner, L. Machine bias. Retrieved from https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, May 23, 2016.
[2] Baader, F., Calvanese, D., McGuinness, D., Nardi, D., and Patel-Schneider, P. F., Eds. The Description Logic Handbook: Theory, Implementation and Applications, 2nd ed. Cambridge University Press, 2007.
[3] Barceló, P., and Romero, M. The complexity of reverse engineering problems for conjunctive queries. Proceedings of the 16th International Semantic Web Conference, 2017 (2017), 17 pages.
[4] Bonifati, A., Ciucanu, R., and Staworko, S. Learning join queries from user examples. ACM Trans. Database Syst. 40, 4 (Jan. 2016), 24:1–24:38.
[5] Bühmann, L., Lehmann, J., Westphal, P., and Bin, S. DL-Learner structured machine learning on semantic web data. In Companion Proceedings of the The Web Conference 2018 (Republic and Canton of Geneva, Switzerland, 2018), WWW '18, International World Wide Web Conferences Steering Committee, pp. 467–471.
[6] Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Poggi, A., and Rosati, R. Linking data to ontologies: The description logic DL-Lite_A.
In Proceedings of the Second International Workshop on OWL: Experiences and Directions (OWLED 2006) (2006), vol. 216 of CEUR Electronic Workshop Proceedings, http://ceur-ws.org/.
[7] Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Poggi, A., and Rosati, R. Ontology-based database access. In Proceedings of the Fifteenth Italian Conference on Database Systems (SEBD 2007) (2007), pp. 324–331.
[8] Cima, G. Preliminary results on ontology-based open data publishing. In Proceedings of the Thirtieth International Workshop on Description Logics (DL 2017) (2017), vol. 1879 of CEUR Electronic Workshop Proceedings, http://ceur-ws.org/.
[9] Cima, G., Lenzerini, M., and Poggi, A. Semantic characterization of data services through ontologies. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019) (2019), pp. 1647–1653.
[10] Croce, F., and Lenzerini, M. A framework for explaining query answers in DL-Lite. In Proceedings of the Twenty-First International Conference on Knowledge Engineering and Knowledge Management (EKAW 2018) (2018).
[11] Daraio, C., Lenzerini, M., Leporelli, C., Naggar, P., Bonaccorsi, A., and Bartolucci, A. The advantages of an ontology-based data management approach: openness, interoperability and data quality. Scientometrics 108 (03 2016).
[12] Fanizzi, N., Rizzo, G., d'Amato, C., and Esposito, F. DL-Foil: class expression learning revisited. In Proceedings of the Twenty-First International Conference on Knowledge Engineering and Knowledge Management (EKAW 2018) (2018).
[13] Funk, M., Jung, J. C., Lutz, C., Pulcini, H., and Wolter, F. Learning description logic concepts: When can positive and negative examples be separated? In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19 (7 2019), International Joint Conferences on Artificial Intelligence Organization, pp. 1682–1688.
[14] Gutiérrez-Basulto, V., Jung, J. C., and Sabellek, L.
Reverse engineering queries in ontology-enriched systems: The case of expressive Horn description logic ontologies. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18 (7 2018), International Joint Conferences on Artificial Intelligence Organization, pp. 1847–1853.
[15] Lenzerini, M. Ontology-based data management. In Proceedings of the Twentieth International Conference on Information and Knowledge Management (CIKM 2011) (2011), pp. 5–6.
[16] Lenzerini, M. Managing data through the lens of an ontology. AI Magazine 39, 2 (2018), 65–74.
[17] Straccia, U., and Mucci, M. pFOIL-DL: Learning (fuzzy) EL concept descriptions from crisp OWL data using a probabilistic ensemble estimation. In Proceedings of the 30th Annual ACM Symposium on Applied Computing (New York, NY, USA, 2015), SAC '15, ACM, pp. 345–352.
[18] Tran, Q. T., Chan, C.-Y., and Parthasarathy, S. Query reverse engineering. The VLDB Journal 23, 5 (Oct. 2014), 721–746.
[19] Zloof, M. M. Query-by-example: The invocation and definition of tables and forms. In Proceedings of the 1st International Conference on Very Large Data Bases (New York, NY, USA, 1975), VLDB '75, ACM, pp. 1–24.
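The two computational ingredients of the framework, the border of radius r (Definition 3.2) and the weighted Z-score of Example 3.8, can be prototyped directly. The following Python sketch is ours, not part of the paper: it works on the source atoms only and hard-codes the Σ-matching facts reported in Example 3.6 instead of computing certain answers over the ontology and mapping; the names `border` and `z_score` are our own.

```python
# Illustrative sketch (ours, not from the paper): borders of radius r
# (Definition 3.2) over source atoms, and a weighted Z-score in the style
# of Example 3.8. J-matching is stubbed with the facts stated in the paper;
# certain-answer computation over ontology and mapping is NOT modelled.

def border(db, t, r):
    """B_{t,r}(D): atoms reachable from the constants of tuple t in <= r
    reachability steps. An atom is a tuple (predicate, c1, ..., cn)."""
    frontier = {a for a in db if set(t) & set(a[1:])}   # layer W_{t,0}(D)
    result = set(frontier)
    for _ in range(r):
        reached = {c for a in frontier for c in a[1:]}
        frontier = {a for a in db if set(a[1:]) & reached} - result
        result |= frontier                              # add layer W_{t,j+1}
    return result

# Example 3.3: only atoms within 2 steps of 'a' end up in the border.
D1 = {("R", "a", "b"), ("S", "a", "c"), ("Z", "c", "d"),
      ("W", "d", "e"), ("W", "e", "h"), ("R", "f", "g")}
assert border(D1, ("a",), 2) == {("R", "a", "b"), ("S", "a", "c"),
                                 ("Z", "c", "d"), ("W", "d", "e")}

def z_score(matches, n_atoms, pos, neg, borders, alpha, beta, gamma):
    """(alpha*f_d1 + beta*f_d4 + gamma*f_d5) / (alpha + beta + gamma)."""
    f1 = sum(matches(borders[t]) for t in pos) / len(pos)      # f_delta1
    f4 = 1 - sum(matches(borders[t]) for t in neg) / len(neg)  # f_delta4
    f5 = 1 / n_atoms                                           # f_delta5
    return (alpha * f1 + beta * f4 + gamma * f5) / (alpha + beta + gamma)

# Example 3.6/3.8 with the matching facts hard-coded from the paper:
# q1 matches A10, B80, D50; q2 matches A10, B80, E25; q3 matches C12, D50.
pos, neg = ["A10", "B80", "C12", "D50"], ["E25"]
matched = {"q1": {"A10", "B80", "D50"},
           "q2": {"A10", "B80", "E25"},
           "q3": {"C12", "D50"}}
atoms = {"q1": 3, "q2": 1, "q3": 1}
borders = {t: {t} for t in pos + neg}   # stand-in: each border is the tuple

def score(q, a, b, g):
    return z_score(lambda bd: not bd.isdisjoint(matched[q]),
                   atoms[q], pos, neg, borders, a, b, g)

best_equal = max(matched, key=lambda q: score(q, 1, 1, 1))  # q3
best_skew = max(matched, key=lambda q: score(q, 3, 1, 1))   # q1
```

With uniform weights the sketch selects q_3, and with α = 3 it selects q_1, reproducing the two best-describing queries reported in Example 3.8.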