<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Mapping Data to Higher-Order Description Logic Knowledge Bases</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Floriana</forename><forename type="middle">Di</forename><surname>Pinto</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica e Sistemistica Antonio Ruberti</orgName>
								<orgName type="institution">Sapienza Università di Roma</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giuseppe</forename><surname>De Giacomo</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica e Sistemistica Antonio Ruberti</orgName>
								<orgName type="institution">Sapienza Università di Roma</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maurizio</forename><surname>Lenzerini</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica e Sistemistica Antonio Ruberti</orgName>
								<orgName type="institution">Sapienza Università di Roma</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Riccardo</forename><surname>Rosati</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica e Sistemistica Antonio Ruberti</orgName>
								<orgName type="institution">Sapienza Università di Roma</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Mapping Data to Higher-Order Description Logic Knowledge Bases</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B652FD46435981BBD7C7943D0C916A11</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T22:32+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we introduce the notion of mapping-based knowledge base (MKB), to formalize those ontology-based data access (OBDA) scenarios where both the extensional and the intensional level of the ontology are determined by suitable mapping assertions involving the data sources. We study reasoning over MKBs in the context of Hi(DL-LiteR), a higher-order version of the DL DL-LiteR. We show that answering queries posed to MKBs expressed in Hi(DL-LiteR) can be done efficiently through FOL rewriting: hence, query answering can be delegated to a DBMS, as in the case of traditional OBDA systems.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Ontology-based data access (OBDA) <ref type="bibr" target="#b1">[2]</ref> is a recent application of Description Logics (DLs) that is gaining momentum. The idea behind OBDA is to use a DL ontology as a means to access a set of data sources, so as to mask the user from all applicationdependent aspects of data, and to extract useful information from the sources based on a conceptual representation of the domain, expressed as a TBox in a suitable DL. In current approaches to OBDA, the intensional level of the ontology (the TBox) is fixed once for all at design time, and the mapping assertions specify how the data at the sources correspond to instances of the concepts, roles, and attibutes in the TBox. More precisely, the various mapping assertions determine a sort of virtual ABox, in which the individual objects are built out from data, and the instance assertions are specified through the relationships between the sources and the elements of the ontology.</p><p>Several OBDA projects have been carried out in the last years <ref type="bibr" target="#b8">[9]</ref>, and OBDA systems have been designed to support OBDA applications <ref type="bibr" target="#b0">[1]</ref>. The experience gained in this work has shown that there are important aspects that are missing in current OBDA technology. In this paper, we concentrate on three such aspects.</p><p>The first aspect is related to the need of making the intensional level of the ontology more dynamic. Indeed, in real applications, the information about which are the concepts and roles that are relevant in the domain of interest is often stored in the data sources. Consider, for example, the database D of a motor industry shown in figure <ref type="figure" target="#fig_3">1</ref>, storing data about different types of cars (table T-CarTypes), and various cars of such types (table T-Cars) manufactured by the firm. The key observation is that the database D stores information not only about the instances of concepts, but also about the concepts themselves, and their relationships. For example, table T-CarTypes tells us that there are four concepts in our ontology that are subconcepts of the concept Car, and, implicitely, tells us that they are mutually disjoint. hand, provides information about the instances of the various concepts, as well as other properties about such instances.</p><p>The second aspect is related to the need of metamodeling constructs in the language used to specify the ontology <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b5">6]</ref>. Metamodeling allows one to treat concepts and properties as first-order citizens, and to see them as individuals that may constitute the instances of other concepts, called meta-concepts. In our example, it is convenient to introduce in the ontology the concept Car-Type, whose instances are exactly the subconcepts of cars stored in table T-CarTypes. In this way, we allow users to dynamically acquire knowledge about relevant car types through simple queries asking for the instances of the meta-concept Car-Type.</p><p>The third aspect deals with the need of designing tractable algorithms for query answering in OBDA systems. In <ref type="bibr" target="#b7">[8]</ref>, it is argued that, since the data sources used in OBDA systems are likely to be very large, such systems should be based on DLs that are tractable in data complexity. In particular, <ref type="bibr" target="#b7">[8]</ref> advocates the use of the DL-Lite family, that allows for First-Order Logic (FOL) rewritability of (unions of) conjunctive queries. We remind the reader that in a DL enjoing FOL rewritability, query answering can be divided in two steps. In the first step, called rewriting, using the TBox only, the query q is transformed into a new FOL query q , and in the second step q is evaluated over the ABox. The correctness of the whole method relies on the fact the answers to q over the ABox coincide with the certain answers to q over the whole ontology. The challenge is now to design tractable query answering algorithms even in cases where the mappings relate data at the sources both to the extensional and the intensional level of the ontology, and meta-concepts and meta-roles are used in the queries. In this paper, we address the above aspects, and present the following contributions.</p><p>(i) We introduce the notion of mapping-based knowledge base (MKB) (Section 3), to formalize the situation where both the extensional and the intensional level of the ontology are determined by suitable mapping assertions involving the data sources.</p><p>(ii) We describe the higher-order DL Hi (DL-Lite R ) (Section 2), based on the approach presented in <ref type="bibr" target="#b4">[5]</ref>. In that paper, it is shown how, starting from a traditional DL L, one can define its higher-order version, called Hi(L). Here, we apply this idea, and present Hi (DL-Lite R ), which is the higher-order version of DL-Lite R <ref type="bibr" target="#b2">[3]</ref>.</p><p>(iii) We show that answering queries posed to MKBs expressed in Hi (DL-Lite R ) can be done efficiently through FOL rewriting (Section 4). More specifically, we describe an algorithm that, given a query q over a MKB, rewrites q into a FOL query that is evaluated taking into account only the mapping assertions M A of the MKB involving the extensional level of the ontology. Hence query answering can be delegated to a DBMS, as in the case of traditional OBDA systems.</p><p>2 Higher-order DL-Lite R In this section, we describe the higher-order DL Hi (DL-Lite R ), based on the approach presented in <ref type="bibr" target="#b4">[5]</ref>. Every traditional DL L is characterized by a set OP (L) of operators, used to form concept and role expressions, and a set of MP (L) of meta-predicates, used to form assertions. Each operator and each meta-predicate have an associated arity. If symbol S has arity n, then we write S/n to denote such a symbol and its arity. For DL-Lite R , we have</p><formula xml:id="formula_0">-OP (DL-Lite R ) = {Inv /1, Exists/1}; -MP (DL-Lite R ) = {Inst C /2, Inst R /3, Isa C /2, Isa R /2, Disj C /2, Disj R /2}.</formula><p>We assume that the reader is familiar with DL-Lite R . Therefore, the intuitive meaning of all the above symbols should be clear. The formal specification of their semantics will be given shortly. Syntax. We assume the existence of two disjoint, countably infinite alphabets: S, the set of names, and V, the set of variables. Intutively, the names in S are the symbols denoting the atomic elements of a Hi(DL-Lite R ) knowledge base. The building blocks of such a knowledge base are assertions, which in turn are based on terms and atoms.</p><p>We inductively define the set of terms, denoted by τ DL-Lite R (S, V), over the alphabets S and V for Hi(DL-Lite R ) as follows:</p><p>-</p><formula xml:id="formula_1">if E ∈ S then E ∈ τ DL-Lite R (S, V); -if V ∈ V then V ∈ τ DL-Lite R (S, V); -if C/n ∈ OP (DL-Lite R ) and t 1 , . . . , t n ∈ τ DL-Lite R (S, V) then C(t 1 , . . . , t n ) ∈ τ DL-Lite R (S, V).</formula><p>Ground terms, i.e., terms without variables, are called expressions, and the set of expressions is denoted by τ DL-Lite R (S).</p><p>A DL-Lite R -atom, or simply atom, over the alphabets S and V for Hi(DL-Lite R ) is a statement of the form M (E 1 , . . . , E n ) where M ∈ MP (DL-Lite R ), n is the arity of M , and for every 1 ≤ i ≤ n, E i ∈ τ DL-Lite R (S, V). If X is a subset of V, a is a DL-Lite R -atom, and all variables appearing in a belongs to X, then a is called an X-atom in DL-Lite R .</p><p>Ground DL-Lite R -atoms, i.e., DL-Lite R -atoms without variables, are called DL-Lite R -assertions, or simply assertions. Thus, an assertion is simply an application of a meta-predicate to a set of expressions. Intuitively, an assertion is an axiom that predicates over a set of individuals, concepts or roles.</p><p>A Hi(DL-Lite R ) knowledge base (KB) over S is a set of DL-Lite R -assertions over S. To agree with the usual terminology of DLs, we use the term TBox to denote a set of Isa C , Isa R , Disj C and Disj R assertions, and the term ABox to denote a set of Inst C and Inst R assertions. Semantics. Our definition of semantics for Hi (DL-Lite R ) is based on the notion of interpretation structure. An interpretation structure is a triple Σ = ∆, I c , I r where: (i) ∆ is a non-empty (possibly countably infinite) set; (ii) I c is a function that maps each d ∈ ∆ into a subset of ∆; and (iii) I r is a function that maps each d ∈ ∆ into a subset of ∆ × ∆. In other words, Σ treats every element of ∆ simultaneously as: (i) an individual; (ii) a unary relation, i.e., a concept, through I c ; and (iii) a binary relation, i.e., a role, through I r .</p><p>An interpretation for S (simply called an interpretation, when S is clear from the context) over the interpretation structure Σ is a pair I = Σ, I o , where -Σ = ∆, I c , I r is an interpretation structure, and -I o is a function that maps: 1. each element of S to a single object in ∆; and 2. each element C/n ∈ OP (DL-Lite R ) to an n-ary function C Io : ∆ n → ∆ that satisfies the conditions characterizing the operator C/n. In particular, the conditions for the operators in OP (DL-Lite R ) are as follows:</p><p>(a) for each</p><formula xml:id="formula_2">d 1 ∈ ∆, if d = Inv Io (d 1 ) then d Ir = (d Ir 1 ) −1 ; (b) for each d 1 ∈ ∆, if d = Exists(d 1 ) then d Ic = {e | ∃e 1 s.t. e, e 1 ∈ d Ir 1 }. We extend I o to ground terms in τ DL-Lite R (S) inductively as follows: if C/n ∈ OP (DL-Lite R ), then (C(t 1 , . . . , t n )) Io = C Io (E Io 1 , . . . , E Io n ).</formula><p>We now turn our attention to the interpretation of terms in Hi (DL-Lite R ). To interpret non-ground terms, we need assignments over interpretations, where an assignment µ over Σ, I o is a function µ : V → ∆. Given an interpretation I = Σ, I o and an assignment µ over I, the interpretation of terms is specified by the function</p><formula xml:id="formula_3">(•) Io,µ : τ DL-Lite R (S, V) → ∆ defined as follows: -if t ∈ S then t Io,µ = t Io ; -if t ∈ V then t Io,µ = µ(t); -if t is of the form C(t 1 , . . . , t n ), then t Io,µ = C Io (t Io,µ 1 , . . . , t Io,µ n</formula><p>). Finally, we define the semantics of atoms, by defining the notion of satisfaction of an atom with respect to an interpretation I and an assignment µ over I as follows:</p><p>-</p><formula xml:id="formula_4">I, µ |= Inst C (E 1 , E 2 ) if E Io,µ 1 ∈ (E Io,µ 2 ) Ic ; -I, µ |= Inst R (E 1 , E 2 , E 3 ) if E Io,µ 1 , E Io,µ 2 ∈ (E Io,µ 3 ) Ir ; -I, µ |= Isa C (E 1 , E 2 ) if (E Io,µ 1 ) Ic ⊆ (E Io,µ 2 ) Ic ; -I, µ |= Isa R (E 1 , E 2 ) if (E Io,µ 1 ) Ir ⊆ (E Io,µ 2 ) Ir ; -I, µ |= Disj C (E 1 , E 2 ) if (E Io,µ 1 ) Ic ∩ (E Io,µ 2 ) Ic = ∅; -I, µ |= Disj R (E 1 , E 2 ) if (E Io,µ 1 ) Ir ∩ (E Io,µ 2 ) Ir = ∅.</formula><p>A Hi (DL-Lite R ) KB H is satisfied by I if all the assertions in H are satisfied by I<ref type="foot" target="#foot_0">1</ref> . As usual, the interpretations I satisfying H are called the models of H. A Hi (DL-Lite R ) KB H is satisfiable if it has at least one model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Mapping-based knowledge bases</head><p>As we said in the previous section, a Hi (DL-Lite R ) KB is simply a set of assertions. One might think of such a set of assertions as explicitly stated by the designer of the KB. This is a reasonable assumption only in those cases where the ontology is managed by an ad-hoc system, and is built from scratch for the specific application. However, in many applications, it is of interest to derive the KB directly from a set of data sources, so that the assertions of the KB are defined by specific mappings to such data sources. The resulting notion will be called mapping-based knowledge base.</p><p>In the following, we assume that the data sources are expressed in terms of the relational data model. In other words, all the technical development presented in the rest of this section assumes that the set of sources to be linked to the knowledge base is one relational database. Note that this is a realistic assumption, since many data federation tools are now available that are able to wrap a set of heterogeneous sources and present them as a single relational database.</p><p>When mapping relational data sources to a knowledge base over S, one should take into account that sources store "data values", and such data values should not be confused with the elements in S. To face this impedance mismatch problem, <ref type="bibr" target="#b7">[8]</ref> proposes to structure the mapping assertions in such a way that the elements of the knowledge base are denoted by terms built out from data values stored at the sources using special function symbols. Although we could in principle follow the same idea here, for the sake of simplicity, in this paper we assume that the relational data sources store directly elements of S. Note, however, that all the results presented in the next sections easily extend to the case where mappings build complex terms for denoting the elements of the knowledge base.</p><p>We are now ready to provide the definition of mapping-based knowledge base.</p><p>Definition 1. A Hi (DL-Lite R ) mapping-based knowledge base (MKB) is a pair K = DB , M such that: (i) DB is a relational database; (ii) M is a mapping, i.e. a set of mapping assertions, each one of the form Φ(x) ; ψ, where Φ is an arbitrary FOL query over DB of arity n &gt; 0 with free variables x = x 1 , . . . , x n , and ψ is an X-atom in DL-Lite R , with X = {x 1 , . . . , x n }.</p><p>In the following, if K = DB , M is a MKB, then we denote by M A the set of mapping assertions from M whose head predicate is either Inst C or Inst R . Furthermore, we denote by M T the set M \ M A , i.e., the set of mapping assertions from M whose head predicate belongs to the set {Isa C , Isa R , Disj C , Disj R }. We call a mapping M an instance-mapping if M = M A , i.e., if the metapredicates Inst C and Inst R are the only ones to appear in the right-hand side of the mapping assertions in M.</p><p>In order to define the semantics of a Hi (DL-Lite R ) MKB K = DB , M , we need to define when an interpretation satisfies an assertion in M with respect to a database DB . To this end, we make use of the notion of ground instance of an atom, and the notion of answer to a query over DB . Let ψ be an X-atom with X = {x 1 , . . . , x n }, and let v be a tuple of arity n with values from DB . Then the ground instance ψ[x/v] of ψ is the formula obtained by substituting every occurrence of x i with v i (for i ∈ {1, .., n}) in ψ. Also, if DB is a relational database, and q is a query over DB , we write ans(Φ, DB ) to denote the set of answers to q over DB .</p><p>We now specify when an interpretation satisfies a mapping assertion with respect to a database. We say that an interpretation I satisfies the mapping assertion Φ(x) ; ψ with respect to the database DB , if for every tuple of values v ∈ ans(Φ, DB ), the ground atom ψ[x/v] is satified by I. We say that I is a model of K = DB , M if I satisfies every assertion in M with respect to DB .</p><p>The following example shows how Hi (DL-Lite R ) mapping-based knowledge bases can be used to model real world situations in a suitable manner.</p><p>Example 1. Consider the database D shown in the introduction. We define a Hi (DL-Lite R ) MKB K 1 = D, M , where the mapping M is defined as follows:</p><formula xml:id="formula_5">-M1: {y | T-CarTypes(x, y)} ; IsaC (y, Car) -M2: {(x, z) | T-Cars(x, y, t, u, v, q) ∧ T-CarTypes(y, z)} ; InstC (x, z) -M3: {(x, y) | T-CarTypes(z1, x) ∧ T-CarTypes(z2, y) ∧ x = y} ; Disj C (x, y)</formula><p>Intuitively, M1 states that every type of car (whose name appears in the second column of table T-CarTypes, i.e. Coupé, SUV, Sedan, etc.) is a Car. M2, instead, indicates how to correctly retrieve the instances of different types of cars (e.g. car with plate number AB111 has to be retrieved as an instance of the concept Coupé, cars with plate numbers AF333 and BR444 as instances of the concept SUV, and so on). Finally, the intended meaning of assertion M3 is that different types of cars are pairwise disjoint (e.g. a Coupé is not a SUV, a SUV is not a Sedan, and so on). Obviously, mapping assertions are always written by people who know the semantics of the information stored in the database. Notice that the mapping assertions in M are able to model the car-types hierarchy without knowing a priori (i.e. at design-time) all the different kinds of cars that are produced by the motor industry.</p><p>Suppose now that the motor industry decides to introduce new types of cars to its car fleet, and in particular it decides to produce Campers and Caravans as well, thus extending the hierarchy. As one can imagine, these new kinds of cars share some common characteristics with the previous car types, even though not all of them. Therefore, it might be a reasonable choice for the database designers to introduce a new relational The new situation can be modeled in our framework by simply adding to M the following mapping assertions: where mapping M4 states that the new kinds of cars (Camper and Caravan) are Cars, the second assertion indicates how to correctly retrieve their instances, (e.g. car with plate number CM777 as an instance of Camper), and mapping M6 states that the new types of cars are pairwise disjoint (i.e. a Camper is not a Caravan). Notice that if (instead of creating a new table) the new kinds of cars had been simply introduced into the initial table T-CarTypes (thus without modifying D in any way), the new concepts (Camper and Caravan) would been automatically detected at run-time by mappings M1-M3, whitout requiring any further mapping definition.</p><p>Next, we introduce the notion of query, which in turn relies on the notion of "query atom". Intuitively, a query atom is a special kind of atom, constituted by a metapredicate applied to a set of arguments, where each argument is either an expression or a variable. More precisely, we define the set of q-terms to be τ DL-Lite R (S) ∪ V. We define a query atom as an atom constituted by the application of a meta-predicate in MP (DL-Lite R ) to a set of q-terms, and we call a query atom ground if no variable occurs in it. A query atom whose meta-predicate is Inst C or Inst R is called an instancequery atom. A higher-order conjunctive query (HCQ) of arity n is an expression of the form q(x 1 , . . . , x n ) ← a 1 , . . . , a m where q, called the query predicate, is a symbol not in S ∪ V, every x i belongs to V, every a i is a (possibly non-ground) query atom, and all variables x 1 , . . . , x n occur in some a j . The variables x 1 , . . . , x n are called the free variables (or distinguished variables) of the query, while the other variables occurring in a 1 , . . . , a m are called existential variables. A HCQ constituted by instance atoms only is called an instance HCQ (IHCQ). A higher-order union of conjunctive queries (HUCQ) of arity n is a set of HCQs of arity n with the same query predicate. A HUCQ constituted by instance HCQs only is called an instance HUCQ (IHUCQ). A HCQ/HUCQ is called Boolean if it has no free variables.</p><p>Let I be an interpretation and µ an assignment over I. A Boolean HCQ q of the form q ← a 1 , . . . , a n is satisfied in I, µ if every query atom a i is satisfied in I, µ. A Boolean HUCQ Q is satisfied in I, µ if there exists a Boolean HCQ q ∈ Q that is satisfied in I, µ. A Boolean HUCQ Q is satisfied in I, written I |= Q, if there exists an assignment µ over I such that Q is satisifed in I, µ. Given a Boolean HUCQ Q and a Hi (DL-Lite R ) KB K, we say that Q is logically implied by K (denoted by K |= Q) if for each model I of K there exists an assignment µ such that Q is satisfied by I, µ.</p><p>Given a non-Boolean HUCQ q of the form q(t 1 , . . . , t n ) ← a 1 , . . . , a m , a grounding substitution of q is a substitution θ such that t 1 θ, . . . , t n θ are ground terms. We call t 1 θ, . . . , t n θ a grounding tuple. The set of certain answers to q in K is the set of grounding tuples t 1 θ, . . . , t n θ that make the Boolean query q θ ← a 1 θ, . . . , a n θ logically implied by K. Notice that, in general, the set of certain answers may be infinite even if the KB is finite. Therefore, it is of interest to define suitable notions of safeness, which guarantee that the set of answers is bounded. This issue, however, is beyond the scope of the present paper. Indeed, in this paper, we focus on Boolean queries only, so as to address the computation of certain answers as a decision problem.</p><p>Example 2. Let us refer to the MKB K 1 = D, M of example 1. Interesting queries that can be posed to K 1 include: (i) Return all the instances of Car manufactured by the motor industry, each one with its own type: q(x, y) ← Inst C (x, y), Inst C (y, Car); (ii) Return all the concepts which car with plate number 'AB111' belongs to: q(x) ← Inst C ( AB111 , x).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Query answering</head><p>In this section, we study the problem of answering IHUCQs over Hi (DL-Lite R ) MKBs. Our query answering technique is based on query rewriting, so we will first deal with the problem of computing a perfect reformulation of a IHUCQ over a Hi (DL-Lite R ) KB. Then, we will present a query answering algorithm for MKBs based on the above perfect reformulation technique. In the following, we assume that the MKB is consistent. This does not constitute a limitation, since it is possible to show that checking consistency of a MKB can also be done through query answering, by means of techniques analogous to the ones defined for DL-Lite.</p><p>We start with some auxiliary definitions. Given an assertion α = Inst C (e 1 , e 2 ), we say that e 2 occurs as a concept argument in α. Given an assertion α = Inst R (e 1 , e 2 , e 3 ), we say that e 3 occurs as a role argument in α. Given an assertion α = Isa C (e 1 , e 2 ), we say that e 1 and e 2 occur as concept arguments in α. Given an assertion α = Isa R (e 1 , e 2 ), we say that e 1 and e 2 occur as role arguments in α. Given an assertion α = Disj C (e 1 , e 2 ), we say that e 1 and e 2 occur as concept arguments in α. Given an assertion α = Disj R (e 1 , e 2 ), we say that e 1 and e 2 occur as role arguments in α.</p><p>A DL atom is an atom of the form N (t) or N (t 1 , t 2 ), where N is a name and t, t 1 , t 2 are either variables or names. An extended CQ (ECQ) is a conjunction of DL atoms, Inst C atoms and Inst R atoms. An extended UCQ (EUCQ) is a union of ECQs. Given an atom α, Pred (α) denotes the term appearing in predicate position in α (such a term may be either a variable or an expression). Given a TBox T , we denote by Concepts(T ) the set {e, Exists(e), Exists(Inv (e)) | e ∈ E and e occurs as a concept argument in T } and denote by Roles(T ) the set {e, Inv (e) | e ∈ E and e occurs as a role argument in T }. Given a mapping M and a database DB , we denote by Retrieve(M, DB ) the Hi (DL-Lite R ) KB H defined as:</p><formula xml:id="formula_6">H = {ψ(t) | Φ(x) ; ψ ∈ M and DB |= Φ(t)}</formula><p>Given an instance-mapping M and an ABox A, we say that A is retrievable through M if there exists a database DB such that A = Retrieve(M, DB ).</p><p>Query rewriting. We start by providing an intuitive explanation of our rewriting technique. The basic idea is to reduce the perfect reformulation of an IHUCQ over a Hi (DL-Lite R ) TBox to the perfect reformulation of a standard UCQ over a DL-Lite R TBox, which can be done e.g. by the algorithm PerfectRef presented in <ref type="bibr" target="#b2">[3]</ref>. To do so, we have to first transform a IHUCQ into a standard UCQ, actually an EUCQ. This is done through a first partial grounding of the query (through the function PMG) and then through the functions Normalize and τ presented below. Once computed the perfect reformulation of the EUCQ, we then have to transform the EUCQ back into a IHUCQ, through the functions Denormalize and τ − presented below.</p><p>Given two IHCQs q, q and a TBox T , we say that q is a partial metagrounding of q with respect to T if q = σ(q) where σ is a partial substitution of the metavariables of q with the expressions occurring in T such that, for each metavariable x of q, either σ(x) = x or: (i) if x occurs in a concept position in q, then σ(x) ∈ Concepts(T ); (ii) if x occurs in a role position in q, then σ(x) ∈ Roles(T ). Given an IHCQ q and a TBox T , we denote by PMG(q, T ) the set of all partial metagroundings of q with respect to T , i.e., the following IHUCQ Q: Q = {q | q is a partial metagrounding of q with respect to T } Moreover, given a IHUCQ Q and a TBox T , we define PMG(Q, T ) as the IHUCQ q∈Q PMG(q, T ).</p><p>Given an instance atom α, we define Normalize(α) as follows: Given an IHCQ q ← α 1 , . . . , α n , Normalize(q) returns the IHCQ q ← Normalize(α 1 ), . . . , Normalize(α n ). Finally, given an IHUCQ Q, we define Normalize(Q) as q∈Q Normalize(q). Given an IHCQ q and an instance-mapping M, Denormalize(q, M) is the IHUCQ Q defined inductively as follows:</p><formula xml:id="formula_7">-if α = Inst C (e 1 ,</formula><p>q ∈ Q; if q ∈ Q and q contains an atom α of the form Inst R (t 1 , , t 2 ), and either Exists(t 2 ) occurs in M or Exists(x) (where x is a variable) occurs in M, then the query obtained from q by replacing α with the atom Inst C (t 1 , Exists(t 2 )) belongs to Q; if q ∈ Q and q contains an atom α of the form Inst R ( , t 1 , t 2 ), and either Exists(Inv (t 2 )) occurs in M or Exists(Inv (x)) (where x is a variable) occurs in M, then the query obtained from q by replacing α with the atom Inst C (t 1 , Exists(Inv (t 2 ))) belongs to Q; if q ∈ Q and q contains an atom α of the form Inst R (t 1 , t 2 , t 3 ) and either Inv (t 2 )</p><p>occurs in M or Inv (x) (where x is a variable) occurs in M, then the query obtained from q by replacing α with the atom Inst R (t 2 , t 1 , Inv (t 3 ))) belongs to Q. Finally, given an IHUCQ Q and a mapping M, we define Denormalize(Q, M) as q∈Q Denormalize(q, M).</p><p>Given an IHUCQ Q and a TBox T , we denote by PerfectRef (Q, T ) the EUCQ returned by the query rewriting algorithm for DL-Lite R shown in <ref type="bibr" target="#b2">[3]</ref>. <ref type="foot" target="#foot_1">2</ref>We now define the functions τ and τ − which translate IHUCQs into EUCQs and vice versa. Given an IHCQ q and a TBox T , τ (q, T ) is the ECQ obtained from q as follows: (i) for each atom of q of the form Inst C (t 1 , t 2 ), if t 2 ∈ Concepts(T ) then replace the atom with the atom t 2 (t 1 ); (ii) for each atom of q of the form Inst R (t 1 , t 2 , t 3 ), if t 3 ∈ Roles(T ) then replace the atom with the atom t 3 (t 1 , t 2 ). Then, given an IHUCQ Q, we define τ (Q, T ) = {τ (q, T ) | q ∈ Q}.</p><p>Given a ECQ q and a TBox T , τ − (q, T ) is the IHCQ obtained from q as follows: (i) for each atom of q of the form t 2 (t 1 ), replace the atom with the atom Inst C (t 1 , t 2 ); (ii) for each atom of q of the form t 3 (t 1 , t 2 ), replace the atom with the atom Inst R (t 1 , t 2 , t 3 ). Then, given an IHUCQ Q, we define τ − (Q, T ) = {τ − (q, T ) | q ∈ Q}.</p><p>We are now ready to formally define our rewriting algorithm, which takes as input a IHUCQ, a TBox and an instance-mapping, and returns a new IHUCQ.</p><formula xml:id="formula_8">ALGORITHM RewriteIUCQ(Q, T , M) INPUT: Boolean IHUCQ Q, DL-Lite R TBox T , instance-mapping M OUTPUT: Boolean IHUCQ Q Q 0 = PMG(Q, T ); Q 1 = Normalize(Q 0 ); Q 2 = τ (Q 1 , T ); Q 3 = PerfectRef (Q 2 , T ); Q 4 = τ − (Q 3 , T ); Q = Denormalize(Q 4 , M); return Q ;</formula><p>The IHUCQ returned by RewriteIUCQ(Q, T , M) constitutes a perfect reformulation of the query Q with respect to the TBox T and the mapping M, as formally stated by the following theorem.</p><p>Theorem 1. Let T be a TBox, let M be an instance-mapping and let Q be a IHUCQ. Then, for every ABox A that is a retrievable through M, T</p><formula xml:id="formula_9">∪ A |= Q iff A |= RewriteIUCQ(Q, T , M).</formula><p>Query answering. Based on the above query rewriting technique, we now present an algorithm for query answering over MKBs. Our idea is to first compute a DL-Lite R TBox by evaluating the mapping assertions involving the predicates Isa C , Isa R , Disj C , Disj R over the database of the MKB; then, such a TBox is used to compute the perfect reformulation of the input IHUCQ.</p><p>To complete query answering, we now have to consider the mapping of the predicates Inst C and Inst R , and to reformulate the query thus obtained replacing the above predicates with the corresponding FOL queries of the mapping assertions. In this way we obtain a FOL query expressed on the database. This second rewriting step, usually called unfolding, can be performed by the algorithm UnfoldDB presented in <ref type="bibr" target="#b7">[8]</ref>. <ref type="foot" target="#foot_2">3</ref>In the following, given a mapping M and a database DB , we denote by DB MA the database constituted by every relation R of DB such that R occurs in M A . Analogously, we denote by DB MT the database constituted by every relation R of DB such that R occurs in M T . We are now ready to present our query answering algorithm. The algorithm starts by retrieving the TBox from the DB through the mapping M T . Then, it computes the perfect reformulation of the query with respect to the retrieved TBox, and next computes the unfolding of such a query with respect to the mapping M A . Finally, it evaluates the query over the database.</p><p>The following property can be proved by slightly extending the proof of correctness of the algorithm UnfoldDB shown in <ref type="bibr" target="#b7">[8]</ref>. The above lemma allows us to prove correctness of the algorithm Answer . Theorem 2. Let K = DB , M be a Hi (DL-Lite R ) MKB, let Q be a IHUCQ. Then, K |= Q iff Answer (Q, K) returns true.</p><p>Finally, from the algorithm Answer we are able to derive the following complexity results for query answering over Hi (DL-Lite R ) MKBs. Theorem 3. Let K = DB , M be a Hi (DL-Lite R ) MKB, and let Q be a IHUCQ. Deciding whether K |= Q is in AC 0 with respect to the size of DB MA , is in PTIME with respect to the size of K, and is NP-complete with respect to the size of K ∪ Q.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions</head><p>In this paper we have investigated the possibility of generating a knowledge base on the fly, while computing instance queries, from data stored in data sources through asserted mappings. A key point to obtain such a degree of flexibility is relying on higher-order description logics which blur the distinction between classes/roles at the intensional level and individuals at the extensional level. This paper is only scratching the surfaces of the immense possibilities that this approach opens. For example, we may allow the coexistence of multiple TBoxes within the same data sources, and allow the user to select which TBox to load when querying the system, possibly depending on the query, much in the spirit of <ref type="bibr" target="#b6">[7]</ref>. The user can in principle even compose on the fly the TBox to use when answering a query. Obviously notions such as authorization views acquire an intriguing flavor in this setting (hiding intensional as well as extensional knowledge), as well as consistency, since we may even allow for contradicting assertions to coexist as long as they are not used together when performing query answering.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>-</head><label></label><figDesc>M4: {y | T-NewCars(x, y, t, u, v, q)} ; IsaC (y, Car) -M5: {(x, z) | T-NewCars(x, z, t, u, v, q)} ; InstC (x, z) -M6: {(x, y) | T-NewCars(z1, x) ∧ T-NewCars(z2, y) ∧ x = y} ; Disj C (x, y)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>e 2 )</head><label>2</label><figDesc>and e 2 has the form Exists(e ) where e is an expression which is not of the form Inv (e ), then Normalize(α) = Inst R (e 1 , , e ); if α = Inst C (e 1 , e 2 ) and e 2 has the form Exists(Inv (e )) where e is any expression, then Normalize(α) = Inst R ( , e 1 , e ); if α = Inst R (e 1 , e 2 , e 3 ) and e 3 is of the form Inv k (e ) where k ≥ 1 and k is an even number and e is an expression which is not of the form Inv (e ), then Normalize(α) = Inst R (e 1 , e 2 , e ); if α = Inst R (e 1 , e 2 , e 3 ) and e 3 is of the form Inv k (e ) where k ≥ 1 and k is an odd number and e is an expression which is not of the form Inv (e ), then Normalize(α) = Inst R (e 2 , e 1 , e ).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>ALGORITHM</head><label></label><figDesc>Answer (Q, K) INPUT: Boolean IHUCQ Q, Hi (DL-Lite R ) MKB K = DB , M OUTPUT: true if K |= Q, false otherwise T = Retrieve(M T , DB MT ); Q = RewriteIUCQ(Q, T , M A ); Q = UnfoldDB(Q , M A ); if DB MA |= Q then return true else return false</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Lemma 1 .</head><label>1</label><figDesc>Let M be an instance-mapping and let Q be a IHUCQ. Then, for every database DB , M, DB |= Q iff DB MA |= UnfoldDB(Q, M).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table T -</head><label>T</label><figDesc>Cars, on the other</figDesc><table><row><cell></cell><cell></cell><cell>T-Cars</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="2">T-CarTypes</cell><cell cols="6">NumberPlate CarType EngineSize BreakPower Color TopSpeed</cell></row><row><cell cols="2">Code TypeName</cell><cell>AB111</cell><cell>T1</cell><cell>2000</cell><cell>200</cell><cell>Silver</cell><cell>260</cell></row><row><cell>T1</cell><cell>Coupé</cell><cell>AF333</cell><cell>T2</cell><cell>3000</cell><cell>300</cell><cell>Black</cell><cell>200</cell></row><row><cell>T2</cell><cell>SUV</cell><cell>BR444</cell><cell>T2</cell><cell>4000</cell><cell>400</cell><cell>Grey</cell><cell>220</cell></row><row><cell>T3</cell><cell>Sedan</cell><cell>AC222</cell><cell>T4</cell><cell>2000</cell><cell>125</cell><cell>Dark Blue</cell><cell>180</cell></row><row><cell>T4</cell><cell>Estate</cell><cell>BN555</cell><cell>T3</cell><cell>1000</cell><cell>75</cell><cell>Light Blue</cell><cell>180</cell></row><row><cell></cell><cell></cell><cell>BP666</cell><cell>T1</cell><cell>3000</cell><cell>600</cell><cell>Red</cell><cell>240</cell></row><row><cell></cell><cell></cell><cell cols="4">Fig. 1. The database of a motor industry</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>table in D:</figDesc><table><row><cell cols="3">T-NewCars NumberPlate CarType Height Weight EngineSize BreakHorsePower</cell></row><row><cell>CM777</cell><cell>Camper 2,50 mt 680 Kg 4000 cc</cell><cell>200 bhp</cell></row><row><cell>CM888</cell><cell>Camper 2,20 mt 550 Kg 3000 cc</cell><cell>150 bhp</cell></row><row><cell>CV333</cell><cell>Caravan 2,30 mt 620 Kg 3000 cc</cell><cell>200 bhp</cell></row><row><cell>CV222</cell><cell>Caravan 2,50 mt 580 Kg 4000 cc</cell><cell>250 bhp</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">We do not need to mention assignments here, since all assertions in H are ground.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">We are actually considering a slight generalization of the algorithm, which allows for the presence of a ternary relation (InstR) in the query.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Here we assume that the algorithm UnfoldDB takes as input a EUCQ and an instance-mapping. This corresponds to actually considering a straightforward extension of the algorithm presented in<ref type="bibr" target="#b7">[8]</ref> in order to deal with the presence of the ternary predicate InstR.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The Mastro system for ontology-based data access</title>
		<author>
			<persName><forename type="first">D</forename><surname>Calvanese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>De Giacomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lembo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lenzerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Poggi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rodriguez-Muro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rosati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ruzzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">F</forename><surname>Savo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web Journal</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="43" to="53" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Ontologybased database access</title>
		<author>
			<persName><forename type="first">D</forename><surname>Calvanese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>De Giacomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lembo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lenzerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Poggi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rosati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of SEBD 2007</title>
				<meeting>of SEBD 2007</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="324" to="331" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Tractable reasoning and efficient query answering in description logics: The DL-Lite family</title>
		<author>
			<persName><forename type="first">D</forename><surname>Calvanese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>De Giacomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lembo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lenzerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rosati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. of Automated Reasoning</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="385" to="429" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">HILOG: A foundation for higher-order logic programming</title>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Warren</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. of Logic Programming</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="187" to="230" />
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Higher-order description logics for domain metamodeling</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">De</forename><surname>Giacomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lenzerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rosati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of AAAI 2011</title>
				<meeting>of AAAI 2011</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">OWL FA: a metamodeling extension of OWL DL</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Z</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW 2006</title>
				<meeting>of WWW 2006</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="1065" to="1066" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Emancipating instances from the tyranny of classes in information modeling</title>
		<author>
			<persName><forename type="first">J</forename><surname>Parsons</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wand</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Trans. on Database Systems</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="228" to="268" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Linking data to ontologies</title>
		<author>
			<persName><forename type="first">A</forename><surname>Poggi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lembo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Calvanese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>De Giacomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lenzerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rosati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. on Data Semantics</title>
		<imprint>
			<biblScope unit="volume">X</biblScope>
			<biblScope unit="page" from="133" to="173" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">MASTRO at work: Experiences on ontology-based data access</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">F</forename><surname>Savo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lembo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lenzerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Poggi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rodríguez-Muro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Romagnoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ruzzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stella</surname></persName>
		</author>
		<ptr target="org" />
	</analytic>
	<monogr>
		<title level="m">Proc. of DL 2010</title>
				<meeting>of DL 2010</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">573</biblScope>
			<biblScope unit="page" from="20" to="31" />
		</imprint>
	</monogr>
	<note>ceur-ws.</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
