=Paper=
{{Paper
|id=Vol-1350/paper-13
|storemode=property
|title=Singular
Referring Expressions in Conjunctive Query Answers: the case for
a CFD DL Dialect
|pdfUrl=https://ceur-ws.org/Vol-1350/paper-13.pdf
|volume=Vol-1350
|dblpUrl=https://dblp.org/rec/conf/dlog/BorgidaTW15
}}
==Singular
Referring Expressions in Conjunctive Query Answers: the case for
a CFD DL Dialect==
Singular Referring Expressions in Conjunctive Query Answers: the case for a CF D DL Dialect Alexander Borgida† , David Toman‡ and Grant Weddell‡ † Department of Computer Science Rutgers University, New Brunswick, USA borgida@cs.rutgers.edu ‡ Cheriton School of Computer Science University of Waterloo, Canada {david,gweddell}@uwaterloo.ca Abstract. A referring expression in linguistics is any noun phrase iden- tifying an object in a way that will be useful to interlocutors. In the context of conjunctive queries over a description logic knowledge base (DL KB), typically constant symbols (usually treated as rigid designa- tors) are used as referring expressions in a certain answer to the query. In this paper, we begin to explore how this can be usefully generalized by allowing more general DL concept descriptions, called singular referring expressions, to replace constants in this role. In particular, we lay the foundation for singular referring expressions in conjunctive query answers over a DL KB using a member of the CFD family of DL dialects. In the process, we introduce a specific language for referring concept types, and present initial results on how conjunctive queries with referring concept types can be efficiently supported. 1 Introduction and Motivation Query answering in logic-based approaches to data and knowledge bases has traditionally been viewed as finding constant names, appearing in the knowledge- base, which can be substituted for the variables of the query. More formally, a query q(x1 , . . . , xn ) is viewed as a formula with free variables x1 , . . . , xn and, if the knowledge-base K contains individual constant names IN, query answering consists of computing the set { (a1 , . . . , an ) | ai ∈ IN, K |= q(a1 /x1 , . . . , ak /xk ) }. We believe that in a number of circumstances this is less than ideal. (1) In object-based KBMSs (including Object-Relational, XML and Object- Oriented DBMSs, as well as DLs with UNA), all known individual objects must have unique (internal) distinguishing identifiers. However, these identifiers are often insufficient to allow users to figure out what real-world object they refer to, especially for large KBs. For example, system generated ref expressions in object-oriented databases [10] and blank node identifiers in RDF are semantically opaque to end-users. A specific example of this are identifiers that individual authors or the system must invent in community-developed ontologies such as Freebase [2]. There, for example, the id of the “Synchronicity” album by the Police is "/guid/9202a8c04000641f8000000002f9e349" (as of April, 2015.) (2) In Relational DBMSs, the above problem is supposedly avoided by using “external keys”: tuples of attributes whose values (strings, integers, ...) uniquely identify rows of tables. Problem (1) above will then arise in OBDA access to legacy relational systems, since the ontology will surely be object-based.1 We note that even in databases, universally unique keys are hard to find (e.g., newly arrived foreign students do not have ssn#), though they may work for subsets of individuals, such as those returned by queries. (3) Additional problems for finding identifying attributes for classes of ob- jects arise in conceptual modeling. For example, consider all cases where Ex- tended Entity-Relationship modeling creates a new heterogeneous entity set by “generalization” [5] from others. For example, we want to generalize Person (whose key might be ssn#) and Company (whose key might be tickerSymbol) to LegalEntity, which can own things. In EER modeling, such a situation forces the introduction of a new, artificial attribute as a key, with the attendant problems. Yet when we retrieve a set of legal entities, we can reference them in different, more natural ways, depending on which subclass they belong to. (4) The next example illustrates a subtler version of the above: consider the following hierarchy of concepts relating to publications: Journal v EditedCollection , EditedCollection v Publication And suppose edited collections are identified by isbn#, while journals are iden- tified by title and publisher. When we retrieve a set of objects in Publication, we would like to describe them in different ways, depending on the subclass they belong to; but in this case, there would be an additional preference for textittitle, publisher) over isbn# for elements of Journal. (5) Many kinds of KBMSs, including those based on DLs and FOL, allow us to describe situations where objects can be inferred to exist, without having an explicit (internal) identifier. For example, if Michelle is a person, then she has a mother, and if she is married then she has a spouse. Normally, such objects cannot be returned in the list of answers. This is all the more unpleasant if we can capture information about this unknown person, such as the phone number of Michelle’s mother: {Michelle} v ∃hasMother.∃hasPhone.{1234567}. Yet it is common in human communication to identify objects by their relationship to other known objects. For example, “Michelle’s mother” is a perfectly reasonable intensional description of someone who has phone 1234567. The standard response to some of the above problems would be to have the user modify the query by finding the appropriate values for identifying attributes (external keys). For example, instead of the query q1(x) :- Journal(x), the programmer would be expected to write q2(t, p) : −Journal(x), hasTitle(x, t), hasPublisher(x, p). 1 OBDA systems such as Mastro [4] and others try to deal with this issue by using function symbols over database keys to generate “object terms” that act as object identifiers. The name of the function symbols is however not semantically motivated. This approach has several problems: (i) In the enumeration of answers to q2, the relationship between the original object of interest, x, and its descriptors, t and p, is lost; something akin to “objects x with title = t and publisher = p” would be more desirable. (ii) The above reformulation cannot be done using regular conjunctive queries in the case of item 4 above, because the answer for edited collections that are not journals should be identified by isbn#, for which the query is q5(isb) : −EditedCollection(x), ¬Journal(x), hasIsbn(x, isbn) which is not a conjunctive query, since it includes a negation. (iii) From the point of view of software engineering, the task of choosing these identifying references is mentally distinct from the task of selecting the objects of interest to begin with. Both SQL’s select clause, and XQuery’s return clause are examples of separating these two aspects in existing query languages. This paper is then dedicated to the task of proposing a first solution to (some of) the issues raised by providing “singular referring expressions” in the place of individuals returned by conjunctive queries, in the context of DL KBs. Our plan and contributions are as follows: We will start by proposing a lan- guage for referring concept expressions and types. This language will generalize the usual case of presenting answers to queries as individual names to situations that: (i) allow object identification by key (paths), possibly within the limited context of some concept instances; (ii) deal with heterogeneous answer sets, such as LegalEntity; and (iii) allow preferential choice of referring expressions, as for EditedCollection. We will use this language to define answers for conjunctive queries over DL KBs. More generally, the proposed approach introduces a new separation of concerns for knowledge bases (identification vs. qualification). In our case, the query head will annotate each variable returned with an answer concept type; this will be instantiated to an answer concept (a subset of our DL concepts) for each answer; such concepts will eventually bottom out to individual constants, rather than atomic concepts. Because we wish to generalize the usual case of constant names in answers, we desire referring concept types to be singular expressions – i.e., to identify one individual.2 Unfortunately, without knowing anything else, it is impossible, for example, to tell whether an expression such as “object with p-value 3” will be singular or not: if p is a key, then yes, but not otherwise. Therefore, we need to use information from the ontology to verify the singularity of referring concept type. This can be extended by examining the body of the query (and hence learning more about what possible values variables may take). We will concentrate in this paper on: (i) the (compile-time) analysis of conjunctive query bodies in the context of the TBox to determine whether a referring concept type will return a reference to at most one individual; (ii) the reformulation of the query to 2 Researchers interested in so-called co-operative query answering have considered re- turning predicates/concepts describing sets of individuals (e.g., [1, 3, 6, 8]), where an answer to the query “Who can take the Data Structures course?” might include, “Anyone who has passed the Intro to Computer Science course with at least a C grade”. Please note that we are not considering that problem in this paper. guarantee that the referring expression will indeed return exactly 1 value. Our technical results will show that this can be done in polynomial time for the DL ∀ CF Dnc , which allows the capture of identification constraints such as keys. ∀ 2 Preliminaries: the Description Logic CF D nc ∀ The knowledge bases that we consider are based on the logic CF Dnc , a re- cent member of the CF D family of DL dialects. All members of this family are fragments of FOL with underlying signatures based on disjoint sets of unary predicate symbols called primitive concepts, constant symbols called individuals and unary function symbols called attributes. Although attributes deviate from the normal practice of using binary predicate symbols called roles, they make it easier to incorporate concept constructors suited to the capture of relational data sources that include various dependencies, e.g., by a straightforward reifi- cation of arbitrary n-ary predicates, and also make it easier to explore varieties of concepts that can serve as referring expressions. ∀ Definition 1 (CF Dnc Concepts) Let F, PC and IN be disjoint sets of (names of) attributes, primitive concepts and individuals, respectively. A path function Pf is a word in F∗ with the usual convention that the empty word is denoted ∀ by id and concatenation by “.”. The set of CF Dnc concepts C is given by the following grammar, where a ∈ IN, A ∈ PC, Pf and Pfi are path functions, k > 0 and fi ∈ F. C ::= {a} | A | ∀Pf.C | C1 u C2 | ¬A | Pf1 = Pf2 | (1) A : Pf1 .Pf, Pf2 , . . . , Pfk → Pf1 | A : Pf1 .Pf, Pf2 , . . . , Pfk → Pf1 .f Semantics is defined in the standard way with respect to an interpretation I = (4, (·)I ), where 4 is a domain of “objects” and (·)I an interpretation function that fixes the interpretation of attributes f to be total functions on 4, primitive concepts A to be subsets of 4, and individuals a, b to be elements of 4. The ({a})I = {(a)I }, (∀Pf.C)I = {x ∈ 4 | (Pf )I (x) ∈ (C)I }, (C1 u C2 )I = (C1 )I ∩ (C2 )I , (¬A)I = 4 \ (A)I , (Pf1 = Pf2 )I = {x ∈ 4 | (Pf1 )I (x) = (Pf2 )I (x)} and (A : Pf1 , . . . , Pfk → Pf)I = {x ∈ 4 | ∀ y ∈ (A)I : (Pfi )I (x) = (Pfi )I (y) → (Pf)I (x) = (Pf)I (y)} V i ∀ Fig. 1. Semantics of CFDnc Concepts. interpretation function is extended to path expressions by interpreting id as the identity function and concatenation as function composition. The semantics of ∀ the remaining CF Dnc concepts are then defined in Figure 1. 2 Concepts having the last two forms in (1) are called a (path) key and a path functional dependency (PFD), respectively. Informally, such concepts each de- note a set of objects, each of which, whenever agreeing with any A-object on all left-hand-side path functions, also agrees with that object on the right-hand-side path function. Thus, the axiom EditedCollection v EditedCollection : isbn# → id, expresses that isbn# is a key for edited collections, while P erson v P erson : home.phone# → home.address, says that if two persons have the same home phone then they have the same home address. ∀ Definition 2 (CF Dnc Knowledge Bases) Generic knowledge/metadata and ∀ specific facts/data in a CF Dnc knowledge base K are respectively defined by a TBox TK and an ABox AK . A TBox TK consists of a finite set of general concept inclusion axioms, which adhere to one of the following six forms, where A and Ai are primitive concepts, f is an attribute in F, B is a primitive concept or a negation of a primitive concept, and where Pf and Pfi are path functions: A v B ; A v ∀f.B ; ∀f.A v B ; A v (Pf1 = Pf2 ); A1 v A2 : Pf1 , . . . , Pfk → Pf. TK must also satisfy the following condition: stratification of path function equalities: If A v (Pf1 = Pf2 ) ∈ TK then A is a primitive concept that does not occur on the right-hand-side of any inclusion axiom in TK . An ABox AK consists of a finite set of axioms that express facts, each of which has one of the following two forms, respectively called individual membership assertions and individual relationship assertions, A(a) and Pf1 (a) = Pf2 (b), for individuals a, b ∈ IN, primitive concept A ∈ PC, and path functions Pfi ∈ F∗ . Note that ({a})I = {(a)I } and thus we can use nominal concepts as proxies for individuals. An interpretation I satisfies an inclusion axiom C1 v C2 if (C1 )I ⊆ (C2 )I . It satisfies ABox axioms A(a) and Pf1 (a) = Pf2 (b) if (a)I ∈ (A)I and (Pf1 )I ((a)I ) = (Pf2 )I ((b)I ), respectively. I satisfies a knowledge base K if it satisfies each axiom in K. 2 The need for the additional restriction on a TBox to avoid undecidability of TBox reasoning due to equational constraints derives in a straightforward way from the undecidability of the word problem for monoids [7, 9]. ∀ Proposition 3 (Consistency and Logical Implication in CF Dnc [11, 13]) ∀ Knowledge base consistency and logical implication for CF Dnc are complete for PTIME. 2 3 Conjunctive Queries and Certain Answers In this section we re-evaluate the way answers to conjunctive queries are under- stood and presented. Definition 4 (Conjunctive Queries) A conjunctive query (CQ) Q, with free variables {x1 , . . . , xk }, has the form ∃xk+1 , . . . , ∃xm : Body(Q) where Body(Q), the query body of Q, is a first order formula over the signature C ∪ F of the form ^ ^ C(xi ) ∧ f (xj ) = xk ) , (2) where each xi , xj and xk occurs in {x1 , . . . , xm }.3 2 ∀ Recall that our objective in this paper is to study how CF Dnc concepts can help serve the role of a singular referring expression in certain answers to conjunctive queries. To review current practice, assume K is a knowledge base over some DL dialect, and consider one of the purposes served by an ABox in defining the certain answers over K to a CQ Q: the finite collection of individual names {a1 , . . . , an } in the ABox defines a space of nk potential answers to Q: k-tuples θi that map query variables xj to individual names aij , {x1 7→ ai1 , . . . , xk 7→ aik }. Viewed as a substitution, recall that each θi is a certain answer to Q over K exactly when K |= Qθi . Note that the occurrence of an individual aij in a certain answer is a simple example of a referring expression, i.e., a syntactic artifact that identifies elements of an underlying domain. This notion of a potential answer is easily modified to accommodate a much larger variety of referring expressions. To start, one can view a potential answer θi as a set of size k that maps query variables to nominal concepts instead of individuals, as in {x1 7→ {ai1 }, . . . , xk 7→ {aik }}, ∀ and then extend this idea by allowing arbitrary CF Dnc concepts in potential answers to queries, i.e., potentially by allowing answers to have the form {x1 7→ C1 , . . . , xk 7→ Ck }. (3) In this alternative setting “certain answers” are defined as follows: Definition 5 (Referring Concepts and Certain Answers) ∀ Let K be a CF Dnc knowledge base, Q a conjunctive query with free variables ∀ x1 , . . . , xk and C1 , . . . , Ck CF Dnc concept descriptions. We say that C1 , . . . , Ck are referring concepts in a certain answer {x1 7→ C1 , . . . , xk 7→ Ck } to Q if the following two conditions hold: 1. K |= ∀x1 , . . . , xk .C1 (x1 ) ∧ . . . ∧ Ck (xk ) → Q, and 2. |{o ∈ 4 | I, [xi 7→ o] |= Ci (xi ) ∧ (∃x1 , . . . , xi−1 , xi+1 , . . . xk .Q)}| = 1 for every I |= K and 0 < i ≤ k. where C (x) is the first-order formula derived from the concept description C . 2 The first condition states that Ci objects (as values of xi s) satisfy Q and the second one that we are interested in singular referring expressions Ci , as gen- eralizations of simple individual names. In the rest of the paper we call {x1 7→ C1 , . . . , xk 7→ Ck } a candidate answer to Q if condition (1) in the above definition 3 To improve readability in the rest of the paper we allow constants to appear in CQs. However, conjuncts of the form x = a are just syntactic sugar for a conjunct {a}(x) formed using a concept {a}. Similarly, f (x) = a is ∃y(f (x) = y ∧ {a}(y)), etc. is satisfied,and call it a weakly identifying answer to Q if only an upper bound (of one) is guaranteed to hold in condition (2) We call concepts Ci that are used in this way singular referring concepts since they replace the role of individual names as referring expressions in certain answers to conjunctive queries. To illustrate, assume K captures information about persons, and consider a query with body Person(x). In the rest of the paper, we will use Pf = a as an alternative syntax for ∀Pf.{ a } (to improve readability). Possibilities for certain answers to the query now include one or more of the following: {x 7→ (ssn# = 1234)} or {x 7→ F emale u (hasSpouse.name = ’Enya’) u (hasSpouse.phone# = 1234567)}. Note that Definition 5(2) disallows answers of the form {x 7→ Female} or more generally, rules out any C for x in which |(C)I | = 6 1 when I |= K. Thus, the earlier examples of certain answers would be contingent on TK ensuring that persons have unique ssn#, as well as unique spouses, who can be identified by a (name, phone#) pair. 4 Referring Concept Types in Conjunctive Queries Allowing referring concepts beyond nominals in certain answers to a query Q may lead to infinitely many syntactically distinct certain answers. In this section, we develop a framework that ensures the set of certain answers to any conjunctive query is finite by introducing a specific language for referring concept types, which bottom out at individual nominals. ∀ Important note: while Definition 5 allows general CF Dnc concepts to serve as referring concepts, in the rest of the paper we restrict our attention to the subset of referring concepts adhering to the more limited grammar C ::= {a} | A | ∀Pf.C | C u C where {a} is a nominal, A is a primitive concept name, and Pf ∈ F∗ . These C are intuitively instances of the following referring types. Definition 6 (Referring Concept Types) A referring concept type Rt is given by the following grammar, where T denotes a finite conjunction of primitive concepts (or > standing for empty conjunction), to be called henceforth a simple type. Rt ::= {?} | Pf = {?} | Rt1 u Rt2 | T → Rt | Rt1 ; Rt2 The referring concept set RC(Rt, K) is the “extension” of a referring concept type Rt with respect to KB K, and is defined inductively as follows, where Si is short for RC(Rti , K): 1. RC({?}, K) = {{a} | a occurs in AK }; 2. RC(Pf = {?}, K) = {(Pf = {a}) | a occurs in ABox AK }; 3. RC(Rt1 u Rt2 , K) = {C1 u C2 | Ci ∈ Si }; 4. RC(T → Rt1 , K) = {T u C | C ∈ S1 }; and 5. RC(Rt1 ; Rt2 , K) = S1 ∪ {C2 ∈ S2 | ¬∃C1 ∈ S1 s.t. K |= C1 ≡ C2 }. A referring concept type is homogeneous if it is free of any occurrence of the construct in 5. 2 Examples of their use will follow immediately after the next definition. Note that, as desired, the set of referring concepts associated with a single referring concept type is finite if AK is finite. Definition 7 (Certain Answers and Singular Referring Concepts) Let Q be a conjunctive query with free variables {x1 , . . . , xk }. A query head H for Q is a set of pairs {x1 : Rt1 , . . . , xk : Rtk } that associates a referring concept type Rti with each xi . The set of certain answers to Q with respect to a head H and a knowledge base K, denoted Ans(Q, H, K), is the set of all certain answers {xi 7→ Ci | 0 < i ≤ k} to Q over K for which Ci ∈ RC(Rti , K), for 0 < i ≤ k. 2 After the examples below, the objective in this section is to show that computing ∀ Ans(Q, H, K) can be achieved in PTIME for CF Dnc , provided that referring concept types satisfy a “weak identification condition”. The following examples illustrate the use of referring concepts with con- junctive queries to extend the current situation with the more expressive cases motivated in the Introduction (a conjunctive query Q with a head H will be written in the following SQL-like style: select H where Body(Q)): 1. (expressing the current case) “Any journals published by Italians” select x1 : {?} where Journal (x1 ) ∧ (publishedBy(x1 ) = x2 ) ∧ Italian(x2 ) 2. (reference via single key) “The ssn# of any person with phone 12345567” select x : ssn# = {?} where Person(x) ∧ (phone#(x) = 1234567) 3. (multiple attribute key) “The title and publisher of any journals” select x : title = {?} u publishedBy = {?} where Journal (x) 4. (choice of identification in heterogeneous set) “Any legal entity” select x : Person → ssn# = {?} ; Company → tickerSymbol = {?} where LegalEntity(x) An example certain answer would be x 7→ Person u (ssn# = 7654) while another might be x 7→ Company u (tickerSymbol = 0 IBM0 ). 5. (preferred identification) “Any publication, identified by its most specific identifier, when available.” select x : Journal → (title = {?} u publisher = {?}) ; EditedCollection → isbn# = {?} ; {?} where Publication(x) We now make concrete our requirement that referring concepts occurring in query answers do indeed satisfy the ability to identify objects. Definition 8 (Weak Identification in Certain Answers) Let Q be a con- ∀ junctive query, H a head for Q, and T a CF Dnc TBox. Q is weakly identifying I for H with respect to T if |(C) | ≤ 1 for every ABox A, model I of T ∪ A, and candidate answer θ to Q with respect to H and T ∪ A in which x 7→ C ∈ θ. 2 Lemma 9 (Normal Form of Referring Concept Types) For every refer- ring concept type Rt, there is an equivalent normal form Rt1 ; . . . ; Rtn , denoted Norm(Rt), consisting of tagged record types Rti that are, in turn, ho- mogeneous referring concept types of the form Ti → ((Pfi,1 = {?}) u . . . u (Pfi,mi = {?})). (4) By equivalent, we mean that, for any KB K, C1 ∈ RC(Rt, K) implies there exists C2 ∈ RC(Norm(Rt), K) for which { } |= C1 ≡ C2 , and vise versa. 2 Proof (sketch): By application of the following equivalence preserving rewrites. {a} ; { } → (id = {a}) (T → Rt1 ) u Rt2 or Rt1 u (T → Rt2 ) ; T → (Rt1 u Rt2 ) (Rp1 ; Rt2 ) u Rt3 ; (Rt1 u Rt3 ) ; (Rt2 u Rt3 ) Rt1 u (Rt2 ; Rt3 ) ; (Rt1 u Rt2 ) ; (Rt1 u Rt3 ) T → (Rt1 ; Rt2 ) ; (T → Rt1 ) ; (T → Rt2 ) T1 → (T2 → Rt) ; (T1 u T2 ) → Rt 2 We now present our first main result on deciding weak identification. Our theo- rem refers to the following auxiliary function M(·) which abstracts query bodies as DL concepts (note that the function assumes, without harm, that variables are also elements of F): ∀ x.C if φ = “C(x)”; x1 .f = x2 if φ = “f (x1 ) = x2 ”; M(φ) = M(ψ 1 ) u M(ψ 2 ) if φ = “ψ1 ∧ ψ2 ”; and M(ψ) otherwise, when φ = “∃xk+1 , . . . , ∃xm : ψ”. Theorem 10 (Deciding Weak Identification) Let Q be a conjunctive query, ∀ H a head for Q, and T a CF Dnc TBox. Q is weakly identifying for H with respect to T if and only if T ∪ {AQ v ((∀x.T ) u M(Body(Q))) |= AQ v AQ : x.Pfi,1 , . . . , x.Pfi,mi → x for all (x : Rt) ∈ H, and all T → (Pf1 = {?}) u . . . u (Pfm = {?}) ∈ Norm(Rt). Proof (sketch): The if direction is straightforward. For the only-if direction, consider the earliest tagged record type Rt 0 ∈ Norm(Rt) for some x : Rt ∈ H for which the check for the logical consequence of the key PFD fails. Then one can construct an ABox A where there exists C ∈ RC(Rt 0 , T ∪ A) that occurs in some candidate answer θ for Q for which one can also construct an interpretation I for T ∪ A for which |(C)I | > 1. 2 Definition 11 Let Q be a CQ with free variables {x1 , . . . , xk }. We say that H is a homogeneous head for Q if it is of the form H = {xi : Ti → (Pfi,1 = {?}) u . . . u (Pfi,`i = {?}) | 0 < i ≤ k}. For every CQ Q and a homogeneous head H for Q we define a conjunctive query k ^ QH = Q ∧ (Ti (xi ) ∧ (Pfi,1 (xi ) = xi,1 ∧ . . . ∧ (Pfi,`i = xi,`i ) i=1 with additional free variables {xi,j | 0 < i ≤ k, 0 < j ≤ `i }. 2 To handle preferences, i.e., non-homogeneous heads H of queries of the form {xi : Rpi,1 ; . . . ; Rpi,ni | 0 < i ≤ k} we first define a sequence of homogeneous heads Hj1 ,...,jk = {xi : Rpi,ji | 0 < i ≤ k} over all 0 < ji < ni . In addition, we define a normalized ABox A0 for an ABox A to contain only individual relationship assertions of the form f (a) = b obtained from A by introducing additional individuals for the intermediate individuals participating in A’s individual relationship assertions.4 Theorem 12 Let K = (T , A) be a knowledge base, Q a CQ, and H a head for Q, such that Q is weakly identifying for H in T . Then {xi 7→ Ti u (Pfi,1 = {ai,1 }) u . . . u (Pfi,`i = {ai,`i })} ∈ Ans(Q, H, K) if and only if {xi 7→ {bi }, xi,j 7→ {ai,j } | 0 < i ≤ k, 0 < j ≤ `i } ∈ Ans(QHj1 ,...,jk , H0 , K0 ) and there is no {xi 7→ {bi }, xi,j 7→ {a0i,j } | 0 < i ≤ k, 0 < j ≤ `i } ∈ Ans(QHj0 ,...,j0 , H0 , K0 ) 1 k for Hj10 ,...,jk0 dominates Hj1 ,...,jk , where H0 = {xi : {?} | 0 < i ≤ k} ∪ {xi,j : {?} | 0 < i ≤ k, 0 < j ≤ `i }, K0 = (T , A0 ) where A0 is a normalized A, and ai,j and a0i,j are constants in A. Proof (sketch): Consider first the case where H is homogeneous. Then we can reconstruct an answer to Q and H from an answer to QH and H0 (which consists of nominal concepts only. Conversely, from answer to Q and H we can extract nominal concepts that are an answer to ∃x1 , . . . , xk .QH and H0 (restricted to free variables of ∃x1 , . . . , xk .QH ). This answer is then extended to QH and (full) H0 by using individual names introduced in the normalized ABox A0 , this extension is unique since Q is weakly identifying for H in T . For the non-homogeneous case we simply consider all possible homogeneous sub- cases and then filter the answers by the valuations of the variables x1 , . . . , xk (since those describe the possibly anonymous individuals in A using their system- assigned names A0 ). 2 Note that the queries QHj0 ,...,j0 are answered in K with respect to H0 , a triv- 1 k ial and homogeneous head: this reduces the answering to standard CQ query ∀ answering in CF Dnc knowledge base K as introduced in [12]. Corollary 13 Computing certain answers to conjunctive queries with respect ∀ to referring concepts and CF Dnc knowledge bases is complete for PTIME. 4 Such a transformation is already part of the CQ answering algorithm [12]. Note that in our setting, however, the new individuals cannot participate in Q’s answers. 5 Conclusions The paper’s contributions are as follows. First and foremost, on the non-technical side, it recognized and motivated the utility of “singular referring expressions” for query answers, which are more complex than just nominals, and it argued for the need for a new separation of concerns in query writing: qualification (what the query body does) vs. identifi- cation (how results are presented). On the specification side, the paper defined formally the notion of “query answering using referring expressions” for certain answers in conjunctive queries ∀ over DLs. In the context of CF Dnc , it introduced a specific language for referring ∀ expressions, which are a subset of CF Dnc concepts, and which allows us to handle all the motivating problems (except intensional descriptions in this paper. This language generalizes the notion of nominal, currently used in OBDA, to handle keys (as found in both the database and DL KB literature), and supports heterogeneous sets (as in the case of LegalEntity), as well as preferential choice of referring expressions (as in the EditedCollection example). On the algorithmic and complexity side, it first considered, in the context of ∀ CF Dnc , the problem of determining (in polynomial time) whether a referring concept type was “weakly” identifying in the context of a query and TBox, in the sense that its instances necessarily referred to at most one object. It also showed how one can transform a query and knowledge base so that the answers had cardinality one. This led to the result that computing certain answers to ∀ conjunctive queries with respect to referring concepts and CF Dnc KBs was complete for PTIME. There are many avenues left to explore in this work. We have already men- tioned that lack of space prevented us from considering referring concepts types for “Michelle’s mother”, which use inverse functions. Another direction to con- sider are additional forms or desirable properties of referring expressions. For example, the variety of references raises a problem: the same object may be re- turned in an answer with different references to it, even if the knowledge base works with the UNA. As a simplest example, a journal has multiple candidate keys. We are currently investigating ways to reason about and avoid such dupli- cation. The reader may also have observed that so far it was up to the programmer to select the referring expression(s) to consider for each variable. A form of type inference on the query variables would be useful, as the basis of a tool which would suggest to the user a (bounded) list of possible referring expressions that are guaranteed to have the singular reference property with respect to a particular TBox. Of course, all the technical questions considered in this paper will have dif- ferent answers for different DLs. Therefore, an orthogonal avenue of research is to re-consider these issues in the context of other DLs, especially DL-Lite. Acknowledgments: We wish to thank those reviewers who have made excellent suggestions for improving the paper, and financial support from NSERC Canada. References 1. Sonia Bergamaschi, Claudio Sartori, and Maurizio Vincini. Dl techniques for in- tensional query answering in oodbs. In Working Notes of the KI’95 Workshop: KRDB-95 Reasoning about Structured Objects: Knowledge Representation Meets Databases, page 3 pages, 1995. 2. Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowl- edge. In ACM SIGMOD International Conference on Management of Data, pages 1247–1250. ACM, 2008. 3. Alexander Borgida. Description logics in data management. Knowledge and Data Engineering, IEEE Transactions on, 7(5):671–682, 1995. 4. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Mariano Rodriguez-Muro, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. The MASTRO system for ontology-based data access. Se- mantic Web, 2(1):43–53, 2011. 5. Ramez Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems, 3rd Edition. Addison-Wesley-Longman, 2000. 6. Tomasz Imielinski. Intelligent query answering in rule based systems. The Journal of Logic Programming, 4(3):229–257, 1987. 7. Andrey Andreyevich Markov, Jr. On the impossibility of certain algorithm in the theory of associative systems. Dokl. Akad. Nauk SSSR, 55:587–590, 1947. 8. Amihai Motro. Intensional answers to database queries. Knowledge and Data Engineering, IEEE Transactions on, 6(3):444–454, 1994. 9. Emil Post. Recursive unsolvability of a problem of Thue. The Journal of Symbolic Logic, 12:1–11, 1947. 10. Abraham Silberschatz, Henry F. Korth, and S. Sudarshan. Database System Con- cepts, 4th Edition. McGraw-Hill Book Company, 2005. 11. David Toman and Grant E. Weddell. Conjunctive query answering in CFDnc : A PTIME description logic with functional constraints and disjointness. In AI 2013: Advances in Artificial Intelligence - 26th Australasian Joint Conference, Dunedin, New Zealand, December 1-6, 2013. Proceedings, pages 350–361, 2013. 12. David Toman and Grant E. Weddell. Answering Queries over CFD∀nc Knowl- edge Bases. Technical Report CS-2014-14, Cheriton School of Computer Science, University of Waterloo, 2014. 13. David Toman and Grant E. Weddell. On adding inverse features to the description logic CFD∀nc . In PRICAI 2014: Trends in Artificial Intelligence - 13th Pacific Rim International Conference on Artificial Intelligence, Gold Coast, QLD, Australia, December 1-5, 2014., pages 587–599, 2014.