=Paper= {{Paper |id=Vol-1350/paper-13 |storemode=property |title=Singular Referring Expressions in Conjunctive Query Answers: the case for a CFD DL Dialect |pdfUrl=https://ceur-ws.org/Vol-1350/paper-13.pdf |volume=Vol-1350 |dblpUrl=https://dblp.org/rec/conf/dlog/BorgidaTW15 }} ==Singular Referring Expressions in Conjunctive Query Answers: the case for a CFD DL Dialect== https://ceur-ws.org/Vol-1350/paper-13.pdf
 Singular Referring Expressions in Conjunctive
 Query Answers: the case for a CF D DL Dialect

             Alexander Borgida† , David Toman‡ and Grant Weddell‡

                              †
                            Department of Computer Science
                        Rutgers University, New Brunswick, USA
                               borgida@cs.rutgers.edu
                          ‡
                              Cheriton School of Computer Science
                                University of Waterloo, Canada
                               {david,gweddell}@uwaterloo.ca



       Abstract. A referring expression in linguistics is any noun phrase iden-
       tifying an object in a way that will be useful to interlocutors. In the
       context of conjunctive queries over a description logic knowledge base
       (DL KB), typically constant symbols (usually treated as rigid designa-
       tors) are used as referring expressions in a certain answer to the query.
       In this paper, we begin to explore how this can be usefully generalized by
       allowing more general DL concept descriptions, called singular referring
       expressions, to replace constants in this role. In particular, we lay the
       foundation for singular referring expressions in conjunctive query answers
       over a DL KB using a member of the CFD family of DL dialects. In the
       process, we introduce a specific language for referring concept types, and
       present initial results on how conjunctive queries with referring concept
       types can be efficiently supported.


1    Introduction and Motivation
Query answering in logic-based approaches to data and knowledge bases has
traditionally been viewed as finding constant names, appearing in the knowledge-
base, which can be substituted for the variables of the query. More formally, a
query q(x1 , . . . , xn ) is viewed as a formula with free variables x1 , . . . , xn and, if
the knowledge-base K contains individual constant names IN, query answering
consists of computing the set { (a1 , . . . , an ) | ai ∈ IN, K |= q(a1 /x1 , . . . , ak /xk ) }.
We believe that in a number of circumstances this is less than ideal.
    (1) In object-based KBMSs (including Object-Relational, XML and Object-
Oriented DBMSs, as well as DLs with UNA), all known individual objects must
have unique (internal) distinguishing identifiers. However, these identifiers are
often insufficient to allow users to figure out what real-world object they refer
to, especially for large KBs. For example, system generated ref expressions in
object-oriented databases [10] and blank node identifiers in RDF are semantically
opaque to end-users. A specific example of this are identifiers that individual
authors or the system must invent in community-developed ontologies such as
Freebase [2]. There, for example, the id of the “Synchronicity” album by the
Police is "/guid/9202a8c04000641f8000000002f9e349" (as of April, 2015.)
     (2) In Relational DBMSs, the above problem is supposedly avoided by using
“external keys”: tuples of attributes whose values (strings, integers, ...) uniquely
identify rows of tables. Problem (1) above will then arise in OBDA access to
legacy relational systems, since the ontology will surely be object-based.1
     We note that even in databases, universally unique keys are hard to find
(e.g., newly arrived foreign students do not have ssn#), though they may work
for subsets of individuals, such as those returned by queries.
     (3) Additional problems for finding identifying attributes for classes of ob-
jects arise in conceptual modeling. For example, consider all cases where Ex-
tended Entity-Relationship modeling creates a new heterogeneous entity set by
“generalization” [5] from others. For example, we want to generalize Person
(whose key might be ssn#) and Company (whose key might be tickerSymbol) to
LegalEntity, which can own things. In EER modeling, such a situation forces the
introduction of a new, artificial attribute as a key, with the attendant problems.
Yet when we retrieve a set of legal entities, we can reference them in different,
more natural ways, depending on which subclass they belong to.
     (4) The next example illustrates a subtler version of the above: consider the
following hierarchy of concepts relating to publications:
          Journal v EditedCollection , EditedCollection v Publication
And suppose edited collections are identified by isbn#, while journals are iden-
tified by title and publisher. When we retrieve a set of objects in Publication,
we would like to describe them in different ways, depending on the subclass
they belong to; but in this case, there would be an additional preference for
textittitle, publisher) over isbn# for elements of Journal.
     (5) Many kinds of KBMSs, including those based on DLs and FOL, allow
us to describe situations where objects can be inferred to exist, without having
an explicit (internal) identifier. For example, if Michelle is a person, then she has
a mother, and if she is married then she has a spouse. Normally, such objects
cannot be returned in the list of answers. This is all the more unpleasant if we
can capture information about this unknown person, such as the phone number
of Michelle’s mother: {Michelle} v ∃hasMother.∃hasPhone.{1234567}. Yet it is
common in human communication to identify objects by their relationship to
other known objects. For example, “Michelle’s mother” is a perfectly reasonable
intensional description of someone who has phone 1234567.
   The standard response to some of the above problems would be to have the
user modify the query by finding the appropriate values for identifying attributes
(external keys). For example, instead of the query q1(x) :- Journal(x), the
programmer would be expected to write
          q2(t, p) : −Journal(x), hasTitle(x, t), hasPublisher(x, p).
1
    OBDA systems such as Mastro [4] and others try to deal with this issue by using
    function symbols over database keys to generate “object terms” that act as object
    identifiers. The name of the function symbols is however not semantically motivated.
This approach has several problems: (i) In the enumeration of answers to q2,
the relationship between the original object of interest, x, and its descriptors, t
and p, is lost; something akin to “objects x with title = t and publisher =
p” would be more desirable. (ii) The above reformulation cannot be done using
regular conjunctive queries in the case of item 4 above, because the answer for
edited collections that are not journals should be identified by isbn#, for which
the query is
       q5(isb) : −EditedCollection(x), ¬Journal(x), hasIsbn(x, isbn)
which is not a conjunctive query, since it includes a negation. (iii) From the point
of view of software engineering, the task of choosing these identifying references
is mentally distinct from the task of selecting the objects of interest to begin
with. Both SQL’s select clause, and XQuery’s return clause are examples of
separating these two aspects in existing query languages.
    This paper is then dedicated to the task of proposing a first solution to (some
of) the issues raised by providing “singular referring expressions” in the place of
individuals returned by conjunctive queries, in the context of DL KBs.
    Our plan and contributions are as follows: We will start by proposing a lan-
guage for referring concept expressions and types. This language will generalize
the usual case of presenting answers to queries as individual names to situations
that: (i) allow object identification by key (paths), possibly within the limited
context of some concept instances; (ii) deal with heterogeneous answer sets, such
as LegalEntity; and (iii) allow preferential choice of referring expressions, as for
EditedCollection. We will use this language to define answers for conjunctive
queries over DL KBs. More generally, the proposed approach introduces a new
separation of concerns for knowledge bases (identification vs. qualification). In
our case, the query head will annotate each variable returned with an answer
concept type; this will be instantiated to an answer concept (a subset of our DL
concepts) for each answer; such concepts will eventually bottom out to individual
constants, rather than atomic concepts.
    Because we wish to generalize the usual case of constant names in answers,
we desire referring concept types to be singular expressions – i.e., to identify
one individual.2 Unfortunately, without knowing anything else, it is impossible,
for example, to tell whether an expression such as “object with p-value 3” will be
singular or not: if p is a key, then yes, but not otherwise. Therefore, we need to use
information from the ontology to verify the singularity of referring concept type.
This can be extended by examining the body of the query (and hence learning
more about what possible values variables may take). We will concentrate in
this paper on: (i) the (compile-time) analysis of conjunctive query bodies in the
context of the TBox to determine whether a referring concept type will return
a reference to at most one individual; (ii) the reformulation of the query to
2
    Researchers interested in so-called co-operative query answering have considered re-
    turning predicates/concepts describing sets of individuals (e.g., [1, 3, 6, 8]), where an
    answer to the query “Who can take the Data Structures course?” might include,
    “Anyone who has passed the Intro to Computer Science course with at least a C
    grade”. Please note that we are not considering that problem in this paper.
guarantee that the referring expression will indeed return exactly 1 value. Our
technical results will show that this can be done in polynomial time for the DL
     ∀
CF Dnc , which allows the capture of identification constraints such as keys.

                                               ∀
2    Preliminaries: the Description Logic CF D nc
                                                                        ∀
The knowledge bases that we consider are based on the logic CF Dnc        , a re-
cent member of the CF D family of DL dialects. All members of this family are
fragments of FOL with underlying signatures based on disjoint sets of unary
predicate symbols called primitive concepts, constant symbols called individuals
and unary function symbols called attributes. Although attributes deviate from
the normal practice of using binary predicate symbols called roles, they make
it easier to incorporate concept constructors suited to the capture of relational
data sources that include various dependencies, e.g., by a straightforward reifi-
cation of arbitrary n-ary predicates, and also make it easier to explore varieties
of concepts that can serve as referring expressions.
                         ∀
Definition 1 (CF Dnc         Concepts) Let F, PC and IN be disjoint sets of (names
of) attributes, primitive concepts and individuals, respectively. A path function
Pf is a word in F∗ with the usual convention that the empty word is denoted
                                                            ∀
by id and concatenation by “.”. The set of CF Dnc              concepts C is given by the
following grammar, where a ∈ IN, A ∈ PC, Pf and Pfi are path functions, k > 0
and fi ∈ F.
    C ::= {a} | A | ∀Pf.C | C1 u C2 | ¬A | Pf1 = Pf2 |
                                                                                           (1)
            A : Pf1 .Pf, Pf2 , . . . , Pfk → Pf1 | A : Pf1 .Pf, Pf2 , . . . , Pfk → Pf1 .f
Semantics is defined in the standard way with respect to an interpretation I =
(4, (·)I ), where 4 is a domain of “objects” and (·)I an interpretation function
that fixes the interpretation of attributes f to be total functions on 4, primitive
concepts A to be subsets of 4, and individuals a, b to be elements of 4. The

                        ({a})I   =   {(a)I },
                      (∀Pf.C)I   =   {x ∈ 4 | (Pf )I (x) ∈ (C)I },
                   (C1 u C2 )I   =   (C1 )I ∩ (C2 )I ,
                        (¬A)I    =   4 \ (A)I ,
                 (Pf1 = Pf2 )I   =   {x ∈ 4 | (Pf1 )I (x) = (Pf2 )I (x)} and
 (A : Pf1 , . . . , Pfk → Pf)I   =   {x ∈ 4 | ∀ y ∈ (A)I :
                                             (Pfi )I (x) = (Pfi )I (y) → (Pf)I (x) = (Pf)I (y)}
                                        V
                                           i


                                                   ∀
                           Fig. 1. Semantics of CFDnc Concepts.
interpretation function is extended to path expressions by interpreting id as the
identity function and concatenation as function composition. The semantics of
                     ∀
the remaining CF Dnc   concepts are then defined in Figure 1.                  2
Concepts having the last two forms in (1) are called a (path) key and a path
functional dependency (PFD), respectively. Informally, such concepts each de-
note a set of objects, each of which, whenever agreeing with any A-object on all
left-hand-side path functions, also agrees with that object on the right-hand-side
path function. Thus, the axiom EditedCollection v EditedCollection : isbn# →
id, expresses that isbn# is a key for edited collections, while P erson v P erson :
home.phone# → home.address, says that if two persons have the same home
phone then they have the same home address.
                      ∀
Definition 2 (CF Dnc     Knowledge Bases) Generic knowledge/metadata and
                              ∀
specific facts/data in a CF Dnc  knowledge base K are respectively defined by a
TBox TK and an ABox AK .
A TBox TK consists of a finite set of general concept inclusion axioms, which
adhere to one of the following six forms, where A and Ai are primitive concepts,
f is an attribute in F, B is a primitive concept or a negation of a primitive
concept, and where Pf and Pfi are path functions:
 A v B ; A v ∀f.B ; ∀f.A v B ; A v (Pf1 = Pf2 ); A1 v A2 : Pf1 , . . . , Pfk → Pf.
TK must also satisfy the following condition:

 stratification of path function equalities: If A v (Pf1 = Pf2 ) ∈ TK then A is a
    primitive concept that does not occur on the right-hand-side of any inclusion
    axiom in TK .

An ABox AK consists of a finite set of axioms that express facts, each of which
has one of the following two forms, respectively called individual membership
assertions and individual relationship assertions,
                              A(a) and Pf1 (a) = Pf2 (b),
for individuals a, b ∈ IN, primitive concept A ∈ PC, and path functions Pfi ∈ F∗ .
Note that ({a})I = {(a)I } and thus we can use nominal concepts as proxies
for individuals. An interpretation I satisfies an inclusion axiom C1 v C2 if
(C1 )I ⊆ (C2 )I . It satisfies ABox axioms A(a) and Pf1 (a) = Pf2 (b) if (a)I ∈ (A)I
and (Pf1 )I ((a)I ) = (Pf2 )I ((b)I ), respectively. I satisfies a knowledge base K if
it satisfies each axiom in K.                                                       2
The need for the additional restriction on a TBox to avoid undecidability of
TBox reasoning due to equational constraints derives in a straightforward way
from the undecidability of the word problem for monoids [7, 9].
                                                                   ∀
Proposition 3 (Consistency and Logical Implication in CF Dnc         [11, 13])
                                                           ∀
Knowledge base consistency and logical implication for CF Dnc are complete for
PTIME.                                                                      2


3    Conjunctive Queries and Certain Answers
In this section we re-evaluate the way answers to conjunctive queries are under-
stood and presented.
Definition 4 (Conjunctive Queries) A conjunctive query (CQ) Q, with free
variables {x1 , . . . , xk }, has the form ∃xk+1 , . . . , ∃xm : Body(Q) where Body(Q),
the query body of Q, is a first order formula over the signature C ∪ F of the form
                        ^           ^                 
                              C(xi ) ∧     f (xj ) = xk ) ,                     (2)
where each xi , xj and xk occurs in {x1 , . . . , xm }.3                                  2
                                                                    ∀
Recall that our objective in this paper is to study how CF Dnc        concepts can help
serve the role of a singular referring expression in certain answers to conjunctive
queries. To review current practice, assume K is a knowledge base over some
DL dialect, and consider one of the purposes served by an ABox in defining
the certain answers over K to a CQ Q: the finite collection of individual names
{a1 , . . . , an } in the ABox defines a space of nk potential answers to Q: k-tuples
θi that map query variables xj to individual names aij ,
                                {x1 7→ ai1 , . . . , xk 7→ aik }.
Viewed as a substitution, recall that each θi is a certain answer to Q over K
exactly when K |= Qθi . Note that the occurrence of an individual aij in a certain
answer is a simple example of a referring expression, i.e., a syntactic artifact that
identifies elements of an underlying domain.
    This notion of a potential answer is easily modified to accommodate a much
larger variety of referring expressions. To start, one can view a potential answer
θi as a set of size k that maps query variables to nominal concepts instead of
individuals, as in
                              {x1 7→ {ai1 }, . . . , xk 7→ {aik }},
                                                             ∀
and then extend this idea by allowing arbitrary CF Dnc         concepts in potential
answers to queries, i.e., potentially by allowing answers to have the form
                             {x1 7→ C1 , . . . , xk 7→ Ck }.                    (3)
In this alternative setting “certain answers” are defined as follows:
Definition 5 (Referring Concepts and Certain Answers)
                      ∀
Let K be a CF Dnc           knowledge base, Q a conjunctive query with free variables
                                        ∀
x1 , . . . , xk and C1 , . . . , Ck CF Dnc concept descriptions. We say that C1 , . . . , Ck
are referring concepts in a certain answer {x1 7→ C1 , . . . , xk 7→ Ck } to Q if the
following two conditions hold:

 1. K |= ∀x1 , . . . , xk .C1 (x1 ) ∧ . . . ∧ Ck (xk ) → Q, and
 2. |{o ∈ 4 | I, [xi 7→ o] |= Ci (xi ) ∧ (∃x1 , . . . , xi−1 , xi+1 , . . . xk .Q)}| = 1
                                                           for every I |= K and 0 < i ≤ k.

where C (x) is the first-order formula derived from the concept description C . 2
The first condition states that Ci objects (as values of xi s) satisfy Q and the
second one that we are interested in singular referring expressions Ci , as gen-
eralizations of simple individual names. In the rest of the paper we call {x1 7→
C1 , . . . , xk 7→ Ck } a candidate answer to Q if condition (1) in the above definition
3
    To improve readability in the rest of the paper we allow constants to appear in CQs.
    However, conjuncts of the form x = a are just syntactic sugar for a conjunct {a}(x)
    formed using a concept {a}. Similarly, f (x) = a is ∃y(f (x) = y ∧ {a}(y)), etc.
is satisfied,and call it a weakly identifying answer to Q if only an upper bound
(of one) is guaranteed to hold in condition (2) We call concepts Ci that are used
in this way singular referring concepts since they replace the role of individual
names as referring expressions in certain answers to conjunctive queries.
    To illustrate, assume K captures information about persons, and consider a
query with body Person(x). In the rest of the paper, we will use Pf = a as an
alternative syntax for ∀Pf.{ a } (to improve readability). Possibilities for certain
answers to the query now include one or more of the following:
                               {x 7→ (ssn# = 1234)}
or
{x 7→ F emale u (hasSpouse.name = ’Enya’) u (hasSpouse.phone# = 1234567)}.
Note that Definition 5(2) disallows answers of the form {x 7→ Female} or more
generally, rules out any C for x in which |(C)I | = 6 1 when I |= K. Thus, the
earlier examples of certain answers would be contingent on TK ensuring that
persons have unique ssn#, as well as unique spouses, who can be identified by
a (name, phone#) pair.


4   Referring Concept Types in Conjunctive Queries
Allowing referring concepts beyond nominals in certain answers to a query Q may
lead to infinitely many syntactically distinct certain answers. In this section, we
develop a framework that ensures the set of certain answers to any conjunctive
query is finite by introducing a specific language for referring concept types,
which bottom out at individual nominals.
                                                             ∀
    Important note: while Definition 5 allows general CF Dnc     concepts to serve
as referring concepts, in the rest of the paper we restrict our attention to the
subset of referring concepts adhering to the more limited grammar
                    C ::= {a} | A | ∀Pf.C | C u C
where {a} is a nominal, A is a primitive concept name, and Pf ∈ F∗ . These C
are intuitively instances of the following referring types.
Definition 6 (Referring Concept Types)
A referring concept type Rt is given by the following grammar, where T denotes
a finite conjunction of primitive concepts (or > standing for empty conjunction),
to be called henceforth a simple type.
      Rt ::= {?} | Pf = {?} | Rt1 u Rt2 | T → Rt | Rt1 ; Rt2
The referring concept set RC(Rt, K) is the “extension” of a referring concept
type Rt with respect to KB K, and is defined inductively as follows, where Si is
short for RC(Rti , K):

 1. RC({?}, K) = {{a} | a occurs in AK };
 2. RC(Pf = {?}, K) = {(Pf = {a}) | a occurs in ABox AK };
 3. RC(Rt1 u Rt2 , K) = {C1 u C2 | Ci ∈ Si };
 4. RC(T → Rt1 , K) = {T u C | C ∈ S1 }; and
 5. RC(Rt1 ; Rt2 , K) = S1 ∪ {C2 ∈ S2 | ¬∃C1 ∈ S1 s.t. K |= C1 ≡ C2 }.

A referring concept type is homogeneous if it is free of any occurrence of the
construct in 5.                                                             2
Examples of their use will follow immediately after the next definition. Note
that, as desired, the set of referring concepts associated with a single referring
concept type is finite if AK is finite.
Definition 7 (Certain Answers and Singular Referring Concepts)
Let Q be a conjunctive query with free variables {x1 , . . . , xk }. A query head H
for Q is a set of pairs {x1 : Rt1 , . . . , xk : Rtk } that associates a referring concept
type Rti with each xi .
The set of certain answers to Q with respect to a head H and a knowledge base
K, denoted Ans(Q, H, K), is the set of all certain answers {xi 7→ Ci | 0 < i ≤ k}
to Q over K for which Ci ∈ RC(Rti , K), for 0 < i ≤ k.                                  2
After the examples below, the objective in this section is to show that computing
                                                      ∀
Ans(Q, H, K) can be achieved in PTIME for CF Dnc         , provided that referring
concept types satisfy a “weak identification condition”.
   The following examples illustrate the use of referring concepts with con-
junctive queries to extend the current situation with the more expressive cases
motivated in the Introduction (a conjunctive query Q with a head H will be
written in the following SQL-like style: select H where Body(Q)):
 1. (expressing the current case) “Any journals published by Italians”
         select x1 : {?}
         where Journal (x1 ) ∧ (publishedBy(x1 ) = x2 ) ∧ Italian(x2 )
 2. (reference via single key) “The ssn# of any person with phone 12345567”
         select x : ssn# = {?}
         where Person(x) ∧ (phone#(x) = 1234567)
 3. (multiple attribute key) “The title and publisher of any journals”
         select x : title = {?} u publishedBy = {?}
         where Journal (x)
 4. (choice of identification in heterogeneous set) “Any legal entity”
        select x : Person → ssn# = {?} ; Company → tickerSymbol = {?}
        where LegalEntity(x)
    An example certain answer would be x 7→ Person u (ssn# = 7654) while
    another might be x 7→ Company u (tickerSymbol = 0 IBM0 ).
 5. (preferred identification) “Any publication, identified by its most specific
    identifier, when available.”
         select x : Journal → (title = {?} u publisher = {?}) ;
                    EditedCollection → isbn# = {?} ; {?}
         where Publication(x)

We now make concrete our requirement that referring concepts occurring in
query answers do indeed satisfy the ability to identify objects.
Definition 8 (Weak Identification in Certain Answers) Let Q be a con-
                                                ∀
junctive query, H a head for Q, and T a CF Dnc    TBox. Q is weakly identifying
                               I
for H with respect to T if |(C) | ≤ 1 for every ABox A, model I of T ∪ A, and
candidate answer θ to Q with respect to H and T ∪ A in which x 7→ C ∈ θ. 2
Lemma 9 (Normal Form of Referring Concept Types) For every refer-
ring concept type Rt, there is an equivalent normal form
                                  Rt1 ; . . . ; Rtn ,
denoted Norm(Rt), consisting of tagged record types Rti that are, in turn, ho-
mogeneous referring concept types of the form
                  Ti → ((Pfi,1 = {?}) u . . . u (Pfi,mi = {?})).           (4)
By equivalent, we mean that, for any KB K, C1 ∈ RC(Rt, K) implies there exists
C2 ∈ RC(Norm(Rt), K) for which { } |= C1 ≡ C2 , and vise versa.             2
Proof (sketch): By application of the following equivalence preserving rewrites.
                                       {a}    ;    { } → (id = {a})
    (T → Rt1 ) u Rt2 or Rt1 u (T → Rt2 )      ;    T → (Rt1 u Rt2 )
                        (Rp1 ; Rt2 ) u Rt3    ;    (Rt1 u Rt3 ) ; (Rt2 u Rt3 )
                         Rt1 u (Rt2 ; Rt3 )   ;    (Rt1 u Rt2 ) ; (Rt1 u Rt3 )
                          T → (Rt1 ; Rt2 )    ;    (T → Rt1 ) ; (T → Rt2 )
                         T1 → (T2 → Rt)       ;    (T1 u T2 ) → Rt               2
We now present our first main result on deciding weak identification. Our theo-
rem refers to the following auxiliary function M(·) which abstracts query bodies
as DL concepts (note that the function assumes, without harm, that variables
are also elements of F):
            
            
              ∀ x.C               if φ = “C(x)”;
               x1 .f = x2          if φ = “f (x1 ) = x2 ”;
            
   M(φ) =
            
              M(ψ   1 ) u M(ψ 2 ) if φ = “ψ1 ∧ ψ2 ”; and
               M(ψ)                otherwise, when φ = “∃xk+1 , . . . , ∃xm : ψ”.
            

Theorem 10 (Deciding Weak Identification) Let Q be a conjunctive query,
                               ∀
H a head for Q, and T a CF Dnc   TBox. Q is weakly identifying for H with respect
to T if and only if
  T ∪ {AQ v ((∀x.T ) u M(Body(Q))) |= AQ v AQ : x.Pfi,1 , . . . , x.Pfi,mi → x
for all (x : Rt) ∈ H, and all T → (Pf1 = {?}) u . . . u (Pfm = {?}) ∈ Norm(Rt).
Proof (sketch): The if direction is straightforward. For the only-if direction,
consider the earliest tagged record type Rt 0 ∈ Norm(Rt) for some x : Rt ∈ H
for which the check for the logical consequence of the key PFD fails. Then one
can construct an ABox A where there exists C ∈ RC(Rt 0 , T ∪ A) that occurs in
some candidate answer θ for Q for which one can also construct an interpretation
I for T ∪ A for which |(C)I | > 1.                                             2
Definition 11 Let Q be a CQ with free variables {x1 , . . . , xk }. We say that H
is a homogeneous head for Q if it is of the form
        H = {xi : Ti → (Pfi,1 = {?}) u . . . u (Pfi,`i = {?}) | 0 < i ≤ k}.
For every CQ Q and a homogeneous head H for Q we define a conjunctive query
                   k
                   ^
         QH = Q ∧     (Ti (xi ) ∧ (Pfi,1 (xi ) = xi,1 ∧ . . . ∧ (Pfi,`i = xi,`i )
                       i=1
with additional free variables {xi,j | 0 < i ≤ k, 0 < j ≤ `i }.                          2
To handle preferences, i.e., non-homogeneous heads H of queries of the form
{xi : Rpi,1 ; . . . ; Rpi,ni | 0 < i ≤ k} we first define a sequence of homogeneous
heads
                             Hj1 ,...,jk = {xi : Rpi,ji | 0 < i ≤ k}
over all 0 < ji < ni . In addition, we define a normalized ABox A0 for an ABox A
to contain only individual relationship assertions of the form f (a) = b obtained
from A by introducing additional individuals for the intermediate individuals
participating in A’s individual relationship assertions.4
Theorem 12 Let K = (T , A) be a knowledge base, Q a CQ, and H a head for
Q, such that Q is weakly identifying for H in T . Then
     {xi 7→ Ti u (Pfi,1 = {ai,1 }) u . . . u (Pfi,`i = {ai,`i })} ∈ Ans(Q, H, K)
if and only if
    {xi 7→ {bi }, xi,j 7→ {ai,j } | 0 < i ≤ k, 0 < j ≤ `i } ∈ Ans(QHj1 ,...,jk , H0 , K0 )
and there is no
    {xi 7→ {bi }, xi,j 7→ {a0i,j } | 0 < i ≤ k, 0 < j ≤ `i } ∈ Ans(QHj0 ,...,j0 , H0 , K0 )
                                                                       1    k

for Hj10 ,...,jk0 dominates Hj1 ,...,jk , where H0 = {xi : {?} | 0 < i ≤ k} ∪ {xi,j : {?} |
0 < i ≤ k, 0 < j ≤ `i }, K0 = (T , A0 ) where A0 is a normalized A, and ai,j and
a0i,j are constants in A.
Proof (sketch): Consider first the case where H is homogeneous. Then we can
reconstruct an answer to Q and H from an answer to QH and H0 (which consists
of nominal concepts only. Conversely, from answer to Q and H we can extract
nominal concepts that are an answer to ∃x1 , . . . , xk .QH and H0 (restricted to free
variables of ∃x1 , . . . , xk .QH ). This answer is then extended to QH and (full) H0
by using individual names introduced in the normalized ABox A0 , this extension
is unique since Q is weakly identifying for H in T .
For the non-homogeneous case we simply consider all possible homogeneous sub-
cases and then filter the answers by the valuations of the variables x1 , . . . , xk
(since those describe the possibly anonymous individuals in A using their system-
assigned names A0 ).                                                                2
Note that the queries QHj0 ,...,j0 are answered in K with respect to H0 , a triv-
                           1     k
ial and homogeneous head: this reduces the answering to standard CQ query
                   ∀
answering in CF Dnc   knowledge base K as introduced in [12].
Corollary 13 Computing certain answers to conjunctive queries with respect
                                 ∀
to referring concepts and CF Dnc    knowledge bases is complete for PTIME.
4
    Such a transformation is already part of the CQ answering algorithm [12]. Note that
    in our setting, however, the new individuals cannot participate in Q’s answers.
5    Conclusions
The paper’s contributions are as follows.
    First and foremost, on the non-technical side, it recognized and motivated
the utility of “singular referring expressions” for query answers, which are more
complex than just nominals, and it argued for the need for a new separation of
concerns in query writing: qualification (what the query body does) vs. identifi-
cation (how results are presented).
    On the specification side, the paper defined formally the notion of “query
answering using referring expressions” for certain answers in conjunctive queries
                                   ∀
over DLs. In the context of CF Dnc   , it introduced a specific language for referring
                                               ∀
expressions, which are a subset of CF Dnc        concepts, and which allows us to
handle all the motivating problems (except intensional descriptions in this paper.
This language generalizes the notion of nominal, currently used in OBDA, to
handle keys (as found in both the database and DL KB literature), and supports
heterogeneous sets (as in the case of LegalEntity), as well as preferential choice
of referring expressions (as in the EditedCollection example).
    On the algorithmic and complexity side, it first considered, in the context of
      ∀
CF Dnc  , the problem of determining (in polynomial time) whether a referring
concept type was “weakly” identifying in the context of a query and TBox, in
the sense that its instances necessarily referred to at most one object. It also
showed how one can transform a query and knowledge base so that the answers
had cardinality one. This led to the result that computing certain answers to
                                                                        ∀
conjunctive queries with respect to referring concepts and CF Dnc           KBs was
complete for PTIME.
    There are many avenues left to explore in this work. We have already men-
tioned that lack of space prevented us from considering referring concepts types
for “Michelle’s mother”, which use inverse functions. Another direction to con-
sider are additional forms or desirable properties of referring expressions. For
example, the variety of references raises a problem: the same object may be re-
turned in an answer with different references to it, even if the knowledge base
works with the UNA. As a simplest example, a journal has multiple candidate
keys. We are currently investigating ways to reason about and avoid such dupli-
cation.
    The reader may also have observed that so far it was up to the programmer
to select the referring expression(s) to consider for each variable. A form of
type inference on the query variables would be useful, as the basis of a tool
which would suggest to the user a (bounded) list of possible referring expressions
that are guaranteed to have the singular reference property with respect to a
particular TBox.
    Of course, all the technical questions considered in this paper will have dif-
ferent answers for different DLs. Therefore, an orthogonal avenue of research is
to re-consider these issues in the context of other DLs, especially DL-Lite.

Acknowledgments: We wish to thank those reviewers who have made excellent
suggestions for improving the paper, and financial support from NSERC Canada.
References
 1. Sonia Bergamaschi, Claudio Sartori, and Maurizio Vincini. Dl techniques for in-
    tensional query answering in oodbs. In Working Notes of the KI’95 Workshop:
    KRDB-95 Reasoning about Structured Objects: Knowledge Representation Meets
    Databases, page 3 pages, 1995.
 2. Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor.
    Freebase: a collaboratively created graph database for structuring human knowl-
    edge. In ACM SIGMOD International Conference on Management of Data, pages
    1247–1250. ACM, 2008.
 3. Alexander Borgida. Description logics in data management. Knowledge and Data
    Engineering, IEEE Transactions on, 7(5):671–682, 1995.
 4. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini,
    Antonella Poggi, Mariano Rodriguez-Muro, Riccardo Rosati, Marco Ruzzi, and
    Domenico Fabio Savo. The MASTRO system for ontology-based data access. Se-
    mantic Web, 2(1):43–53, 2011.
 5. Ramez Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems,
    3rd Edition. Addison-Wesley-Longman, 2000.
 6. Tomasz Imielinski. Intelligent query answering in rule based systems. The Journal
    of Logic Programming, 4(3):229–257, 1987.
 7. Andrey Andreyevich Markov, Jr. On the impossibility of certain algorithm in the
    theory of associative systems. Dokl. Akad. Nauk SSSR, 55:587–590, 1947.
 8. Amihai Motro. Intensional answers to database queries. Knowledge and Data
    Engineering, IEEE Transactions on, 6(3):444–454, 1994.
 9. Emil Post. Recursive unsolvability of a problem of Thue. The Journal of Symbolic
    Logic, 12:1–11, 1947.
10. Abraham Silberschatz, Henry F. Korth, and S. Sudarshan. Database System Con-
    cepts, 4th Edition. McGraw-Hill Book Company, 2005.
11. David Toman and Grant E. Weddell. Conjunctive query answering in CFDnc : A
    PTIME description logic with functional constraints and disjointness. In AI 2013:
    Advances in Artificial Intelligence - 26th Australasian Joint Conference, Dunedin,
    New Zealand, December 1-6, 2013. Proceedings, pages 350–361, 2013.
12. David Toman and Grant E. Weddell. Answering Queries over CFD∀nc Knowl-
    edge Bases. Technical Report CS-2014-14, Cheriton School of Computer Science,
    University of Waterloo, 2014.
13. David Toman and Grant E. Weddell. On adding inverse features to the description
    logic CFD∀nc . In PRICAI 2014: Trends in Artificial Intelligence - 13th Pacific Rim
    International Conference on Artificial Intelligence, Gold Coast, QLD, Australia,
    December 1-5, 2014., pages 587–599, 2014.