=Paper=
{{Paper
|id=Vol-2211/paper-34
|storemode=property
|title=Identity Resolution in Conjunctive Querying over DL-Based Knowledge Bases
|pdfUrl=https://ceur-ws.org/Vol-2211/paper-34.pdf
|volume=Vol-2211
|authors=David Toman,Grant Weddell
|dblpUrl=https://dblp.org/rec/conf/dlog/TomanW18
}}
==Identity Resolution in Conjunctive Querying over DL-Based Knowledge Bases==
<pdf width="1500px">https://ceur-ws.org/Vol-2211/paper-34.pdf</pdf>
<pre>
    Identity Resolution in Conjunctive Querying
          over DL-based Knowledge Bases

                          David Toman and Grant Weddell

       Cheriton School of Computer Science, University of Waterloo, Canada
                         {david,gweddell}@uwaterloo.ca


       Abstract. Earlier work has proposed a notion of referring expressions
       and types in first order knowledge bases as a way of more effectively
       answering conjunctive queries in ontology based data access (OBDA). We
       consider how PTIME description logics can be combined with referring
       expressions to provide a more effective virtual front-end to relational data
       sources via OBDA. In particular, we consider replacing the standard
       notion of an assertion box, or ABox, with a more general notion of a
       concept box, or CBox.


1    Introduction

In a query answer (a1 , . . . , an ) over a first order knowledge base K, the common
assumption is that each ai will correspond to some constant symbol occurring
in K. A more general option has been proposed in [1] in which each ai can
now be a referring expression, in particular, a well-formed formulae ψ that is
free in one variable and that satisfies a number of additional conditions for any
interpretation I of K. First, ψ should not be vacuous: it should hold of at least
one individual in 4I . Second, ψ should be singular : it should hold of at most
one individual in 4I . And third, the singularity property of ψ should be ensured
by the ontological component of K.
    In this paper, we consider query answering in which the ontological compo-
nent of K consists of a TBox T expressed in terms of a DL, and in which the
remaining part of K consists of a CBox C instead of an ABox, where C consists
of a finite set of referring expressions in the form of concept descriptions in the
DL, and for which each is presumed non-empty and singular in all models of K.
    The DL we consider is partial −CF DI ∀−         nc [4, 6], a dialect of the PTIME
feature-based CF D family designed for capturing relational data sources, and
our main focus is on query answering over a knowledge base K consisting of a
TBox and CBox pair (T , C) expressed in terms of partial −CF DI ∀−        nc . The main
technical difficulty is on mapping C to a combination of an ABox A and a way
of distinguishing the constant symbols occurring in A that “stand in place”
of referring expressions in C. This must be done in a way that ensures off-the-
shelf query answering over (T , A) can be used to compute the certain answers to
queries over the original K = (T , C) by a simple substitution of the distinguished
constants by their referring expressions.
    A core problem in deriving the ABox relates to identity issues when intro-
ducing new constants. Of particular significance is that fix-point computations
are necessary when such constants are introduced. Indeed, this can be necessary
when a TBox derives from relational data sources with tables that have unique-
ness constraints as well as primary keys, or for which primary keys themselves
are not minimal. In database parlance, one would say in this case that primary
keys are superkeys but not candidate keys. Such “key conversion” tables can
serve to map between alternative primary keys and thereby lead to additional
query answers.
    While several approaches to integrating information in settings in which the
same individual can be identified in several (even syntactically incomparable)
ways have been considered in the past [2], we show how the integration can be
achieved naturally within partial −CF DI ∀−nc by reducing the problem to exist-
ing ABox completion procedures for query answering over partial −CF DI ∀−  nc . In
particular, using concepts and procedures developed in [1] and [4], we define a
natural way of capturing (perhaps multiple) external identities of objects. Sub-
sequently, we show how query answering can be achieved in such a setting via
an embedding into standard partial −CF DI ∀−  nc reasoning and query answering
problems. We also present examples that show applications of this technique
both in the relational setting and in the setting of document databases such as
MongoDB.
    The remainder of the paper is organized as follows. We begin with the neces-
sary background material in Section 2 in which we introduce partial −CF DI ∀−  nc
concepts, and “standard” knowledge bases consisting of a TBox of inclusion de-
pendencies over such concepts and an ABox of assertions. Our main results then
follow in Section 3 in which an ABox is replaced with a CBox of partial −CF DI ∀−
                                                                               nc
concepts called referring expressions. We then define a mapping of CBoxes to
ABoxes and show how each of the following can be resolved with the use of this
mapping: (1) diagnosing an admissibility condition for a CBox, (2) satisfiabil-
ity of knowledge bases with a CBox, and (3) query answering over knowledge
bases with a CBox. The admissibility condition requires that the TBox ensures
each referring expression occurring in the CBox is singular in the sense outlined
above. Throughout, we introduce examples to illustrate why CBoxes are useful
and how identification issues become far more complicated as a consequence. We
conclude with summary comments in Section 4.


2   Background

The description logic partial −CF DI ∀−nc is a member of the CF D family of DLs
which are fragments of FOL with underlying signatures based on disjoint sets of
unary predicate symbols called primitive concepts, constant symbols called in-
dividuals and unary function symbols called features. Note that features deviate
from the normal practice of admitting roles denoting binary predicate symbols.
However, features make it easier to incorporate concept constructors that are
better suited to the capture of relational data sources, particularly so when they


                                        2
include dependencies such as primary keys, uniqueness constraints, functional
dependencies and foreign keys. This is achieved by a straightforward reification
of n-ary predicates and by using a concept constructor peculiar to the CF D fam-
ily called a path functional dependency. Consider the case of a role R. It can be
reified as a primitive concept RC , two features domR and ranR and an inclusion
dependency of the form

                               RC v RC : domR, ranR → id

in partial −CF DI ∀−
                  nc . The latter effectively ensures any combination of domR
and ranR values uniquely determine an R 2-tuple. Note that an ALC inclusion
dependency mentioning R of the form “A v ∀R.B”, can also be captured in
partial −CF DI ∀−
               nc as the inclusion dependency

                                   ∀domR.A v ∀ranR.B.

Concepts in partial −CF DI ∀−
                           nc are defined as follows:


Definition 1 (partial −CF DI ∀− nc Concepts)
Let F and PC be sets of feature names and primitive concept names, respectively.
A partial path expression is defined by the grammar “ Pf ::= f. Pf | id ” for f ∈ F.
We define derived concept descriptions by the grammar on the left-hand-side of
Fig. 1. A path functional dependency concept, or PFD, is obtained by using the
final production of this grammar.
An inclusion dependency C is an expression of the form C1 v C2 . A terminology
(TBox) T consists of a finite set of inclusion dependencies. A posed question Q
is a single inclusion dependency.


                   Syntax                Semantics: Defn of “·I ”
     C ::= A                             AI ⊆ 4           (primitive concept; A ∈ PC)
         | C1 u C2                       C1I ∩ C2I                       (conjunction)
         | ¬C                            4 \ CI                              (negation)
         | ∀ Pf .C                       {x : Pf I (x) ∈ C I }      (value restriction)
         | ∃Pf                           {x : Pf I (x) exists} (existential restriction)
         | ∃f −1 .C                      {f I (x) : x ∈ C I }         (inverse feature)
         | {a}                           {aI }                                (nominal)
         | C : Pf 1 , ..., Pf k → Pf 0   (see text)                              (PFD)

                 Fig. 1. Syntax and semantics of CFD concepts.


The semantics of expressions is defined with respect to a structure I = (4, ·I ),
where 4 is a domain of “objects” and ·I an interpretation function that fixes the
interpretations of primitive concepts A to be subsets of 4 and primitive features
f to be partial functions f I : 4 → 4. Note that partial −CF DI ∀−nc adopts the


                                               3
strict interpretation of undefined values, which means that arguments terms
must be defined whenever equality and set membership do hold.
The interpretation is extended to partial path expressions, id I = λx.x, (f. Pf)I =
Pf I ◦f I , in the natural way, and derived concept descriptions C not including
PFDs as defined in the centre column of Fig. 1. For concept descriptions that
are PFDs, the interpretation is defined as follows:1

 (C : Pf 1 , . . . , Pf k → Pf 0 )I = {x | ∀y.y ∈ C I ∧ x ∈ (∃Pf 0 )I ∧ y ∈ (∃Pf 0 )I ∧
     Vk                     I               I     I         I            I          I
       i=1 (x ∈ (∃Pf i ) ∧ y ∈ (∃Pf i ) ∧ Pf i (x) = Pf i (y)) → Pf 0 (x) = Pf 0 (y) }.


An interpretation I satisfies an inclusion dependency C1 v C2 if C1I ⊆ C2I and
is a model of T (I |= T ) if it satisfies all inclusion dependencies in T . The logical
implication problem asks if T |= Q holds, that is, if Q is satisfied in all models
of T .                                                                               2

Observe that features are still functional, and that there is therefore no need
for a qualified existential restriction of the form ∃f.C, since such restrictions
can be equivalently written as ∀f.C u ∃f . Hence the use of qualified existential
restrictions in the rest of the paper should be considered to be syntactic sugar.
    To ensure PTIME reasoning in partial −CF DI ∀− nc we require all subsumptions
in the TBox to adhere to the following restrictions (for more general TBoxes and
normalization see [6]): all subsumptions have to be of the form C v D, where
the structure of concepts C and D are given by the following grammars:

             C ::= A | ∀f.A
             D ::= A | ⊥ | ¬A | ∀f.A | ∃f −1 | ∃f | A : Pf 1 , . . . , Pf k → Pf

In addition, PFDs must adhere to one of the following two forms to avoid unde-
cidability [5]:
                   1. C : Pf 1 , . . . , Pf . Pf i , . . . , Pf k → Pf or
                                                                           (1)
                   2. C : Pf 1 , . . . , Pf .g, . . . , Pf k → Pf .f

Definition 2 (partial −CF DI ∀−nc ABox)
The second component of a partial −CF DI ∀− nc knowledge base is an ABox A
that contains assertions of the form “A(a)”, “a = b”, “a 6= b”, and “f (a) = b”,
with the usual interpretation mapping constant symbols to domain elements,
and interpreting the assertions as set membership and an equality/inequality
between a constant and another constant or a function application to a constant,
respectively.                                                                 2

Proposition 3 (partial −CF DI ∀−   nc KB Satisfiability[6])
Satisfiability of partial −CF DI ∀−
                                 nc knowledge bases is complete for PTIME.

1
    This constitutes the minimum necessary conditions needed for recognizing when one
    violates an inclusion dependency of the form “ C1 v C2 : Pf 1 , . . . , Pf k → Pf 0 ” .


                                              4
Conjunctive queries are, as usual, formed from atomic queries (or atoms), corre-
sponding to concept descriptions, and equalities between variables and applica-
tion of function to variables, using conjunction and existential quantification. To
simplify notation, we conflate conjunctive queries with the set of its constituent
atoms and a set of answer variables:
Definition 4 (Conjunctive Query)
Let ϕ be a set of atoms (representing a conjunction) A(xi ) and f (xi1 ) = xi2 ,
where A is a primitive concept description, f a feature (including id ), and x̄ a
tuple of variables. We call the expression {x̄ | ϕ} a conjunctive query (CQ). 2
A conjunctive
    V           query {x̄ | ϕ} is therefore a notational variant of the formula
∃ȳ. ψ∈ϕ ψ in which ȳ contains all variables appearing in ϕ but not in x̄. The
usual definition of certain answers is given by the following:

Definition 5 (Certain Answer)
Let K be a partial −CF DI ∀−nc KB and Q = {x̄ | ϕ} a CQ. A certain answer to Q
over K is a substitution of constant symbols ā, [x̄ 7→ ā], such that K |= Q[x̄ 7→ ā].
2
Proposition 6 (partial −CF DI ∀−
                               nc Query Answering [4])
Query answering over partial −CF DI ∀−
                                    nc knowledge bases is complete for PTIME
(data complexity).
The query answering algorithm presented in [4] requires both ABox completion
and query reformulation. The former is needed to propagate concept member-
ships along feature chains present in the data when implied by a TBox, and the
latter is needed to avoid the need for potentially exponentially many witnesses
of anonymous objects. Note that the ABox completion also deals with equalities
stipulated in the ABox and/or generated by PFD-based subsumptions in the
knowledge base TBox.


3    Referring Expressions and CBoxes
In this section we introduce referring expressions, concept descriptions that will
serve as external identifiers of objects in partial −CF DI ∀−
                                                           nc knowledge bases.
    We will require that these concept descriptions behave the same way constant
symbols behave in the traditional setting: we expect their interpretations to be
singular in every model of a given knowledge base.

Definition 7 (Referring Expressions and Singularity)
Let C be a partial −CF DI ∀−
                          nc concept description conforming to the grammar

                     C ::= A | C1 u C2 | ∃f.C | ∃f −1 .C | {a},

where a is a constant symbol, and T a TBox. We say that C is a (singular)
referring expression (w.r.t. T ) if |C I | ≤ 1 for all interpretations I that are
models of T .                                                                  2


                                           5
We use referring expressions to define the counterpart of assertions in traditional
knowledge bases.

Example 8
Consider a situation in which objects are identified by their f value, such as an
employee number, within a class A. Two A objects can then be captured by the
following pair of concepts:
                       A u ∃f.{123} and A u ∃f.{345}.
The singularity requirement, in this example, can be enforced by ensuring f -
values can indeed serve as the key for A objects, e.g., by adding the inclusion
dependency
                               A v A : f → id
to the TBox. The two concepts now replace the usual ABox assertions of the
form A(c1 ) and A(c2 ), which need an invention of additional constant symbols
for the two abstract objects. Moreover, ABox assertions of the form g(c1 ) = c2
can also be captured by concepts, in our example:
                       A u ∃f.{123} u ∃g.(A u ∃f.{345}).
Note that such descriptions naturally arise when the assertion part of a knowl-
edge base is captured in various database back-ends, e.g., in relational databases
(via keys and foreign keys) or in document databases, such as MongoDB2 , in
which the structure of the referring expressions correspond to JSON3 .

Example 9
To illustrate the flexibility of our approach based on CBoxes, consider the follow-
ing JSON fragment describing persons, hypothetically occurring in a MongoDB
document source:

     {"fname" : "John", "lname" : "Smith", "age" : 25,
       "phoneNum" : [
          {"loc" : "home", "dialnum" : "212 555-1234"},
          {"loc" : "work", "dialnum" : "212 555-4567"}
        ]}

In our setting, this document can be naturally and directly represented as a
CBox assertion of the form
    PERSON u (∃fname.{“John”}) u (∃lname.{“Smith”}) u ∃age.{25}
       u ∃phoneNumFor−1 .((∃loc.{“home”}) u (∃dialnum.{“212 555-1234”}))
       u ∃phoneNumFor−1 .((∃loc.{“work”}) u (∃dialnum.{“212 555-4567”}))
One way in which this assertion satisfies an admissibility condition, defined be-
low, happens whenever the combination of an fname and an lname can serve to
identify a PERSON.
2
    https://www.mongodb.com/.
3
    https://www.json.org/.


                                        6
Identities of documents (and sub-documents) are now captured using referring
expressions; this potentially allows for join operations on document databases
(not typically supported by such systems). Queries navigating JSON documents
can be now expressed as conjunctive queries over the partial −CF DI ∀−   nc repre-
sentation.
   This development leads to a revision of the definition of partial −CF DI ∀− nc
knowledge bases in which the traditional notion of an ABox is replaced by a
CBox: a set of concept descriptions that are referring expressions for individuals
the knowledge base knows about.

Definition 10 (CBoxes, Knowledge Bases, and Query Answers)
Let T be a partial −CF DI ∀−       nc TBox and Q a conjunctive query with answer
variables x1 , . . . , xk . We define a CBox C to be a set of partial −CF DI ∀−
                                                                             nc concept
descriptions.
A partial −CF DI ∀−    nc knowledge base K is a pair (T , C).
We say that the CBox C is admissible for T if |C I | ≤ 1 for all C ∈ C and all
models I of T .
We say that K is consistent if there is an interpretation I such that

 1. I |= T , and
 2. |C I | = 1 for every C ∈ C.

We say that (C1 , . . . , Ck ) is a certain answer to Q in K if

                              K |= Q ∧ C1 (x1 ) ∧ . . . ∧ Ck (xk )

for {C1 , . . . , Ck } ⊆ C.                                                          2


3.1    Identity Resolution

CBoxes inherently represent information about how objects in a knowledge base
are identified. Note there is no restriction on how such identification must be
captured. In particular, there are no uniformity conditions on identification of
objects that must hold, such as requiring each object to have a single global
identifier in all assertions in the knowledge base.
    However, CBoxes allow one to capture various resolutions of the heterogeneity
of identification, e.g., through translation tables or cross-links [2]. These can be
captured using TBox/CBox assertions as follows:

Example 11
Consider the TBox

              { A v B, C v B,
                A v A : f → id , B v B : f, g → id, C v C : g → id
                A v B : f → id , C v B : g → id                    },

                                               7
and the CBox

             { A u ∃f.{3},    B u ∃f.{3} u ∃g.{5},     C u ∃g.{5} }.

Note that the referring expression “A u ∃f.{3}” identifies the same object as
“B u ∃f.{3} u ∃g.{5}”, due to the second-last TBox subsumption, and in turn
as “C u ∃g.{5}”, due to the last TBox assertion. Thus, the object described by
“Au∃f.{3}” should be a certain answer to a conjunctive query {x | A(x)∧C(x)}.
The same happens for all pairs of referring expressions in the CBox subsumed
by A and C, respectively, for which there is a B cross-link.


3.2   On Minimal Referring Expressions

In relational databases the notion of candidate key, a key that has a minimal set
of attributes of a relation, is typically used as an external identifier of objects
stored in the database.
    Our development of a referring expression strictly generalizes the notion of
a superkey in the relational setting: sets of attributes, not necessarily minimal,
that identifies an object or entity. We now present a procedure that (syntacti-
cally) minimizes a referring expression to obtain minimal co-referring referring
expressions that are counterparts to relational candidate keys.

Theorem 12 (Minimal Referring Expressions)
Let T be a partial −CF DI ∀−
                          nc TBox and C a referring expression w.r.t. T . We say
that subconcepts of C of the form A, {a}, ∃f.>, ∃f −1 .>, and > u > are leaves
of C and write C[L 7→ >] for a description C in which a leaf L was replaced
by >. Assuming “first-leaf” and “next-leaf” denote functions that successively
enumerate all leaves of C, the procedure

                   1. L := first-leaf(C);
                   2. while C[L 7→ >] is singular w.r.t. T do
                   3.     C := C[L 7→ >]; L := next-leaf(C);
                   4. done
                   5. return C;

computes a syntactically-minimal co-referring expression for C. (Note that re-
placing a leaf by > may create additional leaves.)
Proof (sketch): Since the C[L 7→ >] operation weakens the concept descrip-
tion C, it preserves satisfiability. The algorithm tests for singularity at every
step. Hence the result is a minimal referring expression equivalent to C since no
additional leaves can be removed.                                               2

The algorithm finds a minimal referring expression in time linear in |C|. Anal-
ogous to the relational setting, backtracking this algorithm facilitates the dis-
covery of alternative minimal referring expressions, and, also analogous to the
relational setting, there can be exponentially many of these.


                                        8
3.3   Reasoning with CBoxes

Our technique crucially depends on mapping CBoxes to (standard) ABoxes as
follows. We begin by defining how individual concepts corresponding to referring
expressions are transformed:

       ToAbox(a : C1 u C2 )          7→   ToAbox(a : C1 ) ∪ ToAbox(a : C2 )
         ToAbox(a : ∃f.C)        7→       {f (a) = b} ∪ ToAbox(b : C), b fresh
       ToAbox(a : ∃f −1 .C)       7→      {f (b) = a} ∪ ToAbox(b : C), b fresh
          ToAbox(a : {b})          7→     {a = b}
           ToAbox(a : A)            7→    {A(a)}, A primitive

The ToAbox function converts a CBox assertion C to a set of ABox assertions
by introducing constant names for all necessary individuals, in particular a con-
stant aC for the (witness of satisfiability of) C itself. The mapping is then lifted
to CBoxes by applying it on all referring expressions in the CBox as follows:
                 [
 ToAbox(C) =        ToAbox(aC : C) ∪ {ai 6= aj | ai , aj individuals in C, i 6= j}
                  C∈C

Note that we make nominals distinct since they correspond to values from a
relational backend that typically assumes UNA. Also, one could reuse the textual
representation of the concepts to serve as the invented constant names.

Theorem 13 (CBox Admissibility)
Let T be a partial −CF DI ∀−nc TBox and C a concept description. Then C is a
singular referring expression w.r.t. T if and only if the knowledge base

      (T ∪ {A v ¬B}, ToAbox(a : C) ∪ ToAbox(b : C) ∪ {a : A, b : B})

is inconsistent, where A and B are primitive concepts not occurring in T and C
and a and b are distinct constant symbols.
Proof (sketch): The ToAbox mapping expands complex concepts in a CBox
to sets of assertions in a corresponding ABox. By case analysis we can show that
a model of (T ∪ {A v ¬B}, ToAbox(a : C) ∪ ToAbox(b : C) ∪ {a : A, b : B})
provides an counterexample to C’s singularity (w.r.t. T ). Moreover, whenever
C is not singular, such a model can be constructed by appropriately naming
additional individuals in a counterexample to C’s singularity.                2

It is easy to verify that the CBox in Example 11 is admissible w.r.t. the given
TBox since ‘f ’, ‘f, g’ and ‘g’ are keys on A, B, and C, respectively.

Theorem 14 (Satisfiability of KBs with CBoxes)
Let K = (T , C) be a knowledge base with an admissible CBox C. Then K is
consistent if (T , ToAbox(C)) is consistent.
Proof (sketch):    Similar to the argument in the proof sketch in Theorem 13. 2


                                             9
3.4     Query Answering over CBoxes

We now show how query answering over CBox-based knowledge bases can be
reduced to the standard case of ABoxes. We also show an example of the utility
of CBoxes in capturing distinct co-references to a particular object, and how
partial −CF DI ∀−
               nc based TBoxes can account for such co-references.


Theorem 15 (Query Answering)
Let K = (T , C) be a consistent knowledge base and Q = {(x1 , . . . , xk ) : ϕ} a
conjunctive query over K. Then (C1 , . . . , Ck ) is a certain answer to Q in K if
and only if (aC1 , . . . , aCk ) is a certain answer to Q over (T , ToAbox(C)).
Proof (sketch): Since C1 , . . . , Ck are singular referring expressions, there must
be individuals o1 , . . . , ok witnessing nonemptiness of C1 , . . . , Ck , respectively,
that make the query true in every model of K; case analysis shows that, in the
corresponding models of (T , ToAbox(C)), these individuals will be the inter-
pretations of the constant symbols (aC1 , . . . , aCk ) (and vice versa).              2

Note that ABox completion [4, 6] will make constant symbols belonging to co-
referring referring expressions equal automatically. This, in turn, realizes all
reasoning needed to capture the effects of translation tables in a TBox/CBox:

Example 16
Reconsider the TBox and CBox in Example 11. The ABox constructed from the
CBox is as follows:
      { A(aAu∃f.{3} ), f (aAu∃f.{3} ) = 3,
        B(aBu∃f.{3}u∃g.{5} ), f (aAu∃f.{3}u∃g.{5} ) = 3, g(aAu∃f.{3}u∃g.{5} ) = 5,
        C(aCu∃g.{5} ), f (aCu∃g.{5} ) = 5,                                         }

Note that the referring expressions A u ∃f.{3} and C u ∃g.{5}, despite being
syntactically distinct, co-refer to the same object in the knowledge base due to
the existence of the B referring expression and the TBox subsumptions. The
standard ABox completion [4, 6] then generates the equality

                                {aAu∃f.{3} = aCu∃g.{5} }

using the last two constraints in the TBox and the fact that both A and C are
subsumed by B, that serves as a translation concept generated from a translation
table. This yields the referring expression A u ∃f.{3} to be a certain answer to
the query.


On Query Answers. The definition of certain answers asks for all tuples of
constants—in our setting proxied by referring expressions—for which the query
is entailed by the knowledge base. Thus the selection of referring expressions in
the CBox determines components of query answers presented to the user. There
are two considerations:


                                           10
 1. Additional answers may be needed; these can be obtained by considering
    additional referring expressions describing, e.g., sub-documents, to the CBox
    (as long as the CBox remains admissible);
 2. Simpler answers may be desired, i.e., simpler referring expressions denoting
    the answers; these can be obtained by appropriate selection of minimal re-
    ferring expressions (and removing all the more complex referring expressions
    from answers).
Both of these goals can be achieved by a simple housekeeping that determines
which referring expressions are eligible to appear in query answers. This step can
be easily combined with the CBox-to-ABox mapping by appropriately marking
the generated constant symbols. Indeed, similar marking is commonly used, e.g.,
when ABoxes are normalized in most OBDA settings.

4   Summary
We have considered how referring expressions corresponding to concepts in a
description logic can serve the role of constant symbols in both assertion boxes
and in query answering, and how doing so leads to a more effective and direct
way of achieving an integration of backend data sources via OBDA, as well as
more descriptive and meaningful answers to queries.
    Admitting referring expressions leads naturally to a notion of a concept box
or CBox in place of an ABox in a knowledge base. This in turn raises a number
of technical issues: how to ensure referring expressions in a CBox refer to a single
individual, how to check for knowledge base consistency, and how to evaluate
conjunctive queries over the knowledge base. We have shown how all these issues
can be resolved by a mapping of CBoxes to ABoxes. The mapping enables off-the-
shelf procedures for ABox completion, for consistency checking, and for ABox
completion and query rewriting over standard knowledge bases consisting of a
TBox and ABox.
    In [1], the notion of a referring expression type was also introduced. For
future work, we plan to explore how such a typing discipline can be used to push
parts of the mapping of CBoxes to ABoxes to backend database sources along
the lines outline in [3]. Our new ability of detecting co-reference to objects by
referring expressions can also lead to an ability to detect duplicate answers in
query results. Future work along this line can enable additional capabilities in
query formulation and answering, such as an ability for “limit k” operators in
queries.

References
1. Borgida, A., Toman, D., Weddell, G.: On referring expressions in query answering
   over first order knowledge bases. In: Proc. of KR’16. pp. 319–328 (2016)
2. Calvanese, D., Giese, M., Hovland, D., Rezk, M.: Ontology-based integration of
   cross-linked datasets. In: The Semantic Web - ISWC 2015 - 14th International Se-
   mantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings,
   Part I. pp. 199–216 (2015)


                                        11
3. Jacques, J.S., Toman, D., Weddell, G.E.: Object-relational queries over CFDI nc
   knowledge bases: OBDA for the SQL-Literate. In: Proc. International Joint Confer-
   ence on Artificial Intelligence, IJCAI. pp. 1258–1264 (2016)
4. McIntyre, S., Borgida, A., Toman, D., Weddell, G.: On limited conjunctions in
   polynomial feature logics, with applications in OBDA. In: Proc. KR18 (extended
   abstract). p. (in press) (2018)
5. Toman, D., Weddell, G.E.: On keys and functional dependencies as first-class citi-
   zens in description logics. J. Aut. Reasoning 40(2-3), 117–132 (2008)
6. Toman, D., Weddell, G.E.: On partial features in the DLF family of description
   logics. In: PRICAI 2016: Trends in Artificial Intelligence - 14th Pacific Rim In-
   ternational Conference on Artificial Intelligence, Phuket, Thailand, August 22-26,
   2016, Proceedings. pp. 529–542 (2016)


                                         12

</pre>