=Paper=
{{Paper
|id=None
|storemode=property
|title=On Separability of Ontological Constraints
|pdfUrl=https://ceur-ws.org/Vol-866/paper3.pdf
|volume=Vol-866
|dblpUrl=https://dblp.org/rec/conf/amw/CaliCF12
}}
==On Separability of Ontological Constraints==
<pdf width="1500px">https://ceur-ws.org/Vol-866/paper3.pdf</pdf>
<pre>
       On Separability of Ontological Constraints

               Andrea Calı̀1,3 , Marco Console2 , and Riccardo Frosini2
 1
     Dept. of Computer Science and Inf. Systems, Birkbeck University of London, UK
      2
        Dip. di Informatica e Sistemistica, Università di Roma “La Sapienza”, Italy
        3
          Oxford-Man Institute of Quantitative Finance, University of Oxford, UK
                               andrea@dcs.bbk.ac.uk
                    {console.marco,frosini.riccardo}@gmail.com


         Abstract. When data schemata are enriched with expressive con-
         straints that aim at representing the domain of interest, in order to an-
         swer queries one needs to consider the logical theory consisting of both
         the data and the constraints. Query answering in such a context is called
         ontological query answering. Commonly adopted database constraints
         in this field are tuple-generating dependencies (TGDs) and equality-
         generating dependencies (EGDs). It is well known that their interaction
         leads to intractability or undecidability of query answering even in the
         case of simple subclasses. Several conditions have been found to guar-
         antee separability, that is lack of interaction, between TGDs and EGDs.
         Separability makes EGDs (mostly) irrelevant for query answering and
         therefore often guarantees tractability, as long as the theory is satisfi-
         able. In this paper we review the two notions of separability found in
         the literature, as well as several syntactic conditions that are sufficient
         to prove them. We then shed light on the issue of satisfiability checking,
         showing that under a sufficient condition called deep separability it can
         be done by considering the TGDs only. We show that, fortunately, in
         the case of TGDs and EGDs, separability implies deep separability. This
         result generalizes several analogous ones, proved ad hoc for particular
         classes of constraints. Applications include the class of sticky TGDs and
         EGDs, for which we provide a syntactic separability condition which ex-
         tends the analogous one for linear TGDs; preliminary experiments show
         the feasibility of query answering in this case.


1      Introduction
When a database D is equipped with an ontology Σ, that is, a knowledge base
consisting of constraints that express relevant properties of the underlying do-
main, queries are not answered merely against the database instance D, but
against the logical theory D ∪ Σ. Several languages have been proposed for
ontologies, with different computational properties. The DL-Lite family [2, 15,
23] has the advantage of a low (ac0 , which is contained in logspace) data
complexity of conjunctive query answering and of knowledge base satisfiability.
The well-known Entity-Relationship (ER) [17] model has recently gained im-
portance in ontology specification, due to the fact that it is comprehensible to


                                          48
theorists and practitioners, while having good expressive power. The ER± family
of ER-like languages [12], in particular, comprises several tractable (w.r.t. con-
junctive query answering) ontology languages, which properly generalize the
main languages of the DL-Lite family. Another relevant, more general class of
ontology languages, is the Datalog ± family, that is, a family of rule-based lan-
guages derived from Datalog (see, e.g., [9] and references therein) whose rules are
(function-free) Horn rules, possibly with existentially quantified variables in the
head, called tuple-generating dependencies (TGDs), enriched with functionality
constraints in the form of equality-generating dependencies (EGDs), and nega-
tive constraints, a form of denial constraints. The simplest and least expressive
Datalog± languages are able to properly capture the most prominent ontology
languages, including most of the DL-lite family.
    In this paper we focus on the interaction between TGDs and EGDs. A
TGD is a first-order implication that forces the existence of tuples under cer-
tain conditions, and it is of the form ∀X∀Y ϕ(X, Y) → ∃Z ψ(X, Z), where
ϕ(X, Y) and ψ(X, Z) are conjunctions of atoms over a relational schema. An
EGD forces equality of values under certain conditions, and it is of the form
∀X ϕ(X) → Xi = Xj , where ϕ(X) is a conjunction of atoms over a relational
schema, and {Xi , Xj } ⊆ X. To answer a query Q over an instance D and a set
Σ of TGDs and EGDs, we could in principle expand D according to Σ, inferring
all the entailed additional knowledge, and then evaluate Q against the obtained
instance1 . In doing so, unknown values (those corresponding to the existentially-
quantified variables of TGDs) will be represented by labelled nulls, which are a
sort of placeholder. EGDs play a role in answering as they can enforce equalities;
if an equality between two distinct constants is enforced, the process fails as the
theory is unsatisfiable.

Example 1. Consider the following set Σ of TGDs and EGDs (we omit universal
quantifiers to avoid clutter):

            σ1 : r1 (X) → ∃Y r2 (X, Y ) σ4 : r4 (X, Y ) → r5 (X, Y )
            σ2 : r2 (X, Y ) → r3 (X, Y ) σ5 : r5 (X, Y ) → r2 (X, Y )
            σ3 : r3 (X, Y ) → r4 (X, Y )  η : r3 (X, Y ), r3 (X, Z) → Y = Z

Notice that η is a key dependency, imposing that atoms in r3 have unique values
on their first attribute. Now, let us take D = {r1 (a), r3 (a, b)} and the (ground)
Boolean conjunctive query Q defined as q ← r2 (a, b) (q is a propositional predi-
cate), which simply asks whether r2 (a, b) holds. Let us answer Q by expanding
D according to Σ. We first obtain r2 (a, ζ) from σ1 , where ζ is a null. From σ3 we
get r4 (a, b). We then add r3 (a, ζ) from σ2 ; at this point we need to apply η and
enforce ζ = b, so that r3 (a, ζ) becomes r3 (a, b). Due to η, therefore, the query
has positive answer, written D ∪ Σ |= Q. However, even in the absence of η, Q
would be answered positively. In fact, if we proceed with the expansion we get
1
    This is not a query answering procedure in general, as the expansion (which is called
    chase; see Section 2) can be infinite; however, the chase is a a key tool for studying
    the query answering problem, as it will be clear in Section 2.


                                         49
r5 (a, b) from σ4 and finally r2 (a, b) from σ5 . Therefore we would have had the
same result without η. This can be shown to hold for every query Q; therefore,
provided that the theory D ∪ Σ is satisfiable (that is, the expansion does not
fail), we can answer every query, for every D, by considering the TGDs only.
This property is called separability, and it defines a form of lack of interaction
(hence the name) between EGDs and TGDs in a certain set.

    In this paper we provide the following contributions:
 1. We illustrate the two notions of separability (the old and the new one) found
    in the literature, relating them to each other, and to the several syntactic
    conditions, sufficient to separability (one of the two notions), that have been
    proposed in the literature in the last decade or so.
 2. We consider the problem of checking satisfiability of a theory consisting of
    an instance and a set of TGDs and EGDs. This can be done by answering
    suitable Boolean Conjunctive queries, and this has been shown in several
    cases. However, the proof that this technique works has always been ad
    hoc, depending on the particular class of TGDs and EGDs. Here, we show
    that this technique for the satisfiability check is sound and complete if a
    semantic condition, which we call deep separability, holds. We show that,
    fortunately, (new) separability implies deep separability. This way, we unify
    several known results under a single general theorem, and we also provide a
    useful tool for all separable classes that are yet to be discovered.
 3. Finally, we mention, as an application, a syntactic condition which is suffi-
    cient for separability of sticky sets of TGDs [10] and EGDs, we hint at our
    preliminary experiment with a prototype system, and we discuss some open
    problems.


2    Preliminaries
In this section we recall some basics on databases, queries, tuple-generating
dependencies, equality-generating dependencies, and the chase procedure.
    General. We define the following pairwise disjoint (infinite) sets of symbols:
(i) a set Γ of constants (constitute the “normal” domain of a database), (ii)
a set ΓN of labeled nulls (used as placeholders for unknown values, and thus
can be also seen as variables), and (iii) a set ΓV of variables (used in queries
and dependencies). Different constants represent different values (unique name
assumption), while different nulls may represent the same value. A lexicographic
order is defined on Γ ∪ ΓN , such that every value in ΓN follows all those in Γ .
Throughout the paper, we denote by X the sequence of variables X1 , . . . , Xk ,
where k > 0. Also, let [n] be the set {1, . . . , n}, for any integer n > 1.
    A relational schema R (or simply schema) is a set of relational symbols (or
predicates), each with its associated arity. A position r[i] (in a schema R) is
identified by a predicate r ∈ R and its i-th argument (or attribute). A term t is
a constant, null, or variable. An atomic formula (or simply atom) has the form
r(t1 , . . . , tn ), where r is an n-ary relation, and t1 , . . . , tn are terms. Conjunctions


                                          50
of atoms are often identified with the sets of their atoms. A database (instance) D
for a schema R is a (possibly infinite) set of atoms of the form r(t) (a.k.a. facts),
where r is an n-ary predicate of R, and t ∈ (Γ ∪ ΓN )n . We denote as r(D) the
set {t | r(t) ∈ D}.
    A substitution from one set of symbols S1 to another set of symbols S2
is a function h : S1 → S2 defined as follows: (i) ∅ is a substitution (empty
substitution), (ii) if h is a substitution, then h ∪ {X → Y } is a substitution,
where X ∈ S1 and Y ∈ S2 , and h does not already contain some X → Z with
Y 6= Z. If X → Y ∈ h we write h(X) = Y . A homomorphism from a set of
atoms A1 to a set of atoms A2 , both over the same schema R, is a substitution
h : Γ ∪ ΓN ∪ ΓV → Γ ∪ ΓN ∪ ΓV such that: (i) if t ∈ Γ , then h(t) = t, and
(ii) if r(t1 , . . . , tn ) is in A1 , then h(r(t1 , . . . , tn )) = r(h(t1 ), . . . , h(tn )) is in A2 .
If there are homomorphisms from A1 to A2 and vice-versa, then A1 and A2 are
homomorphically equivalent. The notion of homomorphism naturally extends to
conjunctions of atoms. Given a set of symbols S, two atoms a1 and a2 are S-
isomorphic iff there exists a bijection h such that h(a1 ) = a2 , h−1 (a2 ) = a1 , and
h(X) = X, for each X ∈ S.
    Conjunctive Queries. A conjunctive query (CQ) Q of arity n over a schema
R has the form q(X) ← ϕ(X, Y), where ϕ(X, Y) is a conjunction of atoms over
R, X and Y are sequences of variables or constants in Γ , and q is an n-ary
predicate that does not occur in R. ϕ(X, Y) is called the body of q, denoted as
body(q). A Boolean CQ (BCQ) is a CQ of zero arity.
    The answer to an n-ary CQ Q of the form q(X) ← ϕ(X, Y) over a database
D, denoted as Q(D), is the set of all n-tuples t ∈ Γ n for which there exists a
homomorphism h : X∪Y → Γ ∪ΓN such that h(ϕ(X, Y)) ⊆ D and h(X) = t. A
BCQ has only the empty tuple hi as possible answer, in which case it is said that
has positive answer. Formally, a BCQ Q has positive answer over D, denoted as
D |= Q, iff hi ∈ Q(D), or, equivalently, Q(D) 6= ∅.
    Dependencies. Given a schema R, a tuple-generating dependency (TGD)
σ over R is a first-order formula ∀X∀Y ϕ(X, Y) → ∃Z ψ(X, Z), where ϕ(X, Y)
and ψ(X, Z) are conjunctions of atoms over R, called the body and the head
of σ, denoted as body(σ) and head (σ), respectively. Henceforth, to avoid nota-
tional clutter, we will omit the universal quantifiers in TGDs. Such σ is satis-
fied by a database D for R iff, whenever there exists a homomorphism h such
that h(ϕ(X, Y)) ⊆ D, there exists an extension h′ of h (i.e., h′ ⊇ h) where
h′ (ψ(X, Z)) ⊆ D.
    Inclusion dependencies (IDs) are the simplest class of TGDs; they have just
one body-atom and one head-atom, without repetition of variables neither in
the body nor in the head. They are usually denoted as r1 [A] ⊆ r2 [B], where A
and B are sets of attributes (arguments) of r1 and r2 respectively. Such an ID
expresses that for each r1 -atom, its values in A have to appear in B some r2
atom. How this is represented by a TGD is obvious.
    An equality-generating dependency (EGD) η over R is a first-order formula
∀X ϕ(X) → Xi = Xj , where ϕ(X) is a conjunction of atoms over R, called the
body and denoted as body(η), and Xi = Xj is an equality among variables of X.


                                               51
Henceforth, for brevity, we will omit the universal quantifiers in EGDs. Such η
is satisfied by a database D for R iff, whenever there exists a homomorphism
h such that h(ϕ(X)) ⊆ D, then h(Xi ) = h(Xj ). Special cases of EGDs are
functional dependencies (FDs) of the form r : A → B, where A and B are sets
of attributes of r; such a dependency is satisfied in an instance D if there are
no two atoms in D that have the same values on A but different values on B. If
the union of A and B is the whole set of attributes of r, the FD is said to be a
key dependency (KD). How a FD is expressed by EGDs is obvious.
    CQ Answering under Dependencies. We now define the notion of query
answering under TGDs and EGDs. Given a database D for R, and a set Σ of
TGDs and EGDs over R, the models of D w.r.t. Σ, denoted as mods(D, Σ), is
the set of all databases B such that B |= D ∪ Σ, i.e., B ⊇ D and B satisfies
Σ. The answer to a CQ Q w.r.t. D and Σ, denoted as ans(Q, D, Σ), is the set
{t | t ∈ Q(B), for each B ∈ mods(D, Σ)}. The answer to a BCQ Q w.r.t. D
and Σ is positive, denoted as D ∪ Σ |= Q, iff ans(Q, D, Σ) 6= ∅. Note that query
answering under general TGDs and EGDs is undecidable. In fact, this is true
even in extremely simple cases such as that of IDs and keys [16].
   We recall that the two problems of CQ and BCQ answering under TGDs
and EGDs are logspace-equivalent [6]. Moreover, it is easy to see that the
query output tuple problem (as a decision version of CQ answering) and BCQ
evaluation are ac0 -reducible to each other. Thus, we henceforth focus only on
the BCQ answering problem.
    The Chase Procedure. The chase procedure (or simply chase) is a funda-
mental algorithmic tool introduced for checking implication of dependencies [21],
and later for checking query containment [20]. Informally, the chase is a process
of repairing a database w.r.t. a set of dependencies so that the resulted database
satisfies the dependencies. We shall use the term chase interchangeably for both
the procedure and its result. The chase works on an instance through the so-
called TGD and EGD chase rules. The TGD chase rule comes in two different
equivalent fashions: oblivious and restricted [6], where the restricted one repairs
TGDs only when they are not satisfied. In the sequel, we focus on the oblivious
one for better technical clarity. The chase rules follow.
    TGD Chase Rule. Consider a database D for a schema R, and a TGD σ
of the form ϕ(X, Y) → ∃Z ψ(X, Z) over R. If σ is applicable to D, i.e., there
exists a homomorphism h such that h(ϕ(X, Y)) ⊆ D, then: (i) define h′ ⊇ h
such that h′ (Zi ) = zi , for each Zi ∈ Z, where zi ∈ ΓN is a “fresh” labeled null
not introduced before, and following lexicographically all those introduced so
far, and (ii) add to D the set of atoms h′ (ψ(X, Z)), if not already in D.
    EGD Chase Rule. Consider a database D for a schema R, and an EGD η
of the form ϕ(X) → Xi = Xj over R. If η is applicable to D, i.e., there exists a
homomorphism h such that h(ϕ(X)) ⊆ D and h(Xi ) 6= h(Xj ), then: (i) if h(Xi )
and h(Xj ) are both constants of Γ , then there is a hard violation of η, and the
chase fails, otherwise (ii) replace each occurrence of h(Xj ) with h(Xi ), if h(Xi )
precedes h(Xj ) in the lexicographic order, or vice-versa otherwise.


                                      52
   Given a database D and a set of dependencies Σ = ΣT ∪ ΣE , where ΣT
are TGDs and ΣE are EGDs, the chase algorithm for D and Σ consists of an
exhaustive application of the chase rules in a breadth-first fashion, which leads
to a (possibly infinite) database. Roughly, the chase of D w.r.t. Σ, denoted as
chase(D, Σ), is the (possibly infinite) instance constructed by iteratively apply-
ing (i) the TGD chase rule once, and (ii) the EGD chase rule as long as it is
applicable (i.e., until a fixed point is reached). A formal definition of the chase
algorithm is given, e.g., in [10].

Example 2 (from [14]). Let R = {r, s}. Consider the set Σ of TGDs and EGDs
over R constituted by the TGDs σ1 = r(X, Y ) → ∃Z r(Z, X), s(Z) and σ2 =
r(X, Y ) → r(Y, X), and the EGD η = r(X, Y ), r(X ′ , Y ) → X = X ′ . Let D be
the database for R consisting of the single atom r(a, b). During the construction
of chase(D, Σ) we first apply σ1 , and we add the atoms r(z1 , a), s(z1 ), where z1
is a “fresh” null of ΓN . Moreover, σ2 is applicable and we add the atom r(b, a).
Now, the EGD η is applicable and we replace each occurrence of z1 with the
constant b; thus, we get the atom s(b). We continue by applying exhaustively
the chase rules as explained above.

     The (possibly infinite) chase of D w.r.t. Σ is a universal model of D w.r.t. Σ,
i.e., for each database B ∈ mods(D, Σ), there exists a homomorphism from
chase(D, Σ) to B [19, 18]. Using this fact it can be shown that the chase is a
formal tool for query answering under TGDs and EGDs. In particular, given a
BCQ Q, D ∪ Σ |= Q iff chase(D, Σ) |= Q, providing that the chase does not fail.
If the chase fails, then the set of models of D w.r.t. Σ is empty, and D ∪ Σ |= Q
trivially.


3   What is separability?

In this section we provide a brief review of the interaction of TGDs and EGDs
in ontological query answering.
    It is well know that the interaction of TGDs and EGDs leads to undecidability
of query answering; this happens even in simple sub-classes of constraints, such
as key and inclusion dependencies [4, 13]. It is therefore useful to identify classes
of constraints which do not suffer from his “harmful” interaction. To this aim,
a key condition is that of separability [12, 11], which we give below.

Definition 1 (Separability). Consider a set ΣT of TGDs over a schema R,
and a set ΣE of EGDs over R. We say that the set Σ = ΣT ∪ ΣE is separable
if, for every database D for R, either chase(D, Σ) fails, or, chase(D, Σ) |= Q
iff chase(D, ΣT ) |= Q, for every BCQ Q over R.

   Notably, there is another definition of separability in the literature, which is
adopted in [4, 13, 8]. Such a definition is similar to the above Definition 1, but
with a difference which makes it stronger. For the sake of completeness, we give
such a definition below.


                                      53
Definition 2 (Old separability). Consider a set ΣT of TGDs over a schema
R, and a set ΣE of EGDs over R. We say that the set Σ = ΣT ∪ΣE is separable
if, for every database D for R, (i) if the chase fails, then D 6|= ΣE (D does not
satisfy Σ), and (ii) if the chase does not fail, we have chase(D, Σ) |= Q iff
chase(D, ΣT ) |= Q, for every BCQ Q over R.

   The old separability is a special case of the new separability as it enforces
condition (i) (Definition 2), which we reformulate below, calling it EGD-stability.

Definition 3 (EGD-stability). Consider a set ΣT of TGDs over a schema R,
and a set ΣE of EGDs over R. We say that the set Σ = ΣT ∪ ΣE is EGD-stable
if, for every instance D for R, D |= ΣE implies that chase(D, Σ) does not fail.

    EGD stability guarantees that the satisfiability of D ∪ Σ (i.e., the existence,
or non-failure, of chase(D, Σ)) can be determined by merely checking whether
D |= ΣE . In fact, the difference between the old and the new definitions of
separability resides in the fact that under the new separability condition, a chase
failure does not imply that the initial database D violates some EGD in ΣE ,
while this holds under the old definition. The satisfiability check is a fundamental
step in query answering (see Section 3.2), and EGD stability guarantees it can be
easily done. However, with a more sophisticated version of the satisfiability check
(see Section 4), the old separability can be relaxed to the new separability, giving
raise to the discovery of new (syntactic) classes that enjoy (new) separability.
Section 3.1 below provides an overview of the most relevant syntactic classes in
the literature.

3.1     Related Work
The notion of (old) separability was first introduced in [4, 13], in the context
of inclusion dependencies (IDs) and key dependencies (KDs) (see e.g. [1]). The
general idea is to define a sufficient syntactic condition for separability, which
can be efficiently checked. This was done while extending an early class of IDs
and KDs (see e.g. [1]), called key-based. Key-based IDs and KDs were proposed
in the milestone work by Johnson and Klug [20], and they are in fact separable,
though not defined explicitly as such. Key-based IDs and KDs are defined as
follows2 : (i) for each relational predicate r, there is only one KD defined on it;
(ii) for each ID r[X] ⊆ s[Y], where X and Y are set of attributes of r and s
respectively (see [1] for the notation), (ii.a) X is disjoint from any key set of
attributes for r, and (ii.b) Y is properly contained in the set of key attributes
for s.
    The more general class of non-key-conflicting (NKC) IDs, with respect to a
set of KDs, is defined as follows: (i) for each relational predicate r, there is only
one KD defined on it; (ii) for every ID r[X] ⊆ s[Y], Y is not a proper superset
of the set of key attributes for s (if such set exists). The class of KDs and NKC
2
    The definition in the original paper is different but equivalent; we choose a definition
    which is more clear in the context of this paper.


                                          54
IDs is separable, and it properly captures the well known class of foreign-key
dependencies.
    The class of KDs and non-key conflicting IDs was generalized in [8] to general
TGDs, which are assumed to have a single atom in the head, without loss of
generality. The idea is analogous, and the condition is as follows: (i) for each
relational predicate r, there is only one KD defined on it; (ii) each existentially
quantified variable in the head of a TGD must occur only once; (iii) for each
TGD σ = ϕ(X, Y) → ∃Z r(X, Z), the set of X-attributes of r(X, Z) is not a
proper superset of the set of key attributes of r.
    In [10], the class of non-key-conflicting TGDs was straightforwardly extended
to treat functional dependencies (FDs) rather than keys.
    The literature so far discussed deals with sufficient syntactic conditions that
guarantee, in fact, EGD-stability. However, there are classes of TGDs and EGDs
such that EGDs are triggered in the chase, but which enjoy separability.
Example 3. Consider the set of TGDs and EGDs in Example 1. It is sepa-
rable, but it is not EGD-stable. In fact, if we consider the instance D′ =
{r3 (a, b), r2 (a, c)}, we have that D′ |= ΣE but D ∪ ΣE is unsatisfiable because
the chase fails due to a hard violation.
    This leads in [12] to the definition of separability as in Definition 1 in this
paper. This work deals with IDs and KDs which express an expressive variant
of the Entity-Relationship model [17]. By means of graph-related properties,
necessary and sufficient syntactic conditions for separability are provided, thus
defining useful tractable classes of constraints.
    The case of linear TGDs (TGDs with a single body-atom) and KDs was later
considered in [14, 11]. In these works, sufficient syntactic conditions are proposed
that guarantee separability, without imposing non-egd-triggerability. The condi-
tions are quite involved and they make use of backward-resolution. Interestingly,
the complexity of checking the syntactic condition, called non-conflict condition,
is the same as that of query answering, that is pspace-complete in combined
complexity.
    At this point, we considered separability but not the problem of satisfiability.
While separability allows us to ignore EGDs in the case of satisfiable theories,
the problem of deciding the satisfiability remains, as well as that of determining
its complexity. This will be the subject of next section.

3.2   Query Answering under Separable Constraints
In the case of a set Σ = ΣT ∪ ΣE of separable TGDs and EGDs, such that BCQ
answering under ΣT is decidable, given an instance D and a BCQ Q, to decide
whether D ∪ Σ |= Q, the following steps are needed:
 1. Check whether the chase fails, that is, whether D ∪ Σ is satisfiable; if D ∪ Σ
    is unsatisfiable, then trivially D ∪ Σ |= Q (“Ex falso quodlibet”).
 2. If D ∪ Σ is satisfiable, then by Definition 1 we know chase(D, Σ) |= Q iff
    chase(D, ΣT ) |= Q, therefore we check whether chase(D, ΣT ) |= Q.


                                      55
    Apart from the complexity of the satisfiability check, we have that the com-
plexity of query answering is the same as that of answering under TGDs only,
which is a highly desirable property. In the case of EGD-stable constraints, the
satisfiability check amounts simply to checking whether D |= ΣE , which can be
done in np, and in ptime (or better, in the even lower complexity class ac0 ) if
we consider ΣE fixed. However, in the cases of separable but not EGD-stable
constraints, the problem is to be addressed differently; this will be the subject
of Section 4.


4   Separability and Satisfiability

In this section we address the problem of deciding whether, given an instance
D and a set Σ of separable TGDs and EGDs, D ∪ Σ is satisfiable. As seen in
Section 3, this preliminary check is needed, in the case of separability, before one
proceeds to answer a BCQ Q by taking into account the TGDs only.
    The satisfiability check is done as in [8, 12], by encoding hard violations of
EGDs as a set QV of Boolean conjunctive queries. Given a separable set Σ =
ΣT ∪ ΣE , and an instance D, we have that satisfiability holds if and only if,
for each Q ∈ QV we have D ∪ ΣT 6|= Q or, equivalently, if and only if all
queries in QV have negative answer when evaluated against D and ΣT (TGDs
only). The encoding is done as follows. First, we need to add auxiliary facts
to D. For each pair of distinct constants c1 , c2 appearing in D as arguments,
we add the facts neq(c1 , c2 ) and neq(c2 , c1 ) to D, where neq/2 is an auxiliary
predicate expressing that two constants are distinct. Then, for each EGD η of
the form φ(X) → Xi = Xj , with Xi and Xj in X, we construct the BCQ Qη
as follows (quantifiers omitted for brevity): Qη : q ← φ(X), neq(Xi , Xj ), where
q is a propositional predicate. The set QV encoding hard violations      S is then
defined as the set of of all Qη so constructed, or equivalently QV = η∈ΣE Qη .
It is not difficult to prove that if D ∪ Σ |= Qη then chase(D, Σ) fails, and
therefore D ∪ Σ is unsatisfiable. However, it still remains to determine whether
D ∪Σ |= Qη . Notice that in the case of non-egd-triggerable constraints we do not
have this problem, as we merely need to check whether D |= ΣE . In principle,
separability (see Definition 1) does not tell us anything about the cases when the
chase fails. However, we are still in luck because we can evaluate the (Boolean)
conjunctive queries in QV against D ∪ ΣT rather than D ∪ Σ. This is proved,
for instance, in [12, 14], but ad hoc, by using the properties of the particular
class of constraints involved. Here we show a general condition that allows us to
perform the satisfiability check as above. We first need to introduce a preliminary
condition, called deep separability, which is apparently more restrictive than
separability (we shall then prove that it is implied by separability). We start by
denoting by chase k (D, Σ) the result of the first k chase steps under Σ applied
to an instance D, where a step is either an EGD or a TGD application.

Definition 4. Let Σ be a set of TGDs and EGDs, with Σ = ΣT ∪ ΣE , where
the dependencies in ΣT are TGDs and those in ΣE are EGDs. We say that Σ is


                                      56
deeply separable if, for each integer k ≥ 0, for each instance D with with values
in Γ ∪ ΓN , and for each Boolean conjunctive query Q, the following holds: if
chase k (D, Σ) exists, then chase k (D, Σ) |= Q implies chase(D, ΣT ) |= Q.

   The notion of deep separability, intuitively, guarantees that, at any step before
a possible failure, the chase does not entail any atoms that are not entailed by
the chase computed according to ΣT only. We now come to the main result
about deeply separable TGDs and EGDs. The result states that in the case of
deep separability the satisfiability check done by answering suitable queries as
above is correct and complete.

Theorem 1. Let Σ be a set of deeply separable TGDs and EGDs, with Σ =
ΣT ∪ ΣE , where the dependencies in ΣT are TGDs and those in ΣE are EGDs.
Let QV = {Q1 , . . . , Qm } be the set of BCQs encoding violations of the EGDs in
ΣE as from the above construction. Then we have that chase(D, ΣT ) |= Qi for
some i ∈ {1, . . . , m} if and only if chase(D, Σ) fails (or equivalently D ∪ Σ is
unsatisfiable).

     Proof (sketch). We prove the two directions of the implication separately.
     “Only if”. By contradiction, assume chase(D, ΣT ) |= Qi for some i ∈
{1, . . . , m} but chase(D, Σ) exists. It is not difficult to show that if
chase(D, ΣT ) |= Qi then also chase(D, Σ) |= Qi ; this holds because
chase(D, ΣT ) is a universal model for D ∪ ΣT and chase(D, Σ) is a model (not
necessarily universal) for D ∪ ΣT ; therefore there exists a homomorphism from
the former to the latter. If chase(D, Σ) |= Qi then the chase must necessarily
fail. Contradiction.
     “If”. Assume chase(D, Σ) fails at step k by violation of the EGD
η ∈ ΣE . Then, chase k−1 (D, Σ) exists; moreover, it is easily seen that
chase k−1 (D, Σ) |= Qη , where Qη ∈ QV encodes the violation of η. By the
hypothesis of deep separability we have chase(D, ΣT ) |= Qη , hence the claim. 

   Notice that for the “If” direction we do not need deep separability nor sep-
arability; this direction of the implication holds for general TGDs and EGDs.
Finally we show that, despite the appearances, deep separability is not a stricted
condition than separability. In fact, separability implies deep separability.

Theorem 2. Let Σ be a set of TGDs and EGDs. If Σ is separable, then it is
deeply separable.

    Proof (sketch). We distinguish two cases.
    Case 1: chase(D, Σ) exists. In this case the claim obviously holds.
    Case 2: chase(D, Σ) fails. Assume chase(D, Σ) fails at step k; therefore
chase k−1 (D, Σ) exists. Take DF obtained from D by replacing each constant
in Γ with a fresh null from ΓN . Obviously chase(DF , Σ) exists and so does
chase k−1 (DF , Σ). Take any BCQ Q such that chase k−1 (DF , Σ) |= Q. It
is straightforwardly seen that chase(DF , Σ) |= Q and, due to separability,
chase(DF , ΣT ) |= Q. Since chase(DF , ΣT ) |= Q is obtained from chase(DF , ΣT )


                                      57
by the above renaming of constants into nulls, we have chase(D, ΣT ) |= Q.
Since this holds for every step ℓ ≤ k − 1, the claim is proved. 

    The following result immediately follows from the above.

Corollary 1. For every separable set Σ of TGDs and EGDs, for every instance
D and for every BCQ Q:

 – checking unsatisfiability of D∪Σ has the same complexity as query answering
   under ΣT alone;
 – checking whether D ∪ Σ |= Q has the same complexity as query answering
   under ΣT alone.

   Notice that we mention unsatisfiability rather than satisfiability because The-
orem 1 shows a reduction from failure (unsatisfiability) to BCQ answering under
TGDs alone.


5    Preliminary Experiments

We show here some preliminary experiments on the separability check in ontolo-
gies. First of all, let us describe the ontology language we have adopted. In [11],
a sufficient condition for separability (in the “new” meaning) is provided for
the case of EGDs and linear TGDs (which, we remind the reader, are TGDs
with exactly one head-atom and one body-atom). We have extended such con-
dition, with a rather straightforward adaptation, to the class of sticky sets of
TGDs [10], a relevant class that allows for a form of joins in the body. The
condition requires, as in the case of [11], a significant number of containment
checks under sticky sets of TGDs in practical cases. More specifically, there is
a first step, like in [11], where the body of each EGD is expanded via a (ter-
minating) variant of backward resolution, and then for each obtained subgoal,
a suitable containment check (under TGDs alone) is performed. We have built
a prototype system which performs such a separability check which requires –
as is obvious from this paper – also the satisfiability check. In the first tests,
we considered a few known ontologies, including the GALEN medical ontology
and the LUBM ontology (benchmark from Lehigh University, which describes
the organizational structure of universities). The tests have been run on an Intel
QuadCore I7 at 2.0 GHz and with 4 Gb of RAM; the operating system was
Windows 7 Professional carrying Java Standard Edition Runtime Environment
1.7.0. The considered ontologies do not include any expressive functionality con-
straints, therefore we designed some ourselves, according to the semantics. In
such cases, separability checks ran in extremely short time, confirming our idea
that real-world functionality constraints can be efficiently checked for separa-
bility. We therefore moved on to test the system on near-worst-case schemata.
We “artificially” designed some ad-hoc ontologies that, to be checked for sep-
arability, require a high number of containment checks (double exponential in
the maximum predicate arity). Some values are shown in Figure 1. Notice that


                                     58
                    Max arity Containment checks Time (s)
                       4              18           0.372
                       5              61           1.635
                       6             186           7.603
                       7             540          67.797
                       8             1450         527.608
                       9             4748        6191.924

Fig. 1. Experimental results for the separability check in ”near-worst-case” ontologies.


execution times are still reasonable for arities that are already too large to exist
in practice. Given that the check is independent of queries, and therefore it is
run only in case either the constraints or the instance change, we believe the first
results are promising and the separability check is generally efficient in practice.


6    Discussion
In this paper we have given an overview of the problem of separability between
TGDs and EGDs in the context of ontological query answering, where queries
are (Boolean) conjunctive queries. We have reviewed the two main notions of
separability found in the literature, the “old” one and the “new” one, and we
have clarified the difference between them. We have then addressed the issue of
checking satisfiability of a set of TGDs and EGDs together with an instance,
and we have shown that this can be done by merely answering suitable queries
in the case of deeply separable classes of constraints. We have shown the de-
sirable property that all separable sets of constraints are also deeply separable.
We have therefore clarified and proved formally a satisfiability check which was
already employed in the literature, but proved on a case-by-case basis [11, 8,
12], depending on the class of constraints. We believe that our generalisation
provides a useful tool for future studies, and a better insight into the satisfia-
bility problem. Then, we have shown some preliminary experimental results on
a prototype system we have built. While the experimentation is still at an ini-
tial stage, the first results are promising, even in “pathological” cases where we
designed ontologies that behave, with respect to the separability check, as the
(asymptotically) worst case.
     Our research, given the broad variety of expressive ontology languages cap-
tured by the formalisms we considered, has broad applications in ontology-based
data access, as we provide tools for the separability check. Notice that in the
case of data exchange [19], where the chase is guaranteed to terminate for each
instance, general EGDs can be easily treated, and the notion of separability has
little significance.
     Open problems. We currently have two main open problems at hand, which
seem non-trivial. First, we would like to study the decidability of determining
whether a given set of TGDs and EGDs is separable. This has been proved to
be undecidable for the case of arbitrary TGDs and EGDs [22]; however, the


                                       59
proof relies on the fact that query answering under arbitrary TGDs (without
EGDs) is undecidable [3]. It is not clear whether undecidability holds also in the
cases where query answering is undecidable under TGDs and EGDs together,
but decidable under TGDs alone; relevant cases are, for instance, IDs and KDs,
sticky sets of TGDs and EGDs, or guarded TGDs and KDs [7]. We conjecture
that determining separability is undecidable for these classes, but a proof is still
to be devised. The second problem is more technical, and is determining the
complexity of checking the syntactic condition for the separability of sticky sets
of TGDs and EGDs. We have an obvious exptime lower bound, but the upper
bound is currently unknown.
    Acknowledgments. We would like to thank Georg Gottlob, Andreas Pieris
and the anonymous reviewers for their insightful comments on this research.


References
 1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley,
    1995.
 2. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite
    family and relations. J. Artif. Intell. Res., 36:1–69, 2009.
 3. C. Beeri and M. Y. Vardi. The implication problem for data dependencies. In
    Proc. of ICALP, pages 73–85, 1981.
 4. A. Calı̀. Query Answering and Optimisation in Data Integration Systems. PhD
    thesis, Università di Roma “La Sapienza”, 2003.
 5. A. Calı̀, M. Console, R. Frosini, and A. Pieris. On equality constraints in ontological
    query answering. Technical report, University of London, Birkbeck College, 2012.
    Available from the authors.
 6. A. Calı̀, G. Gottlob, and M. Kifer. Taming the infinite chase: Query answering
    under expressive relational constraints. In Proc. of KR, pages 70–80, 2008.
 7. A. Calı̀, G. Gottlob, and T. Lukasiewicz. A general datalog-based framework for
    tractable query answering over ontologies. In Proc. of PODS, pages 77–86, 2009.
 8. A. Calı̀, G. Gottlob, and T. Lukasiewicz. A general datalog-based framework for
    tractable query answering over ontologies. J. Web Sem., 2012. to appear.
 9. A. Calı̀, G. Gottlob, T. Lukasiewicz, B. Marnette, and A. Pieris. Datalog+/-: A
    family of logical knowledge representation and query languages for new applica-
    tions. In Proc. of LICS, pages 228–242, 2010.
10. A. Calı̀, G. Gottlob, and A. Pieris. Advanced processing for ontological queries.
    PVLDB, 3(1):554–565, 2010.
11. A. Calı̀, G. Gottlob, and A. Pieris. Querying conceptual schemata with expressive
    equality constraints. In Proc. of ER, pages 161–174, 2011.
12. A. Calı̀, G. Gottlob, and A. Pieris. Ontological query answering under expressive
    Entity-Relationship schemata. Inf. Syst., 37(4):320–335, 2012.
13. A. Calı̀, D. Lembo, and R. Rosati. On the decidability and complexity of query
    answering over inconsistent and incomplete databases. In Proc. of PODS, pages
    260–271, 2003.
14. A. Calı̀ and A. Pieris. On equality-generating dependencies in ontology querying
    – Preliminary report. In Proc. of AMW, 2011.
15. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable
    reasoning and efficient query answering in description logics: The DL-lite family.
    J. Autom. Reasoning, 39(3):385–429, 2007.


                                         60
16. A. K. Chandra and M. Y. Vardi. The implication problem for functional and
    inclusion dependencies. SIAM Journal of Computing, 14:671–677, 1985.
17. P. P. Chen. The entity-relationship model: towards a unified view of data. ACM
    TODS, 1(1):124–131, 1995.
18. A. Deutsch, A. Nash, and J. B. Remmel. The chase revisited. In Proc. of PODS,
    pages 149–158, 2008.
19. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: Semantics and
    query answering. Theor. Comput. Sci., 336(1):89–124, 2005.
20. D. S. Johnson and A. C. Klug. Testing containment of conjunctive queries under
    functional and inclusion dependencies. J. Comput. Syst. Sci., 28(1):167–189, 1984.
21. D. Maier, A. O. Mendelzon, and Y. Sagiv. Testing implications of data dependen-
    cies. ACM Trans. Database Syst., 4(4):455–469, 1979.
22. A. Pieris, 2012. Personal communication.
23. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati.
    Linking data to ontologies. J. Data Semantics, 10:133–173, 2008.


                                      61

</pre>