On Equality Constraints in Datalog+/- Knowledge Bases (Position Paper)

Andrea Calí

Marco Console

Riccardo Frosini

Birkbeck University of London, UK; Sapienza University of Rome, Italy; University of Naples “Federico II”, Italy

2026

Commonly adopted database constraints in knowledge bases are tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). They form the core of the Datalog+/- family of languages, which are able to capture a variety of ontology formalisms. The presence of EGDs in Datalog+/- programs, and even in simpler relational schema languages based e.g. on inclusion dependencies, is known to easily lead to undecidability or intractability of query answering. The notion of separability was hence introduced to characterise sets of EGDs that have limited interaction with TGDs. We review two notions of separability found in the literature, as well as syntactic conditions that are sufficient for them. In particular, we define the notion of deep separability, which captures several analogous notions defined ad hoc in the literature, and provide a sufficient condition for it. We then establish a relationship between decidability of query answering and decidability of determining separability. This work sheds light on the interaction between EGDs and TGDs in Datalog+/- knowledge bases and beyond, providing a basis for further investigations.

Keywords: Ontology-Based Data Access, Semantic Technologies, Integrity Constraints

1. Introduction and Preliminaries

When a database D is equipped with an ontology Σ, that is, a knowledge base consisting of constraints that express relevant properties of the underlying domain, queries are not answered merely against the database instance D, but against the logical theory D ∪ Σ. Several languages have been proposed for ontologies, with different computational properties.

The DL-Lite family [1] has the advantage of a low data complexity (AC0, which is contained in LOGSPACE) of conjunctive query answering and of knowledge base satisfiability.

The ER± family of ER-like languages [2], in particular, comprises several tractable (w.r.t. conjunctive query answering) ontology languages, which properly generalize the main languages of the DL-Lite family.

Another relevant, more general class of ontology languages, capable of capturing most DL-Lite languages, is the Datalog± family, that is, a family of rule-based languages derived from Datalog (see, e.g., [3] and references therein) whose rules are (function-free) Horn rules, possibly with existentially quantified variables in the head, called tuple-generating dependencies (TGDs), enriched with functionality constraints in the form of equality-generating dependencies (EGDs), and negative constraints, a form of denial constraints.

In this paper we focus on the interaction between TGDs and EGDs. A TGD is a first-order implication that forces the existence of tuples under certain conditions, and it is of the form ∀X∀Y φ(X, Y) → ∃Z ψ(X, Z), where φ(X, Y) and ψ(X, Z) are conjunctions of atoms over a relational schema. An EGD forces equality of values under certain conditions, and it is of the form ∀X φ(X) → X_i = X_j, where φ(X) is a conjunction of atoms over a relational schema, and {X_i, X_j} ⊆ X.

To answer a query q over an instance D and a set Σ of TGDs and EGDs, we could in principle “expand” D according to Σ, inferring all the entailed additional knowledge, and then evaluate q against the obtained instance. This expansion is called the chase in the literature, for which we refer the reader to [4]. In the chase, atoms are added according to TGDs; in doing so, unknown values (those corresponding to the existentially quantified variables of TGDs) are represented by labelled nulls, which act as placeholders. The set of labelled nulls is denoted by Γ_N, while the set of constants is denoted by Γ. EGDs force the equality between a labelled null and a constant, or between two labelled nulls.¹ The application of an EGD can trigger the application of TGDs. The chase as a procedure applies one TGD, then all EGDs until no further EGD application is possible, and then starts again with a TGD application, and so on, possibly ad infinitum. The chase is a universal model in this case, which entails that the correct answer to a query q on a database D under a set Σ of TGDs and EGDs can be obtained by evaluating q on the chase of D (only in principle, as the result of the chase, also called chase for convenience of notation, can be infinite).

In the rest of this section we give two examples to illustrate the notion of separability, which shall be introduced formally in the next section.

Example 1. Consider the following set Σ of TGDs and EGDs (we omit universal quantifiers to avoid clutter):

σ1: r1(X) → ∃Y r2(X, Y)
σ2: r2(X, Y) → r3(X, Y)
σ3: r3(X, Y) → r4(X, Y)
σ4: r4(X, Y) → r5(X, Y)
σ5: r5(X, Y) → r2(X, Y)
η: r3(X, Y), r3(X, Z) → Y = Z

Notice that η is a key dependency, imposing that atoms in r3 have unique values on their first attribute. Now, let us take D = {r1(a), r3(a, b)} and the (ground) Boolean conjunctive query q defined as q ← r2(a, b) (q is a propositional predicate), which simply asks whether r2(a, b) holds. Let us answer q by computing the chase of D according to Σ, denoted chase(D, Σ). We first obtain r2(a, z) from σ1, where z is a labelled null. From σ3 we get r4(a, b). We then add r3(a, z) from σ2; at this point we need to apply η and enforce z = b, so that r3(a, z) becomes r3(a, b) and, likewise, r2(a, z) becomes r2(a, b). Due to η, therefore, the query has positive answer, written D ∪ Σ |= q. However, even in the absence of η, q would be answered positively. In fact, if we proceed with the expansion we get r5(a, b) from σ4 and finally r2(a, b) from σ5. Therefore we would have had the same result without η. This can be shown to hold for every query q; therefore, provided that the theory D ∪ Σ is satisfiable (that is, the expansion does not fail), we can answer every query, for every D, by considering the TGDs only. This property is called separability, and it defines a form of lack of interaction (hence the name) between EGDs and TGDs in a certain set.
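The chase of Example 1 can be replayed with a few lines of code. The following is a minimal sketch, not a general chase engine: it hard-codes the five TGDs and the key dependency of the example, and the predicate names r1 to r5, the constants a and b, the `_z` naming of labelled nulls and all function names are assumptions made for illustration.

```python
from itertools import count

# Atoms are tuples (predicate, arg1, ...). In this sketch, labelled nulls are
# strings starting with "_"; every other argument is a constant.
def is_null(term):
    return term.startswith("_")

# The TGDs of Example 1 as (body predicate, head predicate, has_existential).
TGDS = [("r1", "r2", True), ("r2", "r3", False), ("r3", "r4", False),
        ("r4", "r5", False), ("r5", "r2", False)]

def apply_tgds(atoms, fresh):
    """One round of TGD applications; returns only the atoms that are new."""
    new = set()
    for body, head, existential in TGDS:
        for atom in atoms:
            if atom[0] != body:
                continue
            if existential:
                # r1(X) -> exists Y r2(X, Y): fire only if no witness exists.
                if not any(a[0] == head and a[1] == atom[1] for a in atoms | new):
                    new.add((head, atom[1], f"_z{next(fresh)}"))
            else:
                new.add((head,) + atom[1:])
    return new - atoms

def apply_key_egd(atoms):
    """Enforce r3(X, Y), r3(X, Z) -> Y = Z until no violation is left.
    Raises ValueError on a hard violation (two distinct constants)."""
    while True:
        pair = next(((a[2], b[2]) for a in atoms for b in atoms
                     if a[0] == b[0] == "r3" and a[1] == b[1] and a[2] != b[2]),
                    None)
        if pair is None:
            return atoms
        y, z = pair
        if not is_null(y) and not is_null(z):
            raise ValueError(f"chase fails: cannot equate {y} and {z}")
        old, new = (y, z) if is_null(y) else (z, y)
        # Replace the null everywhere in the instance built so far.
        atoms = {tuple(new if t == old else t for t in atom) for atom in atoms}

def chase(db, use_egd=True, max_rounds=20):
    atoms, fresh = set(db), count()
    for _ in range(max_rounds):
        new = apply_tgds(atoms, fresh)
        if not new:
            break
        atoms |= new
        if use_egd:
            atoms = apply_key_egd(atoms)
    return atoms

D = {("r1", "a"), ("r3", "a", "b")}
print(("r2", "a", "b") in chase(D, use_egd=True))   # True
print(("r2", "a", "b") in chase(D, use_egd=False))  # True
```

Both calls report that r2(a, b) is entailed, which is the behaviour the example describes: the key dependency shortens the derivation but does not change the answer.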

¹ Equating two distinct constants results in a logical inconsistency because the unique name assumption is adopted.

Example 2. Suppose we have D = {r(a, b)}, a TGD r(X, Y) → ∃Z s(Z, Y), and an EGD r(X1, X2), s(X3, X2) → X1 = X3. The chase (according to the dependencies above) then adds s(z, b) by virtue of the TGD; by the EGD, z turns into a and the added atom becomes s(a, b). Hence a query q defined as q ← s(a, b) has positive answer, and would have negative answer without the EGD above. In this case separability does not hold.

2. What is separability?

It is well known that the interaction of TGDs and EGDs leads to undecidability of query answering; this happens even in simple sub-classes of constraints, such as key and inclusion dependencies [5, 6]. It is therefore useful to identify classes of constraints which do not suffer from this “harmful” interaction. To this aim, a key condition is that of separability [2, 7], which we give below.

Definition 1 (Separability). Consider a set Σ_T of TGDs over a schema ℛ, and a set Σ_E of EGDs over ℛ. We say that the set Σ = Σ_T ∪ Σ_E is separable if, for every database D for ℛ, either chase(D, Σ) fails, or chase(D, Σ) |= q iff chase(D, Σ_T) |= q, for every Boolean conjunctive query (BCQ) q over ℛ.

Notably, there is another definition of separability in the literature, which is adopted in [5, 6, 8]. This definition is similar to Definition 1 above, but with a difference which makes it stronger. For the sake of completeness, we give it below.

Definition 2 (Old separability). Consider a set Σ_T of TGDs over a schema ℛ, and a set Σ_E of EGDs over ℛ. We say that the set Σ = Σ_T ∪ Σ_E is separable if, for every database D for ℛ, (i) if the chase fails, then D ̸|= Σ_E (D does not satisfy Σ_E), and (ii) if the chase does not fail, we have chase(D, Σ) |= q iff chase(D, Σ_T) |= q, for every BCQ q over ℛ.

The old separability is a special case of the new separability as it enforces condition (i) (Definition 2), which we reformulate below, calling it EGD-stability.

Definition 3 (EGD-stability). Consider a set Σ_T of TGDs over a schema ℛ, and a set Σ_E of EGDs over ℛ. We say that the set Σ = Σ_T ∪ Σ_E is EGD-stable if, for every instance D for ℛ, D |= Σ_E implies that chase(D, Σ) does not fail.

EGD-stability guarantees that the satisfiability of D ∪ Σ (i.e., the existence of chase(D, Σ)) can be determined by merely checking whether D |= Σ_E. The satisfiability check is a fundamental step in query answering (see Section 2.2), and EGD-stability guarantees it can be easily done.

However, with a more sophisticated version of the satisfiability check (see Section 3), the old separability can be relaxed to the new separability, giving rise to the discovery of new (syntactic) classes that enjoy (new) separability. Section 2.1 below provides an overview of the most relevant syntactic classes in the literature.

2.1. Related Work

The notion of (old) separability was first introduced in [5, 6], in the context of inclusion dependencies (IDs) and key dependencies (KDs) (see e.g. [9]). The general idea is to define a sufficient syntactic condition for separability, which can be efficiently checked. This was done while extending an early class of IDs and KDs (see e.g. [9]), called key-based. Key-based IDs and KDs were proposed in the milestone work by Johnson and Klug [10], and they are in fact separable, though not defined explicitly as such. Key-based IDs and KDs are defined as follows²: (i) for each relational predicate r, there is only one KD defined on it; (ii) for each ID r[X] ⊆ s[Y], where X and Y are sets of attributes of r and s respectively (see [9] for the notation), (ii.a) X is disjoint from any key set of attributes for r, and (ii.b) Y is properly contained in the set of key attributes for s.

The more general class of non-key-conflicting (NKC) IDs, with respect to a set of KDs, is defined as follows: (i) for each relational predicate r, there is only one KD defined on it; (ii) for every ID r[X] ⊆ s[Y], Y is not a proper superset of the set of key attributes for s (if such a set exists). The class of KDs and NKC IDs is separable, and it properly captures the well-known class of foreign-key dependencies.

The class of KDs and non-key-conflicting IDs was generalized in [8] to general TGDs, which are assumed to have a single atom in the head, without loss of generality. The idea is analogous, and the condition is as follows: (i) for each relational predicate r, there is only one KD defined on it; (ii) each existentially quantified variable in the head of a TGD must occur only once; (iii) for each TGD σ = φ(X, Y) → ∃Z r(X, Z), the set of X-attributes of r(X, Z) is not a proper superset of the set of key attributes of r.

In [11], the class of non-key-conflicting TGDs was straightforwardly extended to treat functional dependencies (FDs) rather than keys.

The literature discussed so far deals with sufficient syntactic conditions that guarantee, in fact, EGD-stability. However, there are classes of TGDs and EGDs such that EGDs are triggered in the chase, but which enjoy separability.

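Example 3. Consider the set of TGDs and EGDs in Example 1. It is separable, but it is not EGD-stable. In fact, if we consider the instance D′ = {r3(a, b), r2(a, c)}, we have that D′ |= Σ_E but D′ ∪ Σ is unsatisfiable because the chase fails due to a hard violation (the TGDs generate a second r3-atom with the same key value, and the EGD then tries to equate two distinct constants).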

This led in [2] to the definition of separability given as Definition 1 in this paper. That work deals with IDs and KDs which capture an expressive variant of the Entity-Relationship model [12]. By means of graph-related properties, necessary and sufficient syntactic conditions for separability are provided, thus defining useful tractable classes of constraints.

The case of linear TGDs (TGDs with a single body-atom) and KDs was later considered in [13, 7]. In these works, sufficient syntactic conditions are proposed that guarantee separability, without imposing non-EGD-triggerability. The conditions are quite involved and they make use of backward-resolution. Interestingly, the complexity of checking the syntactic condition, called non-conflict condition, is the same as that of query answering, that is, PSPACE-complete.

At this point, we have considered separability but not the problem of satisfiability. While separability allows us to ignore EGDs in the case of satisfiable theories, the problem of deciding satisfiability remains, as well as that of determining its complexity. This will be the subject of the next section.

² The definition in the original paper is different but equivalent; we choose a definition which is clearer in the context of this paper.

2.2. Query Answering under Separable Constraints

In the case of a set Σ = Σ_T ∪ Σ_E of separable TGDs and EGDs, such that BCQ answering under Σ is decidable, given an instance D and a BCQ q, to decide whether D ∪ Σ |= q, the following steps are needed:

1. Check whether the chase fails, that is, whether D ∪ Σ is satisfiable; if D ∪ Σ is unsatisfiable, then trivially D ∪ Σ |= q (“ex falso quodlibet”).
2. If D ∪ Σ is satisfiable, then by Definition 1 we know chase(D, Σ) |= q iff chase(D, Σ_T) |= q, therefore we check whether chase(D, Σ_T) |= q.
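The two steps can be summarised as a short control-flow sketch. It is only an outline under stated assumptions: `answer_bcq`, `is_satisfiable` and `entails_tgds_only` are hypothetical names, and the two callables stand for the satisfiability check and for BCQ answering under the TGDs alone, neither of which is implemented here.

```python
from typing import Callable, FrozenSet, Tuple

Atom = Tuple[str, ...]
Instance = FrozenSet[Atom]

def answer_bcq(db: Instance, query: object,
               is_satisfiable: Callable[[Instance], bool],
               entails_tgds_only: Callable[[Instance, object], bool]) -> bool:
    """Decide whether D ∪ Σ |= q for a separable Σ (control flow only)."""
    # Step 1: if D ∪ Σ is unsatisfiable, every BCQ is trivially entailed.
    if not is_satisfiable(db):
        return True
    # Step 2: by separability, the EGDs can be ignored from here on.
    return entails_tgds_only(db, query)
```

The point of the decomposition is that the EGDs only matter inside `is_satisfiable`; once that check has succeeded, the cost is that of query answering under TGDs.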

Apart from the complexity of the satisfiability check, we have that the complexity of query answering is the same as that of answering under TGDs only, which is a highly desirable property. In the case of EGD-stable constraints, the satisfiability check amounts simply to checking whether D |= Σ_E, which can be done in NP, and in PTIME (or better, in the even lower complexity class AC0) if we consider Σ fixed. However, in the case of separable but not EGD-stable constraints, the problem is to be addressed differently; this will be the subject of Section 3.

3. Separability and Satisfiability

In this section we address the problem of deciding whether, given an instance D and a set Σ of separable TGDs and EGDs, D ∪ Σ is satisfiable. As seen in Section 2, this preliminary check is needed, in the case of separability, before one proceeds to answer a BCQ by taking into account the TGDs only.

The satisfiability check is done as in [8, 2], by encoding hard violations of EGDs as a set Q of Boolean conjunctive queries. Given a separable set Σ = Σ_T ∪ Σ_E, and an instance D, we have that satisfiability holds if and only if, for each q ∈ Q, we have D ∪ Σ_T ̸|= q or, equivalently, if and only if all queries in Q have negative answer when evaluated against D and Σ_T (TGDs only). The encoding is done as follows. First, we need to add auxiliary facts to D. For each pair of distinct constants c1, c2 appearing in D as arguments, we add the facts neq(c1, c2) and neq(c2, c1) to D, where neq/2 is an auxiliary predicate expressing that two constants are distinct. Then, for each EGD η of the form φ(X) → X_i = X_j, with X_i and X_j in X, we construct the BCQ q_η as follows (quantifiers omitted for brevity): q_η: q ← φ(X), neq(X_i, X_j), where q is a propositional predicate. It is not difficult to prove that if D ∪ Σ |= q_η then chase(D, Σ) fails, and therefore D ∪ Σ is unsatisfiable. However, it still remains to determine whether D ∪ Σ |= q_η.
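The encoding just described can be sketched as follows. This is an illustration under assumptions, not the construction of any specific system: facts are tuples, variables are strings with an uppercase initial, and the helper names `add_neq_facts`, `violation_query` and `holds` are hypothetical. The naive evaluator `holds` simply searches for a homomorphism from the query into a finite set of facts, which here stands in for (a finite portion of) the chase under the TGDs.

```python
from itertools import permutations

def is_var(term):
    # Assumption of this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def add_neq_facts(db):
    """Add neq(c1, c2) and neq(c2, c1) for all distinct constants c1, c2 of db."""
    constants = {t for atom in db for t in atom[1:]}
    return set(db) | {("neq", c1, c2) for c1, c2 in permutations(constants, 2)}

def violation_query(egd_body, xi, xj):
    """Body of the BCQ for the EGD  egd_body -> xi = xj."""
    return list(egd_body) + [("neq", xi, xj)]

def holds(query, facts, subst=None):
    """Naive BCQ evaluation: is there a homomorphism from query into facts?"""
    subst = subst or {}
    if not query:
        return True
    (pred, *args), rest = query[0], query[1:]
    for fact in facts:
        if fact[0] != pred or len(fact) != len(args) + 1:
            continue
        s = dict(subst)
        # Variables must be bound consistently; constants must match exactly.
        if all(s.setdefault(a, v) == v if is_var(a) else a == v
               for a, v in zip(args, fact[1:])):
            if holds(rest, facts, s):
                return True
    return False

# Key dependency r3(X, Y), r3(X, Z) -> Y = Z and its violation query.
q_eta = violation_query([("r3", "X", "Y"), ("r3", "X", "Z")], "Y", "Z")
print(holds(q_eta, add_neq_facts({("r3", "a", "b"), ("r3", "a", "c")})))  # True
print(holds(q_eta, add_neq_facts({("r3", "a", "b")})))                    # False
```

Since neq facts are added only for constants, a pair consisting of a labelled null and a constant never satisfies the query; only hard violations are detected, which is the intended reading of the encoding.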

Notice that in the case of non-EGD-triggerable constraints we do not have this problem, as we merely need to check whether D |= Σ_E. In principle, separability (see Definition 1) does not tell us anything about the cases when the chase fails. However, we are still in luck because we can evaluate the (Boolean) conjunctive queries in Q against D ∪ Σ_T rather than D ∪ Σ.

This is proved, for instance, in [2, 13], but ad hoc, by using the properties of the particular class of constraints involved. Here we show a general condition that allows us to perform the satisfiability check as above. We first need to introduce a preliminary condition, called deep separability, which is apparently more restrictive than separability (we shall then prove that it is implied by separability). We start by denoting by chase^k(D, Σ) the result of the first k chase steps under Σ applied to an instance D, where a step is either an EGD or a TGD application.

Definition 4. Let Σ be a set of TGDs and EGDs, with Σ = Σ_T ∪ Σ_E, where the dependencies in Σ_T are TGDs and those in Σ_E are EGDs. We say that Σ is deeply separable if, for each integer k ⩾ 0, for each instance D with values in Γ ∪ Γ_N, and for each Boolean conjunctive query q, the following holds: if chase^k(D, Σ) exists, then chase^k(D, Σ) |= q implies chase(D, Σ_T) |= q.

The notion of deep separability, intuitively, guarantees that, at any step before a possible failure, the chase does not entail any atoms that are not entailed by the chase computed according to Σ_T only. We now come to the main result about deeply separable TGDs and EGDs. The result states that in the case of deep separability the satisfiability check done by answering suitable queries as above is correct and complete.

Theorem 1. Let Σ be a set of deeply separable TGDs and EGDs, with Σ = Σ_T ∪ Σ_E, where the dependencies in Σ_T are TGDs and those in Σ_E are EGDs. Let Q = {q_1, . . . , q_n} be the set of BCQs encoding violations of the EGDs in Σ_E as from the above construction. Then we have that chase(D, Σ_T) |= q_i for some i ∈ {1, . . . , n} if and only if chase(D, Σ) fails (or equivalently D ∪ Σ is unsatisfiable).

Proof (sketch). We prove the two directions of the implication separately.

“Only if”. By contradiction, assume chase(D, Σ_T) |= q_i for some i ∈ {1, . . . , n} but chase(D, Σ) exists. It is not difficult to show that if chase(D, Σ_T) |= q_i then also chase(D, Σ) |= q_i; this holds because chase(D, Σ_T) is a universal model for D ∪ Σ_T and chase(D, Σ) is a model (not necessarily universal) for D ∪ Σ_T; therefore there exists a homomorphism from the former to the latter. If chase(D, Σ) |= q_i then the chase must necessarily fail. Contradiction.

“If”. Assume chase(D, Σ) fails at step k by violation of the EGD η ∈ Σ_E. Then, chase^{k−1}(D, Σ) exists; moreover, it is easily seen that chase^{k−1}(D, Σ) |= q_η, where q_η ∈ Q encodes the violation of η. By the hypothesis of deep separability we have chase(D, Σ_T) |= q_η, hence the claim. □

Notice that for the “Only if” direction we need neither deep separability nor separability; this direction of the implication holds for general TGDs and EGDs. Finally we show that, despite appearances, deep separability is not a stricter condition than separability. In fact, separability implies deep separability.

Theorem 2. Let Σ be a set of TGDs and EGDs. If Σ is separable, then it is deeply separable.

Proof (sketch). We distinguish two cases. Case 1: chase(D, Σ) exists. In this case the claim obviously holds. Case 2: chase(D, Σ) fails. Assume chase(D, Σ) fails at step k; therefore chase^{k−1}(D, Σ) exists. Take D_N obtained from D by replacing each constant in Γ with a fresh null from Γ_N. Obviously chase(D_N, Σ) exists and so does chase^{k−1}(D_N, Σ). Take any BCQ q such that chase^{k−1}(D_N, Σ) |= q. It is straightforwardly seen that chase(D_N, Σ) |= q and, due to separability, chase(D_N, Σ_T) |= q. Since chase(D_N, Σ_T) is obtained from chase(D, Σ_T) by the above renaming of constants into nulls, we have chase(D, Σ_T) |= q. Since this holds for every step ℓ ⩽ k − 1, the claim is proved. □

The following result immediately follows from the above.

Corollary 1. For every separable set Σ = Σ_T ∪ Σ_E of TGDs and EGDs, for every instance D and for every BCQ q:
• checking unsatisfiability of D ∪ Σ has the same complexity as query answering under Σ_T alone;
• checking whether D ∪ Σ |= q has the same complexity as query answering under Σ_T alone.

Notice that we mention unsatisfiability rather than satisfiability because Theorem 1 shows a reduction from failure (unsatisfiability) to BCQ answering under TGDs alone.

3.1. A Syntactic Condition for Separability

We consider positions in the schema as arguments of predicates; if we have a predicate r/3 of arity 3, we denote its arguments as positions r[1], r[2] and r[3]. Given a set Σ_T of TGDs, we say that a position is a bud position if, whenever it appears in a head-atom of any TGD of Σ_T, it contains an existentially quantified variable. Given an EGD body φ(X), we denote by φ(X)|X1/X2, with {X1, X2} ⊆ X, the conjunction of atoms obtained from φ(X) by replacing X1 with X2. Given an EGD φ(X) → X1 = X2, we call X1, X2 equality variables (EVs).

Definition 5. Given a set Σ = Σ_T ∪ Σ_E as before, we say that Σ is transparent if: (i) for every EGD φ(X) → X1 = X2 in Σ_E, the equality variables appear in the body only in bud positions, and (ii) every atom a in φ(X)|X1/X2 or φ(X)|X2/X1 is such that φ(X) ∪ Σ_T |= a.
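Condition (i) of transparency is purely syntactic and can be checked by a single pass over the TGD heads. The sketch below illustrates this under assumptions: a TGD is represented as a triple (body atoms, head atoms, set of existential variables), positions are pairs (predicate, index), and the function names are hypothetical. Condition (ii) requires entailment under the TGDs and is not covered here.

```python
def non_bud_positions(tgds):
    """Positions (pred, i) that hold a non-existential term in some TGD head.
    Every other position is a bud position by definition."""
    bad = set()
    for _body, head, ex_vars in tgds:
        for pred, *args in head:
            bad |= {(pred, i) for i, a in enumerate(args, 1) if a not in ex_vars}
    return bad

def evs_in_bud_positions(egd_body, equality_vars, tgds):
    """Condition (i): equality variables occur in the EGD body only in bud positions."""
    bad = non_bud_positions(tgds)
    return all((pred, i) not in bad
               for pred, *args in egd_body
               for i, a in enumerate(args, 1) if a in equality_vars)

# Hypothetical input: r1(X) -> exists Y r2(X, Y)  and  r2(X, Y), r2(X, Z) -> Y = Z.
tgd = ([("r1", "X")], [("r2", "X", "Y")], {"Y"})
egd_body = [("r2", "X", "Y"), ("r2", "X", "Z")]
print(evs_in_bud_positions(egd_body, {"Y", "Z"}, [tgd]))
```

Adding a TGD that copies a universally quantified variable into r2[2], such as r5(X, Y) → r2(X, Y), makes that position non-bud, and the same check then rejects the EGD.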

Intuitively, transparency ensures that (i) whenever an EGD is applied in the chase, the equation of two symbols (two labelled nulls, or one labelled null and a constant) does not propagate backwards in the portion of the chase constructed at that point; in fact, any labelled null corresponding to an EV in the application appears in the chase for the first time; (ii) all atoms appearing as a consequence of the EGD application would appear anyway in chase(D, Σ_T).

The following theorem can then be proved.

Theorem 3. If a set Σ = Σ_T ∪ Σ_E of TGDs and EGDs is transparent, then Σ is separable.

3.2. Separability and Decidability

To establish a relationship between the problem of determining the separability of a set of TGDs plus EGDs and decidability of conjunctive query answering, we show the following result.

Theorem 4. Consider a set Σ = Σ_T ∪ Σ_E of TGDs and EGDs as before. Determining whether Σ is separable is undecidable.

The above theorem can be proved by reducing the problem of conjunctive query answering to that of checking separability.

4. Discussion

In this paper we have given an overview of the problem of separability between TGDs and EGDs in the context of ontological query answering, where queries are (Boolean) conjunctive queries. We have reviewed the two main notions of separability found in the literature, the “old” one and the “new” one, and we have clarified the difference between them. We have then addressed the issue of checking satisfiability of a set of TGDs and EGDs together with an instance, and we have shown that this can be done by merely answering suitable queries in the case of deeply separable classes of constraints. We have shown the desirable property that all separable sets of constraints are also deeply separable. We have therefore clarified and proved formally a satisfiability check which was already employed in the literature, but proved on a case-by-case basis [7, 8, 2], depending on the class of constraints. We believe that our generalisation provides a useful tool for future studies, and a better insight into the satisfiability problem.

Further results. More results on this paper’s topics (for which we also refer the reader to an older paper [14]), which we did not include here due to space limitations, will be published elsewhere. In [7] a sufficient condition for separability (in the “new” meaning) is provided for the case of EGDs and linear TGDs (which, we remind the reader, are TGDs with exactly one head-atom and one body-atom), with ontologies that encode so-called Extended Entity-Relationship schemata. Such a condition can be extended (with some changes) to the class of sticky sets of TGDs [11], a relevant class that allows for a form of joins in the body.

Open problems. We currently have two main open problems at hand, which seem non-trivial. First, we would like to study the decidability of determining whether a given set of TGDs and EGDs is separable. This has been proved to be undecidable for the case of arbitrary TGDs and EGDs [15]; however, the proof relies on the fact that query answering under arbitrary TGDs (without EGDs) is undecidable [16]. It is not clear whether undecidability holds also in the cases where query answering is undecidable under TGDs and EGDs together, but decidable under TGDs alone; relevant cases are, for instance, IDs and KDs, sticky sets of TGDs and EGDs, or guarded TGDs and KDs [17]. We conjecture that determining separability is undecidable for these classes. The second problem is more technical, and is determining the complexity of checking the aforementioned syntactic condition for the separability of sticky sets of TGDs and EGDs. We have an obvious EXPTIME lower bound, but the upper bound is currently unknown.

Acknowledgments. Andrea Calí acknowledges financial support from: project SERICS (PE00000014); “cascade” project SMIMI (CUP F53C24000190005), under the NRRP MUR program funded by the EU-NGEU; PNRR ECS00000037 “cascade” project MUSA; project INFANT (CUP E23C24000390006); project MELODY (PRIN-PNRR, CUP E53D23017550001). The authors would like to thank Andreas Pieris for his insightful comments on this research. The authors have not employed any Generative AI tools.

References

[1] A. Artale, D. Calvanese, R. Kontchakov, M. Zakharyaschev, The DL-Lite family and relations, J. Artif. Intell. Res. 36 (2009) 1-69.

[2] A. Calì, G. Gottlob, A. Pieris, Ontological query answering under expressive Entity-Relationship schemata, Inf. Syst. 37 (2012) 320-335.

[3] A. Calì, G. Gottlob, T. Lukasiewicz, B. Marnette, A. Pieris, Datalog+/-: A family of logical knowledge representation and query languages for new applications, in: Proc. of LICS, 2010, pp. 228-242.

[4] A. Calì, M. Console, R. Frosini, Deep separability of ontological constraints, 2013. URL: https://arxiv.org/abs/1312.5914.

[5] A. Calì, Query Answering and Optimisation in Data Integration Systems, Ph.D. thesis, Università di Roma “La Sapienza”, 2003.

[6] A. Calì, D. Lembo, R. Rosati, On the decidability and complexity of query answering over inconsistent and incomplete databases, in: Proc. of PODS, 2003, pp. 260-271.

[7] A. Calì, G. Gottlob, A. Pieris, Querying conceptual schemata with expressive equality constraints, in: Proc. of ER, 2011, pp. 161-174.

[8] A. Calì, G. Gottlob, T. Lukasiewicz, A general datalog-based framework for tractable query answering over ontologies, J. Web Sem. (2012). To appear.

[9] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison-Wesley, 1995.

[10] D. S. Johnson, A. C. Klug, Testing containment of conjunctive queries under functional and inclusion dependencies, J. Comput. Syst. Sci. 28 (1984) 167-189.

[11] A. Calì, G. Gottlob, A. Pieris, Advanced processing for ontological queries, PVLDB 3 (2010) 554-565.

[12] P. P. Chen, The entity-relationship model: towards a unified view of data, ACM TODS 1 (1995) 124-131.

[13] A. Calì, A. Pieris, On equality-generating dependencies in ontology querying - Preliminary report, in: Proc. of AMW, 2011.

[14] A. Calì, M. Console, R. Frosini, A. Pieris, On Equality Constraints in Ontological Query Answering, Technical Report, University of London, Birkbeck College, 2012. Available from the authors.

[15] A. Pieris, 2012. Personal communication.

[16] C. Beeri, M. Y. Vardi, The implication problem for data dependencies, in: Proc. of ICALP, 1981, pp. 73-85.

[17] A. Calì, G. Gottlob, T. Lukasiewicz, A general datalog-based framework for tractable query answering over ontologies, in: Proc. of PODS, 2009, pp. 77-86.