Introduction

On Checking Domain Independence

Volha Kerhet

kerhet@inf.unibz.it 0

Enrico Franconi

franconi@inf.unibz.it 0 0 Faculty of Computer Science, Free University of Bozen-Bolzano , Italy

Domain independence is an important property of a formula that guarantees that the truth value of the formula in an interpretation remains the same regardless of the underlying domain of the interpretation. Unfortunately, checking domain independence of a first-order formula is in general undecidable. There have been several attempts to define syntactic fragments of first-order logic characterising domain independent formulas that easily can be checked, but all of them are incomplete. We reduce the problem of checking domain independence of a general first-order logic formula to checking a standard first-order logic entailment. This method can be applied in any decidable fragment where the formulas participating in the entailment can be expressed.

Introduction

Domain independence [ 1 ] is an important property of a formula that guarantees that the truth value of the formula in an interpretation remains the same regardless of the underlying domain of the interpretation. The importance of checking domain independence stems from the use of first-order logic as a query language for databases. Since a query can be an arbitrary first-order formula, its answer can be infinite (since the domain is not restricted to be finite) or it may depend on the domain. For example, the query Q(x) = :Student(x) over the database Student(a); Student(b), with domain fa; b; cg has the answer fx 7! cg, while with domain fa; b; c; dg has the answer fx 7! c; x 7! dg, and if we change the domain to an infinite one, the answer will be infinite even in presence of such a finite database. Therefore, the notion of domain independent queries has been introduced in relational databases: indeed, the above open formula :Student(x) turns out not to be domain independent. An example of a domain independent formula would be 9x: P (x), since if it turns out to be true in some interpretation of the unary predicate symbol P with a specific domain, it is also true in all the structures sharing the same interpretation for the predicate P and any other compatible domain. On the other hand, the formula 8x: P (x) is not a domain independent formula, since if it is true in an interpretation with some domain, it is definitely not true in any structure sharing the same interpretation for the predicate P but with a larger domain.

The problem of checking whether a FOL formula is domain independent is undecidable [ 2 ]. The well known safe-range syntactic fragment of FOL introduced by Codd is an equally expressive language; indeed any safe-range formula is domain independent, and any domain independent formula can be easily transformed into a logically equivalent safe-range formula. Intuitively, a formula is safe-range if and only if its variables are bounded by positive predicates or equalities – for the exact syntactical definition see, e.g., [ 2 ]. For example, the formula 8x: G(x) ! P (x) is safe-range, while formula 9x: (A(x) _ B(a)) is not. To check whether a formula is safe-range, the formula is transformed into a logically equivalent safe-range normal form and its range restriction is computed according to a set of syntax based rules; the range restriction of a formula is a subset of its free variables, and if it coincides with the free variables then the formula is said to be safe-range.

The problem arises since the safe-range property is an incomplete characterisation of domain independence: indeed, the above non-safe-range formula 9x: (A(x) _ B(a)) is logically equivalent to the formula (9x: A(x)) _ B(a) which is safe-range, and therefore we can conclude that also the former formula is domain independent. In this paper, we show for the first time a sound and complete procedure to check domain independence of a formula, by reducing this problem to an entailment problem in first-order logic.

All the proofs of the theorems presented in this paper can be found in the technical report [ 3 ]. 2

Preliminaries

Let F OL(C; P) be a classical function-free first-order language with equality over a signature = (C; P), where C is a set of constants and P is a set of predicates with associated arities.The arity of a predicate P we denote as ar(P ). We denote as ( ) the signature of the formula , that is all the predicates and constants occurring in . We denote the set of all variables appearing in as var( ), and the set of the free variables appearing in as free( ); we may use for the notation (x), where x = free( ) is the (possibly empty) set of free variables of the formula. A formula in F OL(C; P) is in prenex normal form, if it is written as a string of quantifiers followed by a quantifier-free part. Every formula is equivalent to a formula in prenex normal form and can be converted into it in polynomial time [ 4 ].

Let x and S be two countable sets, first of which is some set of variables. We define a substitution xS to be a total function x 7! S assigning an element of the set S to each variable in x, including the empty substitution when x = ;. We can see substitution as (at most) a countable set of assignments of elements from S to elements from x. That is, if x = fx1; x2; : : :g, then xS := fx1 ! s1; x2 ! s2; : : :g, where s1; s2; : : : are elements from S assigned to corresponding variables from x by xS.

As usual, an interpretation M = h M; Mi includes a non-empty set – the domain M – and an interpretation function M defined over constants and predicates of the signature. We use standard definitions of validity, satisfiability and entailment of a formula. An extension of (x) in interpretation M = h M; Mi, denoted ( (x))M, is the set of substitutions which satisfy ( (x))M = f x j M; x j= (x)g: If is closed, then the extension depends on whether holds in M or not. Thus, for a closed formula , ( )M = , if M j= , and ( )M = ;, if M 6j= .

Hereafter let 0 be a subset of = P [ C.

Two interpretations M1 = h 1; M1 i and M2 = h 2; M2 i are 0-compatible iff they agree on the interpretations of all the predicates and constants in 0. That is, for any predicate P and constant c in 0 the following holds: P M2 = P M1 cM2 = cM1 ( )M1 = ( )M2 Definition 1. (Domain Independence) Let M1 = h 1; M1 i and M2 = h 2; M2 i be any two -compatible interpretations, and be a formula in F OL(C; P). is domain independent, if and only if

The semantic active domain of sub-signature 0 in an interpretation M = h M; Mi, denoted adom( 0; M), is the set of all elements of the domain M occurring in interpretations of predicates and constants from 0 in M: adom( 0; M) := [

[ P 2 0\P (a1;:::;an)2P M fa1; : : : ; ang [

[ In this section we show how to prove domain independence of a formula by proving a validity of a first-order logic entailment.

We introduce a new unary predicate Adom 0 . According to the idea, we consider a theory encoding that interpretation of Adom 0 in any model of the theory is never empty and equal to the semantic active domain of 0 in this model, if it is not empty. We call such a theory 0-active domain theory and denote it as A 0 . Formally, A 0 is the smallest theory satisfying the following four requirements. 1. For every n-ary predicate P 2

0 \ P, 8x : P (x) ! ^ Adom 0 (x) 2 A 0 : x2x In other words, Adom 0 contains any domain element that occurs in the interpretation of some predicate from 0. 2. Adom 0 contains all the constants from 0:

^ 3. Let n be a maximal arity among the predicates in 0, and let z = fz1; : : : ; zn 1g.

Then _

_ 9x: (x = c) ! (8x:Adom 0 (x) ! P 2 0\P c2 0\C 9z: _ (P (x; z1; : : : ; zar(P ) 1) _ : : : _ P (z1; : : : ; zar(P ) 1; x)) _ _ (x = c)) 2 A 0 .

P 2 0\P c2 0\C It means, that every element from Adom 0 necessarily appears in at least one predicate from 0 or equal to some constant from 0, if not all the predicates from 0 are empty or 0 contains constants. 4. Adom 0 is not empty:

9x: Adom 0 (x) 2 A 0 :

The next proposition follows from the definition of A 0 .

Proposition 1. An interpretation M = h M; Mi is a model of A 0 iff – AdomM0 = adom( 0; M), if adom( 0; M) 6= ;; – AdomM0 6= ;, if adom( 0; M) = ;.

Let (x) be a formula in F OL(C; P) with free variables x, x 2 x, and let P be some predicate. For the sake of readability we write 9x 2 P: (x) to denote the formula 9x: P (x) ^ (x); and we write 8x 2 P: (x) to denote the formula 8x: P (x) ! (x).

Let now (x) be a query in F OL(C; P) in prenex normal form, that is (x) = Q1y1 : : : Qkyk: (y; x); where Qi stands for either 8 or 9, x = fx1; : : : ; xng is the set of all free variables of (x), y = fy1; : : : ; ykg is the set of all bounded (quantified) variables of (x), and (y; x) is a quantifier free formula. Let P be an unary predicate. We will use the following denotation: n (x)jP := (Q1y1 2 P : : : Qkyk 2 P: (y; x)) ^ ^ P (xi): i=1 We extend this denotation also for the formulas, that are not in prenex normal form, meaning that the aforementioned operation ( )j( ) is applied to the prenex normal form of the formula.

Theorem 1 (Checking Domain Independence).

Let (x) be a formula in F OL(C; P). Then (x) is domain independent iff A ( ) j= 8x : (x) $ (x)jAdom ( ) : (1) Example 1. Let us consider the formula = 9x: (A(x)_B(a)) from [ 5 ] mentioned in Section 1 and apply the theorem for it. This formula is not safe-range but it is domain independent, because it is logically equivalent to (9x: A(x)) _ B(a), which is safe-range and, hence, domain independent. Such kind of formulas are of particular interest for us.

Since adom( ( ); M) = AM [ BM [ faMg, because of the proposition ?? we have:

A ( ) j= 8x: Adom ( )(x) $ A(x) _ B(x) _ (x = a): Then A ( ) j= jAdom ( ) $ 9x: (A(x) _ B(x) _ (x = a)) ^ (A(x) _ B(a)). One can see, that

j= 9x: (A(x) _ B(a)) $ 9x: (A(x) _ B(x) _ (x = a)) ^ (A(x) _ B(a)): That is, A ( ) j= $ jAdom ( ) . Therefore, by applying the theorem we proved, that formula is domain independent. 4

Conclusion

We proposed a method that allows to reduce the problem of checking domain independence of a formula to checking standard first-order logic entailment: a formula (x) from F OL(C; P) is domain independent whenever the entailment (1) holds. Using this technique in particular makes sense when the formula to be checked for domain independence is not safe-range. Thus, in practice this method is useful and can be applied in any not safe-range decidable fragment of F OL(C; P), where A ( ) and 8x : (x) $ (x)jAdom ( ) can be expressed. Discovering of such interesting fragments and application of the technique there is our future work.

We would like to thank Martín Rezk and Nhung Ngo for ideas and fruitful discussions.

1. Avron , A. : Constructibility and decidability versus domain independence and absoluteness . Theor. Comput. Sci . 394 ( April 2008 ) 144 - 158

2. Abiteboul , S. , Hull , R. , Vianu , V. : Foundations of Databases. Addison-Wesley ( 1995 )

3. Franconi , E. , Kerhet , V.: On checking domain independence (extended version) . Technical Report KRDB12-3 , KRDB Research Centre, Free University of BozenBolzano ( 2012 ) http://www.inf.unibz.it/krdb/pub/TR/KRDB12-3.pdf.

4. Kleene , S.C. : Mathematical Logic. New York: Dover ( 2002 )

5. Topor , R.W.: Domain-independent formulas and databases . Theor. Comput. Sci . 52 ( 1987 ) 281 - 306