=Paper= {{Paper |id=None |storemode=property |title=On Checking Domain Independence |pdfUrl=https://ceur-ws.org/Vol-857/paper_s03.pdf |volume=Vol-857 |dblpUrl=https://dblp.org/rec/conf/cilc/KerhetF12 }} ==On Checking Domain Independence== https://ceur-ws.org/Vol-857/paper_s03.pdf
           On Checking Domain Independence

                       Volha Kerhet and Enrico Franconi

       Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
                      {kerhet,franconi}@inf.unibz.it



      Abstract Domain independence is an important property of a formula
      that guarantees that the truth value of the formula in an interpreta-
      tion remains the same regardless of the underlying domain of the inter-
      pretation. Unfortunately, checking domain independence of a first-order
      formula is in general undecidable. There have been several attempts to
      define syntactic fragments of first-order logic characterising domain in-
      dependent formulas that easily can be checked, but all of them are in-
      complete.
      We reduce the problem of checking domain independence of a general
      first-order logic formula to checking a standard first-order logic entail-
      ment. This method can be applied in any decidable fragment where the
      formulas participating in the entailment can be expressed.


1   Introduction

Domain independence [1] is an important property of a formula that guaran-
tees that the truth value of the formula in an interpretation remains the same
regardless of the underlying domain of the interpretation. The importance of
checking domain independence stems from the use of first-order logic as a query
language for databases. Since a query can be an arbitrary first-order formula, its
answer can be infinite (since the domain is not restricted to be finite) or it may
depend on the domain. For example, the query Q(x) = ¬Student(x) over the
database Student(a), Student(b), with domain {a, b, c} has the answer {x 7→ c},
while with domain {a, b, c, d} has the answer {x 7→ c, x 7→ d}, and if we change
the domain to an infinite one, the answer will be infinite even in presence of such
a finite database. Therefore, the notion of domain independent queries has been
introduced in relational databases: indeed, the above open formula ¬Student(x)
turns out not to be domain independent. An example of a domain independent
formula would be ∃x. P (x), since if it turns out to be true in some interpretation
of the unary predicate symbol P with a specific domain, it is also true in all the
structures sharing the same interpretation for the predicate P and any other
compatible domain. On the other hand, the formula ∀x. P (x) is not a domain
independent formula, since if it is true in an interpretation with some domain,
it is definitely not true in any structure sharing the same interpretation for the
predicate P but with a larger domain.
     The problem of checking whether a FOL formula is domain independent is
undecidable [2]. The well known safe-range syntactic fragment of FOL intro-
duced by Codd is an equally expressive language; indeed any safe-range for-
mula is domain independent, and any domain independent formula can be easily
transformed into a logically equivalent safe-range formula. Intuitively, a formula
is safe-range if and only if its variables are bounded by positive predicates or
equalities – for the exact syntactical definition see, e.g., [2]. For example, the
formula ∀x. G(x) → P (x) is safe-range, while formula ∃x. (A(x) ∨ B(a)) is not.
To check whether a formula is safe-range, the formula is transformed into a lo-
gically equivalent safe-range normal form and its range restriction is computed
according to a set of syntax based rules; the range restriction of a formula is a
subset of its free variables, and if it coincides with the free variables then the
formula is said to be safe-range.
    The problem arises since the safe-range property is an incomplete charac-
terisation of domain independence: indeed, the above non-safe-range formula
∃x. (A(x) ∨ B(a)) is logically equivalent to the formula (∃x. A(x)) ∨ B(a) which
is safe-range, and therefore we can conclude that also the former formula is do-
main independent. In this paper, we show for the first time a sound and complete
procedure to check domain independence of a formula, by reducing this problem
to an entailment problem in first-order logic.
    All the proofs of the theorems presented in this paper can be found in the
technical report [3].


2    Preliminaries

Let FOL(C, P) be a classical function-free first-order language with equality over
a signature Σ = (C, P), where C is a set of constants and P is a set of predicates
with associated arities.The arity of a predicate P we denote as ar(P ). We denote
as σ(φ) the signature of the formula φ, that is all the predicates and constants
occurring in φ. We denote the set of all variables appearing in φ as var(φ), and
the set of the free variables appearing in φ as free(φ); we may use for φ the
notation φ(x̄), where x̄ = free(φ) is the (possibly empty) set of free variables of
the formula. A formula in FOL(C, P) is in prenex normal form, if it is written
as a string of quantifiers followed by a quantifier-free part. Every formula is
equivalent to a formula in prenex normal form and can be converted into it in
polynomial time [4].
    Let x̄ and S be two countable sets, first of which is some set of variables. We
define a substitution Θx̄S to be a total function x̄ 7→ S assigning an element of the
set S to each variable in x̄, including the empty substitution  when x̄ = ∅. We can
see substitution as (at most) a countable set of assignments of elements from S to
elements from x̄. That is, if x̄ = {x1 , x2 , . . .}, then Θx̄S := {x1 → s1 , x2 → s2 , . . .},
where s1 , s2 , . . . are elements from S assigned to corresponding variables from x̄
by Θx̄S .
    As usual, an interpretation M = h∆M , ·M i includes a non-empty set – the
domain ∆M – and an interpretation function ·M defined over constants and pre-
dicates of the signature. We use standard definitions of validity, satisfiability and
entailment of a formula. An extension of φ(x̄) in interpretation M = h∆M , ·M i,
denoted (φ(x̄))M , is the set of substitutions which satisfy φ in M. That is,

                       (φ(x̄))M = {Θx̄∆ | M, Θx̄∆ |= φ(x̄)}.

If φ is closed, then the extension depends on whether φ holds in M or not. Thus,
for a closed formula φ, (φ)M = , if M |= φ, and (φ)M = ∅, if M 6|= φ.
     Hereafter let σ 0 be a subset of σ = P ∪ C.
     Two interpretations M1 = h∆1 , ·M1 i and M2 = h∆2 , ·M2 i are σ 0 -compatible
iff they agree on the interpretations of all the predicates and constants in σ 0 .
That is, for any predicate P and constant c in σ 0 the following holds:

                                      P M2 = P M1
                                      cM2 = cM1

Definition 1. (Domain Independence) Let M1 = h∆1 , ·M1 i and M2 =
h∆2 , ·M2 i be any two σ-compatible interpretations, and φ be a formula in FOL(C, P).
φ is domain independent, if and only if

                                    (φ)M1 = (φ)M2

   The semantic active domain of sub-signature σ 0 in an interpretation M =
h∆ , ·M i, denoted adom(σ 0 , M), is the set of all elements of the domain ∆M
    M

occurring in interpretations of predicates and constants from σ 0 in M:

                             [             [                                      [
        adom(σ 0 , M) :=                                 {a1 , . . . , an } ∪          {cM }.
                           P ∈σ 0 ∩P (a1 ,...,an )∈P M                          c∈σ 0 ∩C



3   Checking Domain Independence

In this section we show how to prove domain independence of a formula φ by
proving a validity of a first-order logic entailment.
   We introduce a new unary predicate Adom σ0 . According to the idea, we
consider a theory encoding that interpretation of Adom σ0 in any model of the
theory is never empty and equal to the semantic active domain of σ 0 in this
model, if it is not empty. We call such a theory σ 0 -active domain theory and
denote it as Aσ0 . Formally, Aσ0 is the smallest theory satisfying the following
four requirements.

1. For every n-ary predicate P ∈ σ 0 ∩ P,
                                      ^
                       ∀x̄ . P (x̄) →   Adom σ0 (x) ∈ Aσ0 .
                                           x∈x̄

    In other words, Adom σ0 contains any domain element that occurs in the
    interpretation of some predicate from σ 0 .
 2. Adom σ0 contains all the constants from σ 0 :
                               ^
                                    Adom σ0 (c) ∈ Aσ0 .
                                            c∈σ 0 ∩C

 3. Let n be a maximal arity among the predicates in σ 0 , and let z̄ = {z1 , . . . , zn−1 }.
    Then
      _                  _
           ∃x̄. P (x̄) ∨     ∃x. (x = c) → (∀x.Adom σ0 (x) →
    P ∈σ 0 ∩P_                   c∈σ 0 ∩C                                                              _
    ∃z̄.               (P (x, z1 , . . . , zar(P )−1 ) ∨ . . . ∨ P (z1 , . . . , zar(P )−1 , x)) ∨              (x = c)) ∈ Aσ0 .
           P ∈σ 0 ∩P                                                                                 c∈σ 0 ∩C
    It means, that every element from Adom σ0 necessarily appears in at least one
    predicate from σ 0 or equal to some constant from σ 0 , if not all the predicates
    from σ 0 are empty or σ 0 contains constants.
 4. Adom σ0 is not empty:
                                 ∃x. Adom σ0 (x) ∈ Aσ0 .

   The next proposition follows from the definition of Aσ0 .
Proposition 1. An interpretation M = h∆M , ·M i is a model of Aσ0 iff
 – Adom M           0               0
        σ 0 = adom(σ , M), if adom(σ , M) 6= ∅;

 – Adom M                  0
        σ 0 6= ∅, if adom(σ , M) = ∅.

    Let φ(x̄) be a formula in FOL(C, P) with free variables x̄, x ∈ x̄, and let P
be some predicate. For the sake of readability we write ∃x ∈ P. φ(x̄) to denote
the formula ∃x. P (x) ∧ φ(x̄); and we write ∀x ∈ P. φ(x̄) to denote the formula
∀x. P (x) → φ(x̄).
    Let now φ(x̄) be a query in FOL(C, P) in prenex normal form, that is

                                     φ(x̄) = Q1 y1 . . . Qk yk . ψ(ȳ, x̄),

where Qi stands for either ∀ or ∃, x̄ = {x1 , . . . , xn } is the set of all free variables
of φ(x̄), ȳ = {y1 , . . . , yk } is the set of all bounded (quantified) variables of φ(x̄),
and ψ(ȳ, x̄) is a quantifier free formula. Let P be an unary predicate. We will
use the following denotation:
                                                                                  n
                                                                                  ^
                 φ(x̄)|P := (Q1 y1 ∈ P . . . Qk yk ∈ P. ψ(ȳ, x̄)) ∧                    P (xi ).
                                                                                  i=1

We extend this denotation also for the formulas, that are not in prenex normal
form, meaning that the aforementioned operation (·)|(·) is applied to the prenex
normal form of the formula.

Theorem 1 (Checking Domain Independence).
Let φ(x̄) be a formula in FOL(C, P). Then φ(x̄) is domain independent iff

                                 Aσ(φ) |= ∀x̄ . φ(x̄) ↔ φ(x̄)|Adom σ(φ) .                                        (1)
Example 1. Let us consider the formula φ = ∃x. (A(x)∨B(a)) from [5] mentioned
in Section 1 and apply the theorem for it. This formula is not safe-range but it
is domain independent, because it is logically equivalent to (∃x. A(x)) ∨ B(a),
which is safe-range and, hence, domain independent. Such kind of formulas are
of particular interest for us.
    Since adom(σ(φ), M) = AM ∪ B M ∪ {aM }, because of the proposition ??
we have:
               Aσ(φ) |= ∀x. Adom σ(φ) (x) ↔ A(x) ∨ B(x) ∨ (x = a).
Then Aσ(φ) |= φ|Adom σ(φ) ↔ ∃x. (A(x) ∨ B(x) ∨ (x = a)) ∧ (A(x) ∨ B(a)). One
can see, that

     |= ∃x. (A(x) ∨ B(a)) ↔ ∃x. (A(x) ∨ B(x) ∨ (x = a)) ∧ (A(x) ∨ B(a)).

That is, Aσ(φ) |= φ ↔ φ|Adom σ(φ) . Therefore, by applying the theorem we proved,
that formula φ is domain independent.                                          


4   Conclusion
We proposed a method that allows to reduce the problem of checking domain
independence of a formula to checking standard first-order logic entailment: a
formula φ(x̄) from FOL(C, P) is domain independent whenever the entailment
(1) holds. Using this technique in particular makes sense when the formula to
be checked for domain independence is not safe-range. Thus, in practice this
method is useful and can be applied in any not safe-range decidable fragment
of FOL(C, P), where Aσ(φ) and ∀x̄ . φ(x̄) ↔ φ(x̄)|Adom σ(φ) can be expressed.
Discovering of such interesting fragments and application of the technique there
is our future work.
    We would like to thank Martín Rezk and Nhung Ngo for ideas and fruitful
discussions.


References
1. Avron, A.: Constructibility and decidability versus domain independence and abso-
   luteness. Theor. Comput. Sci. 394 (April 2008) 144–158
2. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley
   (1995)
3. Franconi, E., Kerhet, V.: On checking domain independence (extended version).
   Technical Report KRDB12-3, KRDB Research Centre, Free University of Bozen-
   Bolzano (2012) http://www.inf.unibz.it/krdb/pub/TR/KRDB12-3.pdf.
4. Kleene, S.C.: Mathematical Logic. New York: Dover (2002)
5. Topor, R.W.: Domain-independent formulas and databases. Theor. Comput. Sci.
   52 (1987) 281–306