<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <addr-line>Italy lastname @inf.unibz.it</addr-line>
        </aff>
      </contrib-group>
      <fpage>114</fpage>
      <lpage>128</lpage>
      <abstract>
        <p>The logic of nulls in databases has been subject of investigation since their introduction in Codd's Relational Model, which is the foundation of the SQL standard. In the logic based approaches to modelling relational databases proposed so far, nulls are considered as representing unknown values. Such existential semantics fails to capture the behaviour of the SQL standard. We show that, according to Codd's Relational Model, a SQL null value represents a non-existing value; as a consequence no indeterminacy is introduced by SQL null values. In this paper we introduce an extension of rst-order logic accounting for predicates with missing arguments. We show that the domain independent fragment of this logic is equivalent to Codd's relational algebra with SQL nulls. Moreover, we prove a faithful encoding of the logic into standard rst-order logic, so that we can employ classical deduction machinery.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1 2
R : a b
b N</p>
      <p>SELECT * FROM R
WHERE R.1 = R.1 AND R.2 = R.2 ;</p>
      <p>In SQL, the query above returns the table R if and only if the table R does not
have any null value, otherwise it returns just the tuples not containing a null
value, i.e., in this case only the rst tuple ha; bi. Informally this is due to the
fact that a SQL null value is never equal (or not equal) to anything, including
itself. How can we formally capture this behaviour?</p>
      <p>
        In this paper we introduce a formal semantics for SQL null values in order to
capture exactly the behaviour of SQL queries and SQL constraints in presence
of null values. We restrict our attention to the rst-order fragment of SQL { e.g.,
we do not consider aggregates, and both queries and constraints should be set
based (as opposed to bag/multi-set based): this fragment can be expressed, in
absence of null values, into the standard relational algebra RA. Note that the
standard relational algebra RA is more expressive than the null-free rst-order
fragment of SQL, since it can express relations of arity zero [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>To the best of our knowledge, there has been no attempt so far to formalise
in logic a relational algebra with proper SQL null values. It is well known that
SQL null values require a special semantics. Indeed, if they were treated as
standard database constants, the direct translation in the standard relational
algebra RA of the above SQL query would be equivalent to the identity expression
for R: 1=1 2=2 R, giving as an answer the table R itself, namely the tuples
fha; bi; hb; Nig. However, we have seen above that the expected answer to this
query is di erent.</p>
      <p>
        The most popular semantics in the literature for null values is the one
interpreting null values with an existential meaning, namely a null value denotes
an object which exists but has an unknown identity: this is the semantics of
naive tables (see [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). It is easy to see that also with this semantics, the direct
translation of the above SQL query in the standard relational algebra RA over
naive tables would be equivalent to the identity expression for R: 1=1 2=2 R,
giving as an answer the table R itself, namely the tuples fha; bi; hb; Nig. And
again, we have seen above that the expected answer to this query is di erent.
      </p>
      <p>This paper is organised as follows. We rst introduce three equivalent di
erent representations of databases with null values: they will be used in di erent
contexts to prove the equivalence of the di erent semantics given to SQL null
values. Then, we introduce the relational algebra with null values RAN, as an
obvious extension to the standard relational algebra RA and compatible with
the documents de ning SQL. We introduce an extension to standard rst-order
logic FOL, called FOL", which takes into account the possibility that some of
the arguments of a relation might not exist. We then show that FOL" is equally
expressive in a strong sense to the standard rst-order logic FOL, which allows
us to devise simple implementations of FOL" by relying to standard rst-order
provers. At the end, we prove our main result: RAN is equally expressive to the
domain independent fragment of FOL" with the standard name assumption, a
result which parallels in an elegant way the classical equivalence result by Codd
for the standard relational algebra RA and the domain independent fragment of
rst-order logic FOL with the standard name assumption.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Database Instances with Null Values</title>
      <p>We introduce here the notions of tuple, of relation, and of database instance in
presence of null values. We consider the unnamed (positional) perspective for the
attributes of tuples: elements of a tuple are identi ed by their position within
the tuple.</p>
      <p>Given a set of domain values , an n-tuple is a total function from integers
from 1 up to n (the position of the element within the tuple) into the set of
domain values augmented by the special term N (the null value): if we denote
the set fi 2 N j 1 i^i ng, possibly empty if n = 0, as [1 n], then an n-tuple
is a total function [1 n] 7! [ fNg. Note that the zero-tuple is represented
by a constant zero-ary function that we call fg. For example, the tuple hb; Ni is
(a) instance IN
(b) instance I"</p>
      <p>Ref1;2g=2 : ff1 7! a; 2 7! bgg</p>
      <p>Ref1g=1 : ff1 7! bgg
Ref2g=1 : fg
Refg=0 : fg
(c) instance I}
represented as f1 7! b; 2 7! Ng, while the zero-tuple hi is represented as fg. A
relation of arity n is a set of n-tuples; if we want to specify that n is the arity
of a given relation R, we write the relation as R=n. Note that a relation of arity
zero is either empty or it includes only the zero-tuple fg. A relational schema R
includes a set of relation symbols with their arities and a set of constant symbols
C. A database instance IN associates to each relation symbol R of arity n from
the relational schema R a set of n-tuples IN(R), and to each constant symbol
in C a domain value in . Usually, in relational databases all constant symbols
are among the domain values and are associated in the database instance to
themselves { this is called the Standard Name Assumption.</p>
      <p>As an example of a database instance IN consider Figure 2(a).</p>
      <p>In our work we consider two alternative representations of a given a database
instance IN, where null values do not appear explicitly: I" and I}. We will show
that they are isomorphic to IN. The representations share the same constant
symbols, domain values, and mappings from constant symbols to domain values.
The di erences are in the way null values are encoded within a tuple.</p>
      <p>Compared with a database instance IN, a corresponding database instance
I" di ers only in the way it represents n-tuples: an n-tuple is a partial function
from integers from 1 up to n into the set of domain values - the function is
unde ned exactly for those positional arguments which are otherwise de ned
and mapped to a null value in IN.</p>
      <p>That is, if R 2 R is an n-ary relation, then</p>
      <p>I"(R) = ft 2 ([1
n] 7! C) j 9t0 2 IN(R): t0(i) 6= N ! t(i) = t0(i) ^
t0(i) = N ! t(i) unde ned g</p>
      <sec id="sec-2-1">
        <title>Obviously, also the inverse correspondence exists:</title>
        <p>IN(R) = ft0 2 ([1
n] 7! C [ fNg) j 9t 2 I"(R): t(i) de ned ! t0(i) = t(i) ^
t(i) unde ned ! t0(i) = Ng
As an example of a database instance I" consider Figure 2(b). From the gure
is clear that this representation takes a di erent approach w.r.t. the standard
relational model: i.e., relations may include tuples of dishomogeneous arity. We
will introduce the third kind of representation in order to demonstrate that this
approach can be embedded into a standard relational model by considering an
extended vocabulary.</p>
        <p>
          Compared with a database instance I", a corresponding database instance
I} di ers in the way relation symbols are interpreted: a relation symbol of
arity n is associated to a set of novel relation symbols having the arity less
or equal to n. The main idea is that a tuple with null arguments ends up in
the corresponding relation which excludes those arguments (see Figure 2(c)).
The concept is analogous to the notion of lossless horizontal decomposition as
described, e.g., in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>Given a database instance IN de ned over a relational schema R, the
corresponding database instance I} is de ned over the decomposed relational schema
Re: for each relation symbol R 2 R of arity n and for each (possibly empty)
subset of its positional arguments A [1 n], the decomposed relational schema
Re includes a predicate ReA with arity jAj. The correspondence between IN and
I} is based on the fact that each jAj-tuple in the relation I}(ReA) corresponds
exactly to the n-tuple in IN(R) having non-null values only in the positional
arguments in A, with the same values and in the same relative positional order.
That is, if R 2 R is an n-ary relation, then
Abusing the notation for the sake of simplicity, we assume that if k = 0 then
f1; : : : ; kg = fi1; : : : ; ikg = ;.</p>
        <p>In absence of null values, clearly the IN and I" representations of a database
instance coincide. The third representation cannot be directly compared because
of the di erent signature, but it can be noted that, in absence of null values, for
every n-ary relation R in R, I}(ReA) = ; if jAj &lt; n. Therefore for each n-ary
relation in R, IN(R) = I"(R) = I}(Re[1 n]).</p>
        <p>Given the discussed isomorphisms, in the following { whenever the di erence
in the representation of null values does not generate ambiguity { we will denote
as I the database instance represented in any of the above three forms IN, I",
or I}.</p>
        <sec id="sec-2-1-1">
          <title>Constant singleton - hvi - (where v 2 C)</title>
          <p>R(I) = IN(R):
hvi(I) = f1 7! vg:
Selection - i=v e, i=j e - (where v 2 C, ` is the arity of e, and i; j
`)
i=v e(I) = fs is a `-tuple j s 2 e(I) ^ s(i) = vg;
i=j e(I) = fs is a `-tuple j s 2 e(I) ^ s(i) = s(j)g:
Projection - i1;:::;ik e - (where ` is the arity of e, and fi1; : : : ; ikg
[1 : : : `])
i1;:::;ik e (I) = fs is a k-tuple j exists s0 2 e(I) s.t. for all 1
j
k: s(j) = s0(ij)g:
Cartesian product - e</p>
          <p>e0 - (where n; m are the arities of e; e0)
(e
e0)(I) = fs is a (n + m)-tuple j exists t 2 e(I); t0 2 e0(I) s.t.
for all 1 j n: s(j) = t(j) ^
for all 1 + n j (n + m): s(j) = t0(j
n)g:</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Union/Di erence - e [ e0 , e</title>
          <p>e0 - (where ` is the arity of e and e0)
(e [ e0)(I) = fs is a `-tuple j s 2 e(I) _ s 2 e0(I)g,
(e e0)(I) = fs is a `-tuple j s 2 e(I) ^ s 62 e0(I)g.</p>
          <p>Derived operators - (where v 2 C, `
i; j; i1; : : : ; i` m, and k1; : : : ; k` n)
min(m; n), m; n are the arities of e; e0,
i&lt;&gt;v e
We introduce in this Section the formal semantics of the relational algebra
dealing with null values, corresponding (modulo the zero-ary relations) to the
rstorder fragment of SQL.</p>
          <p>
            Let's rst recall the notation of the standard relational algebra RA (see, e.g.,
[
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] for details). Standard relational algebra expressions over a relational schema
R are built according to the inductive formation rules in the boxed expressions of
Figure 3. Also in Figure 3 the semantics of an algebra expression e is inductively
de ned as the transformation of database instances I { with the Standard Name
Assumption { to a set of tuples e(I).
          </p>
        </sec>
        <sec id="sec-2-1-3">
          <title>Null singleton - hNi</title>
          <p>Selection - i=j e - (where ` is the arity of e, and i; j
`)
hNi(I) = f1 7! Ng:
i=j e(I) = fs is a `-tuple j s 2 e(I) ^ s(i) = s(j) ^ s(i) 6= N ^ s(j) 6= Ng:
Derived operators - (where v 2 C, `
i; j; i1; : : : ; i` m, and k1; : : : ; k` n)
min(m; n), m; n are the arities of e; e0,</p>
          <p>We now extend the standard relational algebra to deal with SQL nulls; we
refer to it as Null Relational Algebra (RAN).</p>
          <p>
            Null values have been introduced in the relational model since its inception. In
order to deal with null values, Codd [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] included the special null value in the
domain and adopted a three-valued logic having the third truth value unknown
together with true and false. The comparison expressions in Codd's algebra are
evaluated to unknown if they involve a null value, while in set operations, tuples
otherwise identical but containing null values are considered to be di erent. This
view is exactly what we nd in SQL where three-valued logic is used for the
evaluation of WHERE clauses but independent occurrences of the null value may be
identi ed in set operations { see, e.g., [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
          </p>
          <p>In order to de ne RAN from the standard relational algebra, we adopt the IN
representation of a database instance where null values are explicitly present as
possible elements of tuples, and with the Standard Name Assumption. Then, it
is enough to let equality (and inequality) fail whenever null values are involved,
since its evaluation would be unknown.</p>
          <p>
            All the RA expressions are valid RAN expressions, and maintain the same
semantics, with the only change in the semantics of the selection expressions i=j e
and i&lt;&gt;j e with equality or inequality among elements in a tuple: in these cases
the semantic de nitions make sure that the elements to be tested for equality
(or inequality) are both di erent from the null value in order for the equality (or
inequality) to succeed. The syntax of RAN expressions extends the standard RA
syntax only for the additional null singleton expression hNi (and for the derived
operators involving null values). Figure 4 introduces the syntax and semantics
of RAN expressed in terms of its di erence with RA, including the derived
operators which are added or de ned di erently in RAN. It is easy to show that the
de nition of the null relational algebra exactly matches the (informal) de nition
given to SQL with null values, and it generates the same practical behaviour [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
Given a tuple t and a RAN expression e, we call models of the expression e with
the answer t all the database instances I such that t 2 e(I).
          </p>
          <p>Sometimes an n-ary algebra expression is intended to express a boolean
statement over a database instance in the form of a denial constraint : this is done by
checking whether its evaluation over the database instance returns the empty
n-ary relation or not. An n-ary denial constraint e can always be reduced into
a zero-ary constraint, whose boolean value is meant to be true if its evaluation
contains the zero-tuple fg and false if it is empty, by considering its zero-ary
projection ( ;e).</p>
          <p>Example 1. Consider the relational schema fR=2; S=2g with the following data
and the following constraints expressed in RAN:
{ UNIQUE constraint for R:1: 1=3 26=4 (R R) = ;;
{ NOT-NULL constraint for R:1: isNull(1) R = ;;
{ UNIQUE constraint for S:1: 1=3 26=4 (S S) = ;,
{ FOREIGN KEY constraint from S:2 to R:1:
2 isNotNull(2) S
1 isNotNull(1) R= ;.</p>
          <p>It is easy to see that these constraints are all satis ed: they behave in the same
way as the corresponding SQL constraints with null values.</p>
          <p>Similarly, the query considered in the previous Section ( 1=1 2=2 R), now
behaves as in SQL and it returns correctly fha; big.
4</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>First-order Logic with Null Values</title>
      <p>The Null Relational Calculus (FOL") is a rst-order logic language with an
explicit symbol N representing the null value, and where predicates denote tuples
over subsets of the arguments instead of just their whole set of arguments. It
extends classical rst-order logic in order to take into account the possibility
that some of the arguments of a relation might not exist.</p>
      <p>Given a set of predicate symbols each one associated with an arity together
with the special equality binary predicate =, and a set C of constants { together
forming the relational schema (or signature) R { and a set of variable symbols,
terms of FOL" are variables, constants, and the special null symbol N, and
formulae of FOL" are de ned by the following rules:
1. if t1; : : : ; tn are terms and R is a predicate symbol in R (di erent from the
equality) of arity n, R(t1; : : : ; tn) is an atomic formula;
2. if t1; t2 are terms di erent from N, =(t1; t2) is an atomic formula;
3. if ' and '0 are formulae, then :', ' ^ '0 and ' _ '0 are formulae;
4. if ' is a formula and x is a variable, then 9x' and 8x' are formulae.
Note that the syntax of FOL" corresponds to a classical rst-order logic with
equality, with the only addition of the special null symbol N which may appear
in atomic formulae (but not in equalities). As usual, a formula is ground if it
does not contain any variable symbol.</p>
      <p>7</p>
      <p>The semantics of FOL" formulae is given in terms of database instances of
type I", also called interpretations. As usual, an interpretation I" includes an
interpretation domain and it associates each relation symbol R of arity n in
the signature to a set of n-tuples I"(R) { i.e., a set of partial functions with
range in { and each constant symbol in C to a domain value in (we do
not consider here the Standard Name Assumption). The equality predicate is
interpreted as the classical equality over the domain . It is easy to see that if
n-tuples were just total functions, then an interpretation I" would correspond
exactly to a classical rst-order interpretation.</p>
      <p>The de nition of satisfaction and entailment in FOL" is exactly the same as
in classical rst-order logic with equality over the signature R, with the only
di erence lying on the truth value of atomic formulae which involve partial
functions instead of total functions because of the possible presence of null values
in atomic formulae. As usual, an interpretation I" satisfying a formula is called
a model of the formula.</p>
      <p>An interpretation I" and a valuation function for variable symbols satisfy
an atomic formula { written I"; j=FOL" R(t1; : : : ; tn) { i there is an n-tuple
2 I"(R) such that for each i 2 [1 n]: (i) = I"(c) if ti is a constant symbol
c, (i) = (ti) if ti is a variable symbol, and (i) is unde ned if ti = N.
It is easy to see that the satis ability of a FOL" formula without any occurrence
of the null symbol N doesn't depend on partial tuples; so its models can be
characterised by classical rst-order semantics: in each model the interpretation
of predicates would include only tuples represented as total functions.
Example 2. The models of the FOL" formula R(a; b) ^ R(b; N) are all the
interpretations I" such that I"(R) includes at least the tuples f1 7! a; 2 7! bg and
f1 7! bg.
4.1</p>
      <p>Characterisation in classical First-order Logic
Given a signature R, let's consider a classical rst-order logic language with
equality (FOL) over the decomposed signature Re, as it has been de ned in
Section 2; FOL has a classical semantics with models of type I} and it does not
deal with null values directly. In this Section we show that FOL" over R and
FOL over Re are equally expressive, namely that for every formula in FOL" over
the signature R there is a corresponding formula in FOL over the decomposed
signature Re, such that the two formulae have isomorphic models, and that for
every formula in FOL" oovveerr tthhee sdiegcnoamtuproeseRd,sisguncahtutrheatReththeetrweois
faorcmourrlaesephonavdeing formula in FOL
isomorphic models.</p>
      <p>As we discussed before, the isomorphism between the interpretations IN and
I} is based on the fact that each jAj-tuple in the relation I}(ReA) corresponds
exactly to the n-tuple in IN(R) having non-null values only in the positional
arguments speci ed in A, with the same values and in the same relative positional
order.</p>
      <p>In order to relate the two logics, we de ne a bijective translation function
f ( ) (and its inverse f 1( )) which maps FOL" formulae into FOL formulae
(and vice-versa).</p>
      <p>8
De nition 1 (Bijective translation f ).</p>
      <p>f ( ) is a bijective function from FOL" formulae over the signature R to FOL
formulae over the signature Re, de ned as follows.</p>
      <p>{ f (R(t1; : : : ; tn)) = Refi1;:::;ikg(ti1 ; : : : ; tik ),
where R 2 R an n-ary relation, fi1; : : : ; ikg = fj 2 [1 n] j ti is not Ng.
Obviously: f 1(Refi1;:::;ikg(ti1 ; : : : ; tik )) = R(t01; : : : ; t0n),
where fi1; : : : ; ikg [1 n] and t0ij = tj for j = 1; : : : ; k , and t0j is N for
j 2 [1 n] n fi1; : : : ; ikg.</p>
      <p>In both the direct and inverse cases we assume i1; : : : ; ik in ascending order.
{ The translation of equality atoms and non atomic formulae is the identity
transformation inductively de ned on top of the above translation of atomic
formulae.</p>
      <p>It is easy to see that the size of a translated formula (via either the f ( ) or the
f 1( ) translation) is linearly bounded by the size of the input formula, assuming
the signatures R and Re xed; the signature Re can be exponentially larger than
the signature R, the exponent being the maximum arity of the relations in R.
Example 3. The FOL" formula 9x:R(a; x)^R(x; N)^R(N; N) over the signature
R is translated as the FOL formula 9x:Ref1;2g(a; x) ^ Ref1g(x) ^ Refg over the
decomposed signature Re, and vice-versa.</p>
      <p>Let's show now that the above bijective translation preserves the models of
the formulae, modulo the isomorphism among models presented in Section 2.
Theorem 1. Let ' be a FOL" formula over the signature R, and 'e a FOL
formula over the signature Re. Then for any (database) instance I:
I"; j=FOL" '
the formulae ' and"'ea.nSdinFceOsLy,nwtaexsahnodwstehmaatntthicessotaftneomne-nattohmoilcd ffoorrmtuhleaeatiso mthiec
same for both FOL
cases. The case of equality can be easily proved by noticing that N is not allowed
to appear among its arguments.</p>
      <p>We focus on ground atomic formulae, since the valuation function for variable
symbols is the same in the pre- and the post-conditions.
1. Let R(t1; : : : ; tn) be a FOL" ground atomic formula where A = fi1; : : : ; ikg
[1 n] is the set of its positional arguments for which ti is not N. Then
f (R(t1; : : : ; tn)) = Re(ti1 ; : : : ; tik ).</p>
      <p>By the assumption I"; (j)j==FOLI""(Rtj()t1f;o:r: :j; tn2); Ath,earnefdore there is a tuple
2 I"(R) such that is unde ned for
j 62 A. By de nition of I}, ( (i1); : : : ; (ik)) 2 I}(Re) therefore I}; j=
ReA(ti1 ; : : : ; tik ).</p>
      <p>9
2. Let Refi1;:::;ikg(t1; : : : ; tk) be a FOL" ground atomic formula with fi1; : : : ; ikg
[1 n]; then f 1(Refi1;:::;ikg(t1; : : : ; tk)) = R(t01; : : : ; t0n), where fi1; : : : ; ikg
[1 n] and t0ij = tj for j = 1; : : : ; k and t0j = N for j 2 [1 n] n fi1; : : : ; ikg.
By assumption I}; j= Refi1;:::;ikg(t1; : : : ; tk), therefore (I}(t1); : : : ; I}(tk)) 2
I}(Refi1;:::;ikg). By de nition of I" there is a tuple 2 I"(R) s.t. (ij ) =
I}(tj ) for j 2 fi1; : : : ; ikg and unde ned otherwise. Therefore I"; j=FOL"
R(t01; : : : ; t0n). tu</p>
      <p>Domain Independent fragment of FOL" with Standard Names
In this Section we introduce the domain independent fragment of FOL" with the
Standard Name Assumption, and we analyse its properties.</p>
      <p>De nition 2 (Domain Independence). A FOL" closed formula ' is domain
independent if for every two interpretations I = h I ; I( )i and J = h J ; J ( )i,
which agree on the interpretation of relation symbols and constant symbols { i.e.
I( ) = J ( ) { but disagree on the interpretation domains I and J :
I j= '
The domain independent fragment of FOL" includes only the domain
independent formulae of FOL".</p>
      <p>
        It is easy to see that the domain independent fragment of FOL" can be
characterised with the safe-range syntactic fragment of FOL": intuitively, a formula
is safe-range if and only if its variables are bounded by positive predicates or
equalities { for the exact syntactical de nition see, e.g., [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Due to the strong
semantic equivalence expressed in Theorem 1 and due to the fact the the
bijection f ( ) preserves the syntactic structure of the formulae, we can reuse the
results about safe-range transformations and domain independence usually
holding for classical FOL. To check whether a formula is safe-range, the formula is
transformed into a logically equivalent safe-range normal form and its range
restriction is computed according to a set of syntax based rules; the range
restriction of a formula is a subset of its free variables, and if it coincides with the
free variables then the formula is said to be safe-range [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In addition to that,
the following theorem on domain independence holds.
      </p>
      <p>Theorem 2. Any safe-range FOL" formula is domain independent, and any
domain independent FOL" formula can be transformed into a logically equivalent
safe-range FOL" formula.</p>
      <p>We focus now on the domain independent fragment of FOL" with the
Standard Name Assumption. We observe that an interpretation is a model of a
formula in the domain independent fragment of FOL" with the Standard Name
Assumption if and only if the interpretation which agrees on the interpretation
of relation and constant symbols but with the interpretation domain equal to
the set of standard names C is a model of the formula. Therefore, in the following
when dealing with the domain independent fragment of FOL" with the Standard
10
Name Assumption we can just consider interpretations with the interpretation
domain equal to C.</p>
      <p>We now show that we can weaken the Standard Name Assumption by just
assuming Unique Names, without changing the certain answers. An
interpretation I satis es the Unique Name Assumption if I(a) 6= I(b) for any di erent
a; b 2 C. We observe that an interpretation is a model of a FOL" formula with
the Standard Name Assumption if and only if the interpretation obtained by
homomorphically transform the standard names with arbitrary domain elements
is a model of the formula; this latter interpretation clearly satis es the Unique
Name Assumption. This observation allows us to freely interchange the Standard
Name and the Unique Name Assumptions; this is of practical advantage, since
we can easily encode the Unique Name Assumption in a classical rst-order logic
reasoner, and most description logics reasoners do have a native Unique Name
Assumption.
5</p>
    </sec>
    <sec id="sec-4">
      <title>Equivalence of Algebra and Calculus</title>
      <p>In this Section we prove the strong relation between RAN and FOL".
Theorem 3. The RAN relational algebra with nulls and the domain independent
fragment of FOL" with the Standard Name Assumption are equally expressive.</p>
      <p>The notion of equal expressivity is captured by the following two propositions.
Proposition 1. Let e be an arbitrary RAN expression of arity n, and t an
arbitrary n-tuple as a total function with values taken from the set C [ fNg. There
is a function (e; t) translating e with respect to t into a closed safe-range FOL"
formula, such that for any instance I with the Standard Name Assumption:
t 2 e(IN) if and only if</p>
      <p>I" j=FOL"
(e; t):
Proposition 2. Let ' be an arbitrary safe-range FOL" closed formula. There
is a RAN expression e, such that for any instance I with the Standard Name
Assumption:</p>
      <p>I" j=FOL" '</p>
      <p>if and only if e(IN) 6= ;:</p>
      <p>This means that there exists a reduction from the membership problem of
a tuple in the answer of a RAN expression over a database instance with null
values into the satis ability problem of a closed safe-range FOL" formula over the
same database (modulo the isomorphism among database instances presented
in Section 2); and there exists a reduction from the satis ability problem of a
closed safe-range FOL" formula over a database instance with null values into
the emptiness problem of the answer of a RAN expression over the same database
(modulo the isomorphism among database instances).</p>
      <p>Example 4. Given some database instance, checking whether the tuple ha; bi or
the tuple hb; Ni are in the answer of the RAN query 1=1 2=2 R (discussed in
Section 1) corresponds to check the satis ability over the database instance of
11
the FOL" closed safe-range formula R(a; b) in the former case, or of the formula
false in the latter case. You can observe that, as expected, also when translated
in rst-order logic, it turns out that the tuple hb; Ni is not in the answer of the
query 1=1 2=2 R. A constructive way to get the FOL" formulae is discussed
in Section 5.1.</p>
      <p>Example 5. Let's consider the "UNIQUE constraint for R:1" from Example 1
expressed in RAN as 1=3 26=4 (R R) = ;. The RAN expression is translated
as the closed safe-range FOL" formula: 8x; y; z: R(x; y) ^ R(x; z) ! y = z, which
corresponds to the classical way to express a unique constraint in rst-order
logic. A constructive way to get the FOL" formula is discussed in Section 5.1.
Example 6. The safe-range closed FOL" formula 9x:R(x; N) is translated as the
zero-ary RAN statement isNotNull(1) isNull(2) R 6= ;. A constructive way to get
the RAN statement is discussed in Section 5.2.</p>
      <p>Example 7. Let's consider what would be the classical way to express the
\FOREIGN KEY from S:2 to R:1" constraint from Example 1 in rst-order logic:
8x; y: S(x; y) ! 9z: R(y; z) :9y: (9x: S(x; y)) ^ :(9z: R(y; z)).
The formula is translated as the RAN statement: 2 isNotNull(2) S 1 isNotNull(1) R =
;, which is the RAN statement we considered in Example 1. A constructive way
to get the RAN statement is discussed in Section 5.2.
5.1</p>
      <p>From Algebra to Calculus
To show that the safe-range fragment of FOL" with the Standard Name
Assumption captures the expressivity of RAN queries, we rst de ne explicitly a
translation function which maps RAN expressions into safe-range FOL" formulae.
De nition 3 (From RAN to safe-range FOL"). Let e be an arbitrary RAN
expression, and t an arbitrary tuple of the same arity as e, as a total function
with values taken from the set C [ fNg [ V, where V is a countable set of variable
symbols. The function (e; t) translates e with respect to t into a FOL" formula
according to the following inductive de nition:
{ for any R=` 2 R,</p>
      <p>(=(t(1); v) if t(1) 6= N
{ (hvi; t) ;
false otherwise</p>
      <p>(T; t) ; R(t(1); : : : ; t(`))
{
{
{
(hNi; t) ;
( i = v e; t) ;
( i = j e; t) ;
(false if t(1) 6= N
true otherwise
( (e; t) ^ =(t(i); v) if t(i) 6= N</p>
      <p>false otherwise
( (e; t) ^ =(t(i); t(j)) if t(i); t(j) 6= N
false otherwise</p>
      <p>12
{
{</p>
      <p>( i1 ik e; t) ; 9x1 xn WH f1 ngnfi1 ikg (e; tH )
where xi are fresh variable symbols and tH is a sequence of n terms de ned
as:</p>
      <p>: &lt;&gt;8t(i) if i 2 fi1; : : : ; ikg
tH (i) = N if i 2 H</p>
      <p>&gt;:xi otherwise
(e1 e2; t) ; (e1; t) ^ (e2; t0)
where n1; n2 are the arity of e1; e2 respectively,
and t0 is a n2-ary tuple function s.t. t0(i) = t(n1 + i) for 1
i
n2
(e1 [ e2; t) ;
(e1 e2; t) ;
(e1; t) _ (e2; t)
(e1; t) ^ : (e2; t)</p>
      <sec id="sec-4-1">
        <title>We can now proceed with the proof of Proposition 1.</title>
        <p>Proof (Proposition 1). We prove the proposition by induction on the expression
e that t 2 e(IN) i I" j=FOL" (e; t). For the sake of brevity we demonstrate
the arguments for two of the signi cant cases.</p>
        <p>Let e be R 2 R of arity n and t a tuple s.t. t 2 R(IN) where fi1; : : : ; ikg are
the indexes for which t(i) = N. By De nition 3, (R; t) ; R(t). Since t 2 R(IN),
then t 2 IN(R), so there is a partial function t0 2 I"(R) s.t. t0(i) is unde ned
for i 2 fi1; : : : ; ikg and t0(i) = t(i) otherwise. Therefore I" j=FOL" R(t). On the
other hand, if I" j=FOL" R(t) then there is a partial function t0 2 I"(R) s.t.
t0(i) is unde ned for i 2 fi1; : : : ; ikg and t0(i) = I"(t(i)) otherwise. Since I is an
interpretation with the Standard Name Assumption, I"(t(i)) = t(i); therefore
t 2 IN(R) and t 2 R(IN).</p>
        <p>Let t 2 ( i1 ik e)(IN), then there is t0 of arity n s.t. t0 2 e(IN) and t(j) =
t0(ij ) for i = 1 : : : k. By the recursive hypothesis, I" j=FOL" (e; t0). Let H0
[1 n] n fi1; ; ikg be the sets of indexes for which t0( ) is N; then I" j=FOL"
9x1 xn (e; tH0 ), so I" j=FOL" 9x1 xn WH f1 ngnfi1 ikg (e; tH ) because
H0 is one of these disjuncts. For the other direction, let us assume that I" j=FOL"
( i1 ik e; t), then I" j=FOL" 9x1 xn WH [1 n]nfi1 ikg (e; tH ), so there is
H0 [1 n] n fi1 ikg and assignment for variables in tH0 s.t. I" j=FOL"
(e; (tH0 )). By the inductive hypothesis there is a tuple t0 corresponding to
(tH0 ) s.t. t0 2 e(IN); by construction of tH0 , t(i) = (tH0 (i)) for i 2 fi1; : : : ; ikg
therefore t 2 i1 ik e (IN). tu
5.2</p>
        <p>From Calculus to Algebra
To show that RAN queries capture the expressivity of the safe-range fragment of
FOL" with the Standard Name Assumption, we rst establish two intermediate
results supporting that FOL" queries can be expressed in (standard) relational
algebra.</p>
        <p>Lemma 1. Let e be an expression in the standard relational algebra RA over the
e
decomposed relational schema Re. There is a RAN expression e over the relational
schema R, such that for any instance I:
ee(I}) = e(IN)</p>
        <p>13
Proof. The RAN expression e is the same expression ee where each basic relation
Refi1;:::;ikg with R of arity n is substituted by the expression</p>
        <p>Refi1;:::;ikg ;</p>
        <p>i1;:::;ik ( isNotNull(fi1;:::;ikg) ( isNull([1 n]nfi1;:::;ikg) R))
where isNull(i1;:::;ik) e is a shorthand for isNull(i1) : : : isNull(ik) e.</p>
        <p>The proof is by induction on the expression e where the basic cases are the
unary singleton - which is the same as RAN since there are no nulls - and any
basic relation Refi1;:::;ikg where R 2 R an n-ary relation. Compound expressions
satisfy the equality because of the inductive assumption and the fact that the
algebraic operators (in absence of null values) in RAN and RA have the same
de nition.</p>
        <p>Therefore we just need to show that</p>
        <p>Refi1;:::;ikg(I}) = i1;:::;ik ( isNotNull(i1;:::;ik) ( isNull(j1;:::;j`) R))(IN)
however, is true by the de nition of I} and IN as shown in Section 2:
Lemma 2. Let be a safe-range FOL closed formula. There is an RA
expression e of arity zero, such that for any instance I} with the Standard Name
e
Assumption:</p>
        <p>I} j=FOL</p>
        <p>
          if and only if ee(I}) 6= ;
Proof. Since null values are not present neither in FOL nor RA, the lemma
derives from the equivalence between safe-range relational calculus queries and
relational algebra (see e.g. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]). tu
        </p>
        <p>Now we can prove Proposition 2 using the above lemmata.</p>
        <p>Proof (Proposition 2). Given a safe-range FOL" closed formula ', by Theorem 1,
there is a safe-range FOL formula 'e such that I" j=FOL" ' i I} j=FOL 'e.
By restricting to instances with the Standard Name Assumption, we can use
Lemma 2 to conclude that there is an RA expression ee of arity zero such that
I} j=FOL 'e i ee(I}) 6= ;; therefore I" j=FOL" ' i ee(I}) 6= ;
Finally we use Lemma 1 to show that there is an RAN expression e such that
ee(I}) 6= ; i e(IN) 6= ;; which enables us to conclude that I" j=FOL" ' i
e(IN) 6= ;. tu
6</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>
        Since their inception, SQL null values have been at the centre of long discussions
about their real meaning and their formal semantics (see, e.g., the recent thread
14
in SIGMOD Record [
        <xref ref-type="bibr" rid="ref6 ref7">6,7</xref>
        ]). The vast majority of logic based approaches consider
nulls as values with an unknown interpretation (i.e., a value exists but it is not
known) and they model them as existential variables (e.g. naive tables [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
all the works inspired by [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). In spite of the fact that these works have their
merits and provide a well founded characterisation of incomplete information in
databases, they diverge from SQL standard so they are outside the scope of our
approach. We have shown that null values { when de ned as in SQL with the
three-valued logic { should be interpreted as nonexistent values, and that such
null values do not introduce any incompleteness in the data.
      </p>
      <p>To the best of our knowledge this is the rst proposal to characterise the
SQL semantics of null values by means of relational calculus. Moreover, we do
properly extend Codd's theorem, stating the equivalence of the relational algebra
with the relational calculus, to the case in which null values with the original
SQL semantics are added to the languages.</p>
      <p>
        It is worth to notice that the use of sub-relations to model the absence of
values (shown in Section 4.1) is similar in spirit to the horizontal decomposition
summarised e.g. in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, classically horizontal decomposition is more a
modelling paradigm and must be performed \by hand". In our case, the
decomposition is transparent to the modeller/user; in fact it is rather a technical mean
in order to reduce our novel logic FOL" to standard rst-order logic FOL.
      </p>
      <p>
        The SQL semantics for null values is also re ected in the satis ability of
integrity constraints (e.g. foreign keys), expressed usually as denial constraints
using the query algebra itself. However, SQL:1999 introduced the possibility of
specifying alternative semantics for the constraints (not based on the actual
evaluation of SQL queries). In particular, foreign keys may be given optionally a
semantic where null values would be interpreted as unknown values as opposed
to nonexistent (see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). The formalisation of this alternative semantics of null
values only within constraints (but not queries) will require some more work.
Note that commercial relational database systems do not support this extension.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Date</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darwen</surname>
          </string-name>
          , H.:
          <article-title>Database Explorations: Essays on the Third Manifesto</article-title>
          and
          <string-name>
            <given-names>Related</given-names>
            <surname>Topics</surname>
          </string-name>
          .
          <source>Tra ord Publishing</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Imielinski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lipski</surname>
          </string-name>
          , Jr., W.:
          <article-title>Incomplete information in relational databases</article-title>
          .
          <source>J. ACM</source>
          <volume>31</volume>
          (
          <issue>4</issue>
          ) (
          <year>September 1984</year>
          )
          <volume>761</volume>
          {
          <fpage>791</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Abiteboul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hull</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vianu</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          : Foundations of Databases. Addison-Wesley (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Codd</surname>
            ,
            <given-names>E.F.</given-names>
          </string-name>
          :
          <article-title>Extending the database relational model to capture more meaning</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .
          <volume>4</volume>
          (
          <issue>4</issue>
          ) (
          <year>1979</year>
          )
          <volume>397</volume>
          {
          <fpage>434</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Date</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>An Introduction to Database Systems. 8 edn</article-title>
          . Addison-Wesley (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rubinson</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Nulls, three-valued logic, and ambiguity in SQL: critiquing Date's critique</article-title>
          .
          <source>SIGMOD Record</source>
          <volume>36</volume>
          (
          <issue>4</issue>
          ) (
          <year>December 2007</year>
          )
          <volume>13</volume>
          {
          <fpage>17</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Grant</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Null values in SQL</article-title>
          .
          <source>SIGMOD Rec</source>
          .
          <volume>37</volume>
          (
          <issue>3</issue>
          ) (
          <year>September 2008</year>
          )
          <volume>23</volume>
          {
          <fpage>25</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zaniolo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Database relations with null values</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          <volume>28</volume>
          (
          <issue>1</issue>
          ) (
          <year>1984</year>
          )
          <volume>142</volume>
          {
          <fpage>166</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Turker,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Gertz</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Semantic integrity support in SQL:1999 and commercial (object-)relational database management systems</article-title>
          .
          <source>The VLDB Journal</source>
          <volume>10</volume>
          (
          <issue>4</issue>
          ) (
          <year>2001</year>
          )
          <volume>241</volume>
          {
          <fpage>269</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>