Introduction

Lossless Selection Views under Constraints

Ingo Feinerer

feinerer@logic.at 1

Enrico Franconi

Paolo Guagliardo

guagliardog@inf.unibz.it 0 0 Free University of Bozen-Bolzano 1 Vienna University of Technology

The problem of updating a database through a set of views consists in propagating updates of the views to the base relations over which the view relations are de ned, so that the changes to the database re ect exactly those to the views. This is a classical problem in database research, known as the view update problem [1], which in recent years has received renewed attention. View updates can be consistently propagated in an unambiguous way under the condition that the mapping between database and view relations is lossless, and that each database relation can be expressed in terms of the view relations by means of a query, in much the same way the latter are de ned from the former [3]. In such a context, database decompositions play an important role, in that their losslessness is associated with the existence of an explicit reconstruction operator that prescribes how a database relation can be rebuilt from the pieces into which it has been decomposed. In the case of horizontal decomposition, in which a relation is split into sub-relations containing each a subset of its rows, the reconstruction operator is the union. The study of horizontal decomposition in the literature has mostly focused on settings where data values can only be compared for equality. However, most realworld applications make use of data values coming from domains with a richer structure (e.g., ordering) on which a variety of other restrictions besides equality can be expressed (e.g., that of being within a range or above a threshold). It is therefore of practical interest to consider a scenario where some of the attributes in the database schema are interpreted over a speci c domain, such as the reals or the integers, on which a set of special predicates (e.g., smaller/greater than) and functions (e.g., addition/subtraction) are de ned, according to a rst-order language C. We consider horizontal decomposition in such a setting and investigate its losslessness in the presence of integrity constraints. The only restriction we impose on the language C is that of being closed under negation.

Introduction

We consider a source relation symbol R under so-called conditional domain constraints (CDCs) of the form 8x; y : R(x; y) ^ (x) ! (y) ; where (x) is a Boolean combination of equalities x = a between a variable from x and a constant, and (y) 2 C. The variables in x and y are associated with non-interpreted and interpreted positions (that is, attributes) of R, respectively. By means of formulae in C, CDCs restrict the admissible values at interpreted positions, whenever a certain condition holds on the non-interpreted ones.

We consider a decomposed schema V consisting of view symbols having the same arity as R, each of which is de ned by a formula of the form 8x; y : V (x; y) $ R(x; y) ^ (x) op (y) ; ( 2 ) where (x) is as in ( 1 ), (y) 2 C, and op is ^. A set of formulae of the form ( 2 ), one for each V 2 V, speci es a horizontal decomposition of R. 3

Losslessness

Lossless horizontal decomposition under CDCs can be characterised in terms of unsatis ability in C, which gives a decision procedure for losslessness whenever the satis ability of sets of formulae in C is decidable. The main idea is as follows: 1) With each equality between a variable and a constant a we associate a propositional variable pia, whose truth-value indicates whether the value at (noninterpreted) position i is a. 2) For a given valuation of such propositional variables, we consider the set consisting of the C-formulae in the r.h.s. of all the CDCs that are \applicable" (i.e., the propositional representation of their l.h.s. is true) under and the negation of any C-formula (y) appearing in some selection in where the propositional representation of (x) is satis ed by . 3) Checking losslessness is equivalent to checking that there exists no valuation for which the above set of C-formulae is satis able.

The class of Unit Two-Variable Per Inequality formulae (UTVPIs) is a fragment of linear arithmetic over the integers for which the satis ability problem is decidable in polynomial time [4]. UTVPI have the form ay +by0 d, where y and y0 are integer variables, a; b 2 f 1; 0; 1g and d 2 Z. We refer to Boolean combinations of UTVPIs as BUTVPIs; the satis ability problem for sets of BUTVPIs is NP-complete [5]. It can be shown that, when C is the language of either UTVPIs or BUTVPIs, the problem of deciding losslessness is co-NP-complete. 4

Separability

We also investigated lossless horizontal decomposition under CDCs in combination with traditional integrity constraints, showing that functional dependencies (FDs) do not interact with CDCs and can therefore be allowed without any restriction, whereas this is not the case for unary inclusion dependencies (UINDs).

The main technical tool we employ in our study is the notion of separability : a class of constraints is separable from CDCs if, after making explicit the result of their interaction, captured by a suitable nite set S of sound inference rules, we can focus solely on CDCs and disregard the other constraints, as far as lossless horizontal decomposition is concerned. Given a horizontal decomposition and a combination of CDCs with other constraints that are separable from them w.r.t. S, we can check whether is lossless under as follows: 1) compute the deductive closure of w.r.t. S, which makes explicit the interaction between CDCs and the other constraints in by adding entailed constraints; 2) by using the technique of Section 3, check whether is lossless under the set cdc( ) obtained from by discarding all of the constraints that are not CDCs.

A UIND R[i] R[j] is satis ed if, for every source instance I, every value in the i-th column of the extension RI of R under I appears also in the j-th column of RI . Let yi and yj be the variables associated with the interpreted positions i and j of R, respectively; by means of the following domain propagation rule > ! (yj )

R[i]

R[j]

; (yi) > ! it is possible to derive a set of CDCs that fully captures the interaction between a given set of UINDs and opportunely restricted CDCs w.r.t. lossless horizontal decomposition, which makes possible to employ the general technique for deciding losslessness also in the presence of UINDs. The results on losslessness and separability brie y reported in this short paper are under revision for journal publication; a preliminary version appeared in [2]. We remark that our technique for checking losslessness can also be applied when op in ( 2 ) is _.

We are currently investigating the possibility of generalising the separability results for UINDs to arbitrary inclusion dependencies (INDs). We also strongly believe that equalities between non-interpreted variables (i.e., from x) in ( 1 ) and ( 2 ) can be allowed and our technique extended by representing such equalities by means of additional propositional variables and by adding suitable propositional axioms to handle transitivity and symmetry.

1. Bancilhon , F. , Spyratos , N.: Update semantics of relational views . ACM Trans. Database Syst . 6 ( 4 ), 557 {575 (Dec 1981 )

2. Feinerer , I. , Franconi , E. , Guagliardo , P. : Lossless horizontal decomposition with domain constraints on interpreted attributes . In: Big Data { Proc. of BNCOD 2013 , LNCS , vol. 7968 , pp. 77 { 91 . Springer ( 2013 )

3. Franconi , E. , Guagliardo , P.: E ectively updatable conjunctive views . In: Proc. of AMW 2013. CEUR Workshop Proceedings , vol. 1087 . CEUR-WS.org ( 2013 )

4. Schutt , A. , Stuckey , P.J.: Incremental satis ability and implication for UTVPI constraints . INFORMS Journal on Computing 22 ( 4 ), 514 { 527 ( 2010 )

5. Seshia , S.A. , Subramani , K. , Bryant , R.E.: On solving boolean combinations of UTVPI constraints . JSAT 3 ( 1-2 ), 67 { 90 ( 2007 )