Lossless Selection Views under Constraints Ingo Feinerer1 , Enrico Franconi2 , and Paolo Guagliardo2 1 Vienna University of Technology feinerer@logic.at 2 Free University of Bozen-Bolzano {franconi,guagliardo}@inf.unibz.it 1 Introduction The problem of updating a database through a set of views consists in propagat- ing updates of the views to the base relations over which the view relations are defined, so that the changes to the database reflect exactly those to the views. This is a classical problem in database research, known as the view update prob- lem [1], which in recent years has received renewed attention. View updates can be consistently propagated in an unambiguous way under the condition that the mapping between database and view relations is lossless, and that each database relation can be expressed in terms of the view relations by means of a query, in much the same way the latter are defined from the former [3]. In such a context, database decompositions play an important role, in that their losslessness is associated with the existence of an explicit reconstruction operator that prescribes how a database relation can be rebuilt from the pieces into which it has been decomposed. In the case of horizontal decomposition, in which a relation is split into sub-relations containing each a subset of its rows, the reconstruction operator is the union. The study of horizontal decomposition in the literature has mostly focused on settings where data values can only be compared for equality. However, most real- world applications make use of data values coming from domains with a richer structure (e.g., ordering) on which a variety of other restrictions besides equality can be expressed (e.g., that of being within a range or above a threshold). It is therefore of practical interest to consider a scenario where some of the attributes in the database schema are interpreted over a specific domain, such as the reals or the integers, on which a set of special predicates (e.g., smaller/greater than) and functions (e.g., addition/subtraction) are defined, according to a first-order language C. We consider horizontal decomposition in such a setting and invest- igate its losslessness in the presence of integrity constraints. The only restriction we impose on the language C is that of being closed under negation. 2 Fundamentals We consider a source relation symbol R under so-called conditional domain con- straints (CDCs) of the form ∀x, y . R(x, y) ∧ λ(x) → δ(y) , (1) where λ(x) is a Boolean combination of equalities x = a between a variable from x and a constant, and δ(y) ∈ C. The variables in x and y are associated with non-interpreted and interpreted positions (that is, attributes) of R, respectively. By means of formulae in C, CDCs restrict the admissible values at interpreted positions, whenever a certain condition holds on the non-interpreted ones. We consider a decomposed schema V consisting of view symbols having the same arity as R, each of which is defined by a formula of the form  ∀x, y . V (x, y) ↔ R(x, y) ∧ λ(x) op σ(y) , (2) where λ(x) is as in (1), σ(y) ∈ C, and op is ∧. A set Σ of formulae of the form (2), one for each V ∈ V, specifies a horizontal decomposition of R. 3 Losslessness Lossless horizontal decomposition under CDCs can be characterised in terms of unsatisfiability in C, which gives a decision procedure for losslessness whenever the satisfiability of sets of formulae in C is decidable. The main idea is as follows: 1) With each equality between a variable and a constant a we associate a pro- positional variable pai , whose truth-value indicates whether the value at (non- interpreted) position i is a. 2) For a given valuation α of such propositional variables, we consider the set consisting of the C-formulae in the r.h.s. of all the CDCs that are “applicable” (i.e., the propositional representation of their l.h.s. is true) under α and the negation of any C-formula δ(y) appearing in some selection in Σ where the propositional representation of λ(x) is satisfied by α. 3) Checking losslessness is equivalent to checking that there exists no valuation α for which the above set of C-formulae is satisfiable. The class of Unit Two-Variable Per Inequality formulae (UTVPIs) is a frag- ment of linear arithmetic over the integers for which the satisfiability problem is decidable in polynomial time [4]. UTVPI have the form ay+by 0 ≤ d, where y and y 0 are integer variables, a, b ∈ {−1, 0, 1} and d ∈ Z. We refer to Boolean combin- ations of UTVPIs as BUTVPIs; the satisfiability problem for sets of BUTVPIs is NP-complete [5]. It can be shown that, when C is the language of either UTVPIs or BUTVPIs, the problem of deciding losslessness is co-NP-complete. 4 Separability We also investigated lossless horizontal decomposition under CDCs in combina- tion with traditional integrity constraints, showing that functional dependencies (FDs) do not interact with CDCs and can therefore be allowed without any re- striction, whereas this is not the case for unary inclusion dependencies (UINDs). The main technical tool we employ in our study is the notion of separability: a class of constraints is separable from CDCs if, after making explicit the result of their interaction, captured by a suitable finite set S of sound inference rules, we can focus solely on CDCs and disregard the other constraints, as far as lossless 2 horizontal decomposition is concerned. Given a horizontal decomposition Σ and a combination ∆ of CDCs with other constraints that are separable from them w.r.t. S, we can check whether Σ is lossless under ∆ as follows: 1) compute the deductive closure ∆∗ of ∆ w.r.t. S, which makes explicit the interaction between CDCs and the other constraints in ∆ by adding entailed constraints; 2) by using the technique of Section 3, check whether Σ is lossless under the set cdc(∆∗ ) obtained from ∆∗ by discarding all of the constraints that are not CDCs. A UIND R[i] ⊆ R[j] is satisfied if, for every source instance I, every value in the i-th column of the extension RI of R under I appears also in the j-th column of RI . Let yi and yj be the variables associated with the interpreted positions i and j of R, respectively; by means of the following domain propagation rule > → δ(yj ) R[i] ⊆ R[j] , (dp) > → δ(yi ) it is possible to derive a set of CDCs that fully captures the interaction between a given set of UINDs and opportunely restricted CDCs w.r.t. lossless horizontal decomposition, which makes possible to employ the general technique for decid- ing losslessness also in the presence of UINDs. 5 Discussion The results on losslessness and separability briefly reported in this short paper are under revision for journal publication; a preliminary version appeared in [2]. We remark that our technique for checking losslessness can also be applied when op in (2) is ∨. We are currently investigating the possibility of generalising the separability results for UINDs to arbitrary inclusion dependencies (INDs). We also strongly believe that equalities between non-interpreted variables (i.e., from x) in (1) and (2) can be allowed and our technique extended by representing such equalities by means of additional propositional variables and by adding suitable propositional axioms to handle transitivity and symmetry. References 1. Bancilhon, F., Spyratos, N.: Update semantics of relational views. ACM Trans. Database Syst. 6(4), 557–575 (Dec 1981) 2. Feinerer, I., Franconi, E., Guagliardo, P.: Lossless horizontal decomposition with domain constraints on interpreted attributes. In: Big Data – Proc. of BNCOD 2013, LNCS, vol. 7968, pp. 77–91. Springer (2013) 3. Franconi, E., Guagliardo, P.: Effectively updatable conjunctive views. In: Proc. of AMW 2013. CEUR Workshop Proceedings, vol. 1087. CEUR-WS.org (2013) 4. Schutt, A., Stuckey, P.J.: Incremental satisfiability and implication for UTVPI con- straints. INFORMS Journal on Computing 22(4), 514–527 (2010) 5. Seshia, S.A., Subramani, K., Bryant, R.E.: On solving boolean combinations of UTVPI constraints. JSAT 3(1-2), 67–90 (2007) 3