=Paper= {{Paper |id=Vol-1087/paper9 |storemode=property |title=Efficient Updates of Uncertain Databases |pdfUrl=https://ceur-ws.org/Vol-1087/paper9.pdf |volume=Vol-1087 |dblpUrl=https://dblp.org/rec/conf/amw/HubmerPSS13 }} ==Efficient Updates of Uncertain Databases== https://ceur-ws.org/Vol-1087/paper9.pdf
      Efficient Updates of Uncertain Databases
                            Extended Abstract

 Andreas Hubmer, Reinhard Pichler, Vadim Savenkov, and Sebastian Skritek

                         Vienna University of Technology
                      andreas.hubmer@alumni.tuwien.ac.at,
                 {pichler|savenkov|skritek}@dbai.tuwien.ac.at



1   Introduction

Uncertain databases have evolved as an active area of database research, sur-
veyed, for example, in [10].The possible worlds semantics [1] is commonly used
to deal with uncertain data. Several representation systems for uncertain data-
bases have been proposed to provide efficient storage and query facilities for a
potentially big number of possible worlds, see, e.g., [14,4,8]. Antova et al. intro-
duced U-relations [2], which guarantee polynomial data complexity for queries
of positive relational algebra (RA, for short). U-relations have been implemented
and are available in the MayBMS system [6].
    As in the case of certain databases, we clearly also want to update uncer-
tain databases and pose queries of unrestricted RA. In [3], an API for uncertain
databases, which also covers updates, is presented. The paper mentions that,
for updates, decompression of the succinct representation of the possible worlds
may be necessary. However, it leaves open the details and also the question if
decompression could at least partly be avoided by some extension of the rep-
resentation system. While the evaluation of positive RA queries on uncertain
databases has been studied extensively, there has been little work beyond posi-
tive RA, like queries with having clauses [7] and queries with one level of anti-join
(not-exists [13]). Fink et al. [5] describe an approach for unrestricted RA queries,
but in a formalism that does not exhibit polynomial time data complexity for
computing possible answers.
    The goal of this work is to start the investigation of the update problem
of U-relations and to tackle the problem of evaluating queries with anti-joins
over this formalism. To this end, we will first define the anti-join operator for
U-relations and show how to use it to model updates. It will turn out that
the decompression of the succinct representation of possible worlds may indeed
be a performance problem. We will therefore introduce an extension of the U-
relation representation system and show that our new formalism may lead to
an exponential decrease in the representation of an updated uncertain database.
Our main contributions will be as follows.
Formal definition. We first present a formal definition of the anti-join operator
on U-relations such that the result is again a U-relation. Using the anti-join, we
then define the semantics of the update operator on U-relations.
New representations. Non-monotonous operators and updates can seriously com-
promise the space efficiency of U-relations. To address this problem, we intro-
duce two generalizations of U-relations: (i) Ui -relations, which additionally allow
inequalities in the descriptors of possible worlds; and (ii) Uint -relations, which
allow intervals to specify ranges of the variables in descriptors of possible worlds.
We prove that by using Ui -relations or Uint -relations, the required storage and
the complexity of query answering can drop exponentially.
Complexity of optimization of world sets. We point out the intractability of a
basic optimization problem for U-relations, Ui -relations, and Uint -relations.


2    Preliminaries
Propositional Logic of Finite Domains. The formalism of U-relations uses
the logic known in the literature as propositional logic of finite domains, PLFD
[12]. It is an extension of propositional logic, in which each variable x is associated
with a fixed set of constants called the domain of x, denoted dom(x). The atomic
formulas in PLFD have the form x = v, where x is a variable and v ∈ dom(x) is a
value from the domain of x. A PLFD interpretation (·)i assigns to each variable
a value from its domain: (x = v)i = > iff xi = v. The interpretations of complex
formulas using the usual Boolean connectives ∧, ∨, →, ¬ are defined as usual.
Possible worlds and U-relations. In the possible worlds semantics, an un-
certain database can have multiple possible states called possible worlds. In the
sequel, we will write P w to denote the state of the relation P in the possible
world w. Two ideas are crucial for U-relations: (i) possible worlds are described
by conjunctions of PLFD atoms, and (ii) vertical decomposition of relations may
allow one to exponentially reduce the size of the representation of the possible
worlds.
    Given an uncertain database B with a relational schema S, one can find a
set of PLFD variables W with accordingly chosen domains, so that each total
valuation (·)i assigning a value to every variable in W corresponds to exactly
one possible world in B. U-relations split each relation R = (T , A) ∈ S with
key T in one or more vertical partitions U1 , . . . , Um with the schema (D, T , Ai ),
i = 1 . . . m where Ai is a subset of the attributes in A, and D stores the world-set
descriptors, or ws-descriptors, for short. In the original definition of the formal-
ism of U-relations [2], the world-set descriptors in D are simply conjunctions
of PLFD atoms. Since possible worlds of B are associated with total valuations
for variables in W , each world-set descriptor φ defines a set of possible worlds:
namely, the worlds associated with the valuations that turn φ to >. The relation
R can be recombined from the partitions U1 , . . . , Um by joining the tuples with
consistent ws-descriptors over T . The semantics of the usual positive relational
operators is a straightforward extension of the semantics of queries over a ver-
tically partitioned database schema, projecting ws-descriptors along with the
primary key columns: see [2] for a complete specification. The main difference
is in the binary operators like cartesian product and join: they only allow com-
bining tuples whose ws-descriptors are not contradictory. Importantly, unary


                                          2
                    Let Ui := JQi K with schema [Di , Ti , Ai ], for i ∈ {1, 2}
       neg dnf clause(U ) := {(d0 , a) | (d, a) ∈ U, d0 is a clause in the DNF of ¬d}
                                                                   
            negate(U, A) := neg dnf clause A GW D πD,A (U ) .
                                                                          
      JQ1 . Q2 K := σD 6|=⊥ πD1 ∧D2 as D,T1 ,A1 U1 n
                                                   o negate(U2 , A2 )          ∪ (U1 . U2 ),
                 where the condition of anti-join . and join n
                                                             o operators is A1 = A2 .

                    Fig. 1: Translation of anti-join for U-relations.


operators do not alter ws-descriptors, whereas positive binary operators take
a conjunction of the ws-descriptors of the relations passed as operands. Thus,
positive relational operators on U-relations can be efficiently implemented.
Notation of Relational Operators. We use the extended form of the pro-
jection operator π which supports computed attributes: e.g., πA1 +A2 as A0 (U ) is
a unary relation whose attribute A0 is the sum of the attributes A1 and A2 in
U . The anti-join operator (“where not exists”) is denoted by the symbol .. The
operator A1 Gf (A2 ) (U ) applies the aggregate function f to the set of attributes
A2 of U , whereby the tuples of U are grouped by the values of attributes A1 .


3    Anti-joins and updates on U-Relations
As pointed out in the previous section, U-relations allow for a natural imple-
mentation of positive relational operators. However, the non-monotonic opera-
tors over U-relations have not yet been considered. One such operator especially
well-suited for U-relations, which must contain tuple ids, is the anti-join.

Definition 1. The anti-join R . P is the uncertain relation defined, for each
possible world w, by the expression Rw \ (Rw >< P w ).

Theorem 1. The translation in Fig. 1 satisfies Definition 1.

Example 1. Consider the U-relation U = (D, T, A) consisting of the following tu-
ples: hx = 1, t1 , ai, hx = 2, t1 , bi, hy = 1, t2 , ai, hy = 2, t2 , ci, hy = 3, t3 , ai. The re-
sult U. of the relational expression σT =t1 (U ) .A σT 6=t1 (U ) is found as follows.
U. contains all possible t1 -tuples of U that do not match any possible t2,3 -
tuples. This corresponds to the (U1 . U2 ) branch in Fig. 1. Such a tuple is
hx = 2, t1 , bi. Additionally, for t1 -tuples that have matches in t2 or t3 , the trans-
formation of ws-descriptors using negate is needed. Such tuple is hx = 1, t1 , ai.
The call negate(σT 6=t1 (U ), A) first aggregates ws-descriptors of tuples that sat-
isfy the condition A = a using ∨ and then passes the relation hy = 1 ∨ y = 3, ai
to neg dnf clause. The latter function negates the aggregate ws-descriptor and
transforms it to the DNF: provided that dom(y) = {1, 2, 3}, the resulting DNF
has a single clause y = 2. Thus, the result of negate(σT 6=t1 (U ), A) is {hy = 2, ai}.


                                                3
The join (σT =t1 (U ) o
                      nA negate(σT 6=t1 (U ), A)) followed by merging ws-descriptors
with ∧ thus gives hx = 1 ∧ y = 2, t1 , ai. Since (x = 1 ∧ y = 2) 6|= ⊥ holds, the
result of the anti-join query U. is {hx = 1 ∧ y = 2, t1 , ai, hx = 1, t1 , ai}.   

   Important on its own right, the anti-join operator allows implementing up-
dates of U-relations. Consider a general form GU of the update operator:

                  GU : update R set Ak = Q.Expr from Q o
                                                       nR

Such update queries are supported by all major RDBMS’s. GU updates an at-
tribute Ak of a relation R = (A1 , . . . , Ak , . . . , An ) and allows to join R with some
query Q serving as a source for the new value Expr of Ak . It is essential that
the update be unambiguous, i.e. if a tuple in R has more than one join partner
in Q then Expr must return the same value for all join partners. The update GU
can be expressed as a query QGU , which is a union of two queries, in which the
former selects the updated tuples and the latter selects the unchanged ones:

                                                          n Q) ∪ (R . Q)
                 QGU : πA1 ,...Ak−1 ,Expr,Ak+1 ,...,An (R o

It is not difficult to show that no positive relational expression can express QGU .


4     Extended U-relations
4.1    Ws-descriptors with inequalities and intervals
As follows from the definition in Fig. 1 and from Example 1, the performance
of anti-join on U-relations depends crucially on the function negate. In compli-
ance with the definition of U-relations [2], this function produces ws-descriptors
being conjunctions of PLFD atoms. In the presence of non-monotonic queries
or updates such restricted form of ws-descriptors may prevent U-relations from
delivering on the promise of representing possible worlds succinctly.

Example 2. Consider a schema R = (T, A1 , . . . , An ), where T is the key at-
tribute, split into n U-relations Ui = (D, T, Ai ). Let B be an uncertain instance
of relation R, and let the tuple t in R have possible values constructed by pick-
ing an arbitrary value from some fixed set of constants V for each attribute
t.Ai . Using the vertical partitioning in U-relations, the |V |n possible values of t
can be represented by total n · |V | tuples in the U-relations Ui , . . . , Un . E.g., for
n = 3 and V = {a, b, c}, the following tuples in the U-relations are needed. U1 :
hx = 1, t, ai, hx = 2, t, bi, hx = 3, t, ci, U2 : hy = 1, t, ai, hy = 2, t, bi, hy = 3, t, ci,
U3 : hz = 1, t, ai, hz = 2, t, bi, hz = 3, t, ci.
    Now, consider the result of the relational expression B . {R(t, a, a, a)}. The
entries with T = t in the U-relations should represent the remaining 26 possible
values of t: V ×V ×V \{a, a, a}. After recombining R from U1,2,3 and subtracting
{hx = 1 ∧ y = 1 ∧ z = 1, t, a, a, ai} according to the definition of ., U1 contains
tuples hx = 1 ∧ y = 2 ∧ z = 1, t, ai, hx = 1 ∧ y = 3 ∧ z = 1, t, ai, hx = 1 ∧ y =
1 ∧ z = 2, t, ai, hx = 1 ∧ y = 1 ∧ z = 3, t, ai, hx = 2, t, bi, hx = 3, t, ci. U2 and

                                              4
U3 need to be updated accordingly. That is, 6 tuples with T = t per U-relation
are now needed to represent the result of anti-join operator, instead of the 3
tuples per U-relation in B. One can show that for R with n attributes split into
n U-relations, additional m · (|V |n − |V |) entries in each U-relation are needed
to represent the deletion of one possible world with m tuples from B.           

We address this issue by allowing more expressive ws-descriptors.
• Iws-descriptors allow for inequalities along with equalities to be present in the
conjunctive formulas. Thereby the PLFD formula x 6= vi is a shorthand for (x =
v1 ) ∨ . . . ∨ (x = vi−1 ) ∨ (x = vi+1 ) ∨ . . . ∨ (x = vm ), where dom(x) = {v1 , . . . , vm },
which explains the source of increased space-efficiency of the new representation.
The resulting formalism is called Ui -relations .
• Intws-descriptors replace equalities and inequalities with intervals of values
which the given variable can take. An intws-descriptor is a set of triples (v, lo, hi)
that consist of a variable v with a lower bound lo and an upper bound hi, such
that lo ≤ hi, lo, hi ∈ dom(v) and such that each v occurs at most once in the set.
It is assumed that the domain of v is 0 . . . n, for some integer n. The respective
modification of U-relations is called Uint -relations. For instance, for vi 6∈ {0, n},
we can represent the formula x 6= vi by two triples (x, 0, vi − 1) and (x, vi + 1, n).

Theorem 2. There exist ws-descriptors whose negation represented as sets of
iws- resp. intws-descriptors is exponentially more succinct than if represented as
sets of ws-descriptors.

For instance, using Ui -relations the result of the anti-join B . {R(t, a, a, a)} from
Example 2 can be expressed by exactly the same number of tuples as in B:
   U1 : hx = 1 ∧ y 6= 1, t, ai, hx = 2, t, bi, hx = 3, t, ci,
   U2 : hy = 1 ∧ x 6= 1 ∧ z 6= 1, t, ai, hy = 2, t, bi, hy = 3, t, ci,
   U3 : hz = 1 ∧ x 6= 1 ∧ y 6= 1, t, ai, hz = 2, t, bi, hz = 3, t, ci.

Extending ws-descriptors preserves the favorable properties of U-relations:

Theorem 3. Positive relational algebra queries can be evaluated on U i - resp.
Uint -relational databases in polynomial time data complexity.

Proof (Sketch). Evaluating positive RA operators on extended U-relations only
differs from evaluating these operators on U-relations of [2] in the consistency
test of the binary operators: the tuples in two relations are only combined if the
respective ws-descriptors are mutually consistent. That is, if the conjunction of
two ws-descriptors is satisfiable. The satisfiability test is feasible in polynomial
time in all three cases of U-relations, Ui -relations and Uint -relations. Therefore,
the claim of the theorem follows from the result in [2] claiming that the problem
of evaluating positive RA operators on U-relations has polynomial data com-
plexity.                                                                           


                                               5
                                 6                                                                                             6.5
                                             ws                                                                                             ws
                                5.5         iws                                                                                 6          iws




                                                                                               time for updates (in seconds)
 time for select (in seconds)
                                 5        intws                                                                                          intws
                                                                                                                               5.5
                                4.5
                                 4                                                                                              5

                                3.5                                                                                            4.5
                                 3                                                                                              4
                                2.5
                                                                                                                               3.5
                                 2
                                1.5                                                                                             3

                                 1                                                                                             2.5
                                      2       3   4      5      6      7      8   9   10                                             2       3   4      5      6      7      8   9   10
                                                  maximum alternatives per ﬁeld                                                                  maximum alternatives per ﬁeld



                                              Fig. 2: Performance impact of the extensions of U-relations.

4.2                                   Optimality of ws-descriptors

The performance of queries and updates depends on the choice of the concrete
expression representing the set of possible worlds. Below, we show that optimiz-
ing sets of ws-descriptors is intractable even without the extensions proposed
above. Obviously, this intractability result also holds for Ui -relations and Uint -
relations.

Proposition 1. Let S be a set of ws-descriptors occurring in a given U-relation
U . Let ω(S) denote a set of all valuations which turn at least one ws-descriptor
in S to >. It is Σ2P -complete to decide whether a ws-descriptor S 0 exists, such
that |S 0 | < |S| and ω(S) = ω(S 0 ).

Proof (Sketch). We show hardness by a reduction from the DNF minimization
problem Min-DNF(φ, k) [9] defined as follows:
Given a propositional formula φ in DNF and an integer k, is there a DNF
formula equivalent to φ that contains k or fewer occurrences of literals?
This problem has been shown to be Σ2P -complete by Umans [11]. We encode an
instance of Min-DNF(φ, k) as a problem in the claim of the theorem. Thereby
each propositional variable x in φ gives rise to a fresh PLFD variable ẋ with
domain {1, 2}, and each conjunctive clause in φ is encoded into a ws-descriptor
by replacing the positive propositional literal x with the PLFD atom ẋ = 1 and
each negative literal x with the PLFD atom ẋ = 2.
    For membership, consider the following algorithm: guess a set of ws-descrip-
tors S 0 and check in polynomial time whether it contains at most k equalities
(PLFD atoms). With a coNP oracle we check that there does not exist a valuation
w such that w ∈ ω(S) and w ∈   / ω(S 0 ) or vice versa. The check itself is feasible
in polynomial time.                                                               


5                                Conclusion

In this paper we have introduced two features of U-relations missing so far: the
non-monotonous operator of anti-join and updates. Moreover, we have shown


                                                                                           6
how the formalism of U-relations can be extended in order to improve the per-
formance of these operations. We have also implemented our extensions on top of
the open-source MayBMS system [6] and experimentally verified their efficiency.
For example, Fig. 2 shows the impact of extended U-relations from Section 4 on
the performance of both select and update queries, for various limits of possible
values which uncertain tuple attributes can take.
    To deal with the high worst-case complexity of optimizing U-relations, we
have incorporated some easy heuristics that often lead to better representations
(which are not guaranteed to be optimal, though). Our experiments show the
effectiveness of these heuristics. A systematic investigation of such methods and
further experiments with extending U-relations, enabling efficient support of
updates and non-monotonic operators, remains a subject of future research.
Acknowledgements. This work has been supported by the Austrian Science
Fund (FWF): P25207-N23.

References
 1. Serge Abiteboul, Paris Kanellakis, and Gösta Grahne. On the representation and
    querying of sets of possible worlds. Theor. Comput. Sci., 78:159–187, 1991.
 2. Lyublena Antova, Thomas Jansen, Christoph Koch, and Dan Olteanu. Fast and
    simple relational processing of uncertain data. In Proc. of ICDE ’08, pages 983–
    992. IEEE, 2008.
 3. Lyublena Antova and Christoph Koch. On APIs for probabilistic databases. In
    Proc. of QDB/MUD ’08, pages 41–56, 2008.
 4. Jihad Boulos, Nilesh N. Dalvi, Bhushan Mandhani, Shobhit Mathur, Christopher
    Ré, and Dan Suciu. MYSTIQ: A system for finding more answers by using prob-
    abilities. In Proc. of SIGMOD ’05, pages 891–893. ACM, 2005.
 5. Robert Fink, Dan Olteanu, and Swaroop Rath. Providing support for full relational
    algebra in probabilistic databases. In Proc. of ICDE ’11, pages 315–326. IEEE.
 6. Christoph Koch. MayBMS: A system for managing large uncertain and probabilis-
    tic databases. In Managing and Mining Uncertain Data. Springer, 2008.
 7. Christopher Ré and Dan Suciu. The trichotomy of having queries on a probabilistic
    database. The VLDB Journal, 18:1091–1116, 2009.
 8. Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne Hambr-
    usch, and Rahul Shah. Orion 2.0: native support for uncertain data. In Proc. of
    SIGMOD ’08, pages 1239–1242. ACM, 2008.
 9. Larry J. Stockmeyer. The polynomial-time hierarchy. Theoretical Computer Sci-
    ence, 3(1):1–22, 1976.
10. Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Data-
    bases. Morgan & Claypool, 2011.
11. Christopher Umans. The minimum equivalent DNF problem and shortest impli-
    cants. In Proc. of FOCS ’98, pages 556–563. IEEE, 1998.
12. Andrey Voronkov. Propositional logic of finite domains. Chapter 12 of the “Logic
    and Modelling” course script, University of Manchester. http://www.voronkov.
    com/lm_doc.cgi?what=chapter&n=12, Accessed: March 22, 2013.
13. Ting-You Wang, Christopher Ré, and Dan Suciu. Implementing NOT EXISTS
    predicates over a probabilistic database. In Proc. of QDB/MUD ’08, pages 73–86.
14. Jennifer Widom. Trio: A system for data, uncertainty, and lineage. In Managing
    and Mining Uncertain Data. Springer, 2008.



                                          7