New New Closure Closure Operators Operators and and Lattice Lattice RepresentationsRepresentations for Multivalued Dependencies and Related for Multivalued Expressions Dependencies and Related Expressions Jaume Baixeries1 and José Luis Balcázar1 Dept. Baixeries Jaume Llenguatges 1 i Sistemes Informàtics 1 and José Luis Balcázar Universitat Politècnica de Catalunya 1 c/ Jordi Girona, 1-3Informàtics Dept. Llenguatges i Sistemes Universitat08034 Barcelona Politècnica de Catalunya {jbaixer, balqui}@lsi.upc.edu c/ Jordi Girona, 1-3 08034 Barcelona {jbaixer, balqui}@lsi.upc.edu Abstract. In Database Theory, Multivalued Dependencies are the main tool to define the Fourth Normal Form and, as such, their inference prob- lem has been deeply studied; two related notions appearing in that study are a syntactical analog in propositional logic and a restriction that main- tains to this logic the same relationship as Functional Dependencies do to Horn logic. We present semantic, lattice-theoretic characterizations of such multivalued dependencies that hold in a given relation, as well as similar results for the related notions just mentioned. Our charac- terizations explain better some previously known facts by providing a unifying framework that is also consistent with the studies of Functional Dependencies. 1 Introduction Functional Dependencies (FDs) and Multivalued Dependencies (MVDs) are im- portant notions in Database Theory. They help to formalize the necessary con- ditions on a given relation to ensure, up to various degrees, a reduced amount of redundancy; and they indicate decomposition (more properly, normalization) operations to be performed to obtain such irredundant (normal form) relations. The most widely studied specific problems about these notions concern how to axiomatize, with a limited number of dependencies, a large set of them, or how to test as efficiently as possible whether a given dependency is a logical consequence of others ([7], [8], [14], [18], [19]). The less famous notion of Degenerate Multivalued Dependencies (DMVDs) does not play such an important role in database normalization, and has been usually neglected by database design practitioners, but has been studied by theorists. The interest of DMVDs is that they share some features with FDs (they are equality-generating instead of tuple-generating) and some features with MVDs (they have a similar form and analogous syntactical inference rules); then, their study may clarify relationships between FDs and MVDs. In particular, the highly more complex combinatorics of MVDs compared to FDs makes it Radim Bělohlávek, Václav Snášel (Eds.): CLA 2005, pp. 22–33, ISBN 80–248–0863–3. Multivalued Dependencies and Related Expressions 23 2 J. Baixeries and J.L. Balcázar nontrivial to extend results for FDs into the MVD realm, and then DMVDs may offer a useful intermediate stepping stone. In some ways, dependencies (functional or multivalued) behave as certain fragments of propositional logic. The main interest of studying this analogy is double: first, to propose alternative ways to solve inference problems, either for sets of propositional clauses or for sets of dependencies; and, second, to offer alternative semantic proofs of the syntactical correspondences of the (fully anal- ogous) inference calculi available for FDs and MVDs and for their propositional counterparts. This approach thus allows one to simplify, to some extent, the combinatorics contained in the dependencies. More precise explanations of how functional dependencies and multivalued dependencies are related to proposi- tional clauses are given in the next section, and additional details can be found in [18], [19]. Recent progresses in [3] makes it appropriate to revisit the somewhat unsatis- factory results in [6], trying to obtain a better understanding of the comparison- based binarization. We considered there MVD clauses and stated a restricted form of closure under intersection that characterizes the theories axiomatized by such clauses in a form similar to Horn clauses (a full proof can be found in [4]); whereas that result gave a clear characterization for these propositional theories, it did not provide a lattice form due to the fact that, sometimes, the intersection operation may not be of the restricted form that ensures remaining within the theory. Likewise, the lattice form for MVDs did not consist only of propositional models, but they were labeled with the specific pairs of original tuples that gave rise to them, and the intersection had to be extended to the label through a sort of join. Thanks to the additional knowledge obtained in [1], [2] and [3], we are now able to refine these facts into somewhat more satisfactory lattice-theoretic char- acterizations. Indeed, the purpose of this paper is twofold: on one hand, to provide a lattice-theoretic characterization of a set of MVD clauses that hold in a propositional theory; it generalizes that of [6] and parallels that of [5] for Horn clauses. On the other hand, to clarify the relationship between MVD lattices and MVD clauses for a larger-than-two-tuples world, by exactly identifying the nodes that must be removed from the binarization in order to get the lattice corresponding to MVDs. This result complements the discussion started in [18] about the fragment of propositional logic induced by MVDs; provides a new method to characterize the set of MVDs that hold in a relation; and contributes a new view of how close the MVD lattice is to the DMVD lattice, with the pre- cise understanding of where the difference lies. We expect this progress to allow for further advances in the study of binary representations for more sophisti- cated dependencies that fully lack, so far, lattice-theoretic characterizations. A global longer-term aim guiding this research is to elucidate whether there are deep implications between two properties of dependencies, seen in a general view: some of them have Armstrong relations, others do not; and some of them ad- mit lattice-theoretic characterizations, whereas it appears like others do not. We 24 Jaume Baixeries, José Luis Balcázar New Closure Operators and Lattices for MVD’s and Related Expressions 3 believe that these two properties have a deep interconnection that is worth to investigate and bring into light. 2 Basic Definitions Since we will be dealing with sets of data that have different domains (mul- tivalued and binary) for a set of attributes R, we will denote a set of tuples or a relation r = {t1 , t2 . . . tn } when the attributes are defined over a multi- valued domain, following the usual notation in database research. Likewise we denote a theory, that is, a set of propositional models, each a binary tuple, as T = {m1 , m2 . . . mn } when the domain is binary. Definition 1. The binarization of r, BIN (r), is the set of binary tuples (the- ory) that result from comparing all the pairs of tuples attributewise. For instance, given a set of tuples r = {{a, b, c, d}, {a, b, d, c}, {a, c, c, d}, {a, c, d, c}}, it would yield BIN (r) = {{1, 0, 0, 0}, {1, 0, 1, 1}, {1, 1, 0, 0}} Ob- serve that, being a set, BIN (r) has no duplicates. 2.1 Partitions of attributes A main combinatorial ingredient of our contributions is given by partitioning the set of attributes into equivalence classes. Definition 2. The partition of attributes P ART (mi ) that results from a model mi is the partition of R such that each attribute that has a true value in mi , forms a singleton class, and all the attributes with false value are together in the same class. For instance, if mi = {11001}, then P ART (mi ) = {A|B|CD|E}. We freely say that P ART (T ) is the set of partitions for each model of T , with no du- plications, of course, even if the same partition arises from several models. For instance, if T = {1101, 1011}, then P ART (T ) would be {A|B|C|D}. The set of all the partitions is denoted ℘. The key operation on partitions will be here a standard meet operation cor- responding to the lattice of partitions; the corresponding join operation is also a usual operation on partitions. Namely, given two partitions, their join is formed by all the classes that can be obtained by pairwise intersecting one class from each, that is, the coarsest partition that is finer than both; whereas the meet is the finest partition that is coarser than both, so that the partial ordering asso- ciated to the lattice is the natural ordering where P ≤ P 0 whenever P 0 is finer than P . More details on alternative ways to define the meet operation can found in [3]. Moreover, through the operation P ART , we obtain a way of computing a meet operation on models that is different from the standard bitwise intersec- tion. For instance, 1110 ∧ 0011 = 0011, because P ART (1110) = {a|b|c|d} and Multivalued Dependencies and Related Expressions 25 4 J. Baixeries and J.L. Balcázar P ART (0011) = {ab|c|d}, and {a|b|c|d} ∧ {ab|c|d} = {ab|c|d} corresponding to the model 0011. Note that the operator ∧ may not be applicable to some models, in the sense that the meet partition might have more than one nonsingleton class and then there would be no model corresponding to the meet partition (e.g., for 11100 and 00111). Either among partitions or among models, we employ the following notation: Definition 3. For any given set S, if the operator ∧ is defined on that set, the closure under meet of a subset Q ⊆ S, denoted [Q]∧ , is the smallest subset of S that contains Q and is closed under ∧. 2.2 Clauses and Dependencies All our expressions have two parts, an antecedent and a consequent, each being a set of attributes or of propositional variables, denoted X, Y , or the like; juxta- position as in XY denotes union. It is customary to separate them by some sort of double arrow. For now on, we use only the word “attributes” to mean “at- tributes or propositional variables” whenever this is the necessary interpretation. The various expressions, and their semantics, are as follows. Definition 4. A multivalued dependency clause (MVDcl) X ⇒⇒ Y holds in a theory T if for each mi ∈ T if mi |= X then mi |= Y or mi |= R \ XY . It is, if the X attributes are true, then, the Y attributes are true or the R \ XY attributes are true. Definition 5. A degenerate multivalued dependency (DMVD) X 7→ 7→ Y holds in r if whenever two tuples ti , tj ∈ r, ti [X] = tj [X] implies that ti [Y ] = tj [Y ] or ti [R \ XY ] = tj [R \ XY ]. Definition 6. A multivalued dependency (MVD) X →→ Y holds in r if whenever two tuples ti , tj ∈ r coincide in X, ti [X] = tj [X], it implies that the tuples t0i , t0j exist in r, where t0i [X] = t0j [X] = ti [X], t0i [Y ] = ti [Y ], t0j [Y ] = tj [Y ], and conversely t0i [R \ XY ] = tj [R \ XY ] and t0j [R \ XY ] = ti [R \ XY ]. Definition 7. The syntactically equivalent MVD (DMVD) of a MVDcl X ⇒⇒ Y is X → → Y (X 7→ 7→ Y ); and similarly between MVDs and DMVDs. It is immediate to check that, if a degenerate multivalued dependency holds in a relation r, so does the syntactically equivalent multivalued dependency. Some common properties for DMVDs, MVDs and MVD clauses are the following: 1. Complementation: If X → Y holds, then X → R \ Y holds. 2. Reflexivity: If Y ⊂ X, then X → Y holds. 3. Union of right-hand side: If X → Y and X → Y 0 hold, then X → Y Y 0 holds. where X → Y can be a DMVD, a MVD or a MVD clause. For a proof of these properties the reader can refer to [21] or [2]. For each left-hand side of a MVDcl, a DMVD or a MVD, we can define its dependency basis as follows: 26 Jaume Baixeries, José Luis Balcázar New Closure Operators and Lattices for MVD’s and Related Expressions 5 Definition 8. Given a theory T (a relation r), the MVDcl dependency basis (DMVD dependency basis, MVD dependency basis) for a set of attributes X, DEP (X), is a partition of the set of attributes such that for all MVDcl (DMVD, MVD) that have X in their left hand side, they hold in T (in r) if and only if their right hand side may be formed by the union of classes of DEP (X). We note that, for a given set of attributes X, all these attributes will appear as singletons in DEP (X) regardless of the kind of dependency basis. This is so because of the reflexivity property that holds in all these dependencies; they are often omitted in this generalized dependency. Additionally, the unions Y mentioned at the end include, as particular cases, the individual classes of the partition DEP (X). The definition of a dependency basis allows us to group all the MVDcl, DMVDs or MVDs that have the same left-hand side in one generalized MVDcl, DMVD or MVD: Definition 9. All the MVDcl (DMVD, MVD) that have X in their lefthand side may be represented by a generalized MVDcl (DMVD, MVD) X ⇒⇒ DEP (X) (X 7→ 7→ DEP (X), X → → DEP (X).). In [2] and [3], we defined closure operators on partitions, characterizing all the DMVDs (MVDs respectively) that hold in a relation; the characterizations have the same overall form as Theorem 1 below. They also were proved to derive Armstrong relations for that set of DMVDs (MVDs). An Armstrong relation for a set of dependencies is a relation such that all those dependencies hold in this relation, as well as all the dependencies that are a logical consequence of this set, but the rest of dependencies do not hold. It can be used as an alternative way to characterize a set of dependencies. In this present paper, those closure operators will be called ΓDM V D and ΓM V D respectively. Another relevant notion about partitions of attributes is that of matching, which varies depending whether the partition is considered to be modelling a set of DMVDs or a set of MVDs. We provide here a brief description of this property (more details can be found in [2] and [3]): we say that a partition of attributes P = {P1 |P2 | . . . |Pn } matches a set of tuples C (when modelling a set of DMVDs) if all the tuples in C differ only in one class of attributes. If this same partition is considered to be modelling a set of MVDs, we say that P matches a set of tuples C if C = P r(P1 , C) × P r(P2 , C) × · · · × P r(Pn , C), where P r(X, C) is the projection of C on the attributes X, that is, the set of all tuples obtained from C by removing the values corresponding to attributes not in X. 3 Lattice Characterization of Multivalued Dependency Clauses It is easy to prove that, given a relation r, a DMVD X 7→ 7→ Y holds in r if and only if the syntactically equivalent MVDcl X ⇒⇒ Y holds in BIN (r), as the next proposition states: Multivalued Dependencies and Related Expressions 27 6 J. Baixeries and J.L. Balcázar Proposition 1. Let be r a relation, then X 7→ 7→ Y holds in r if and only if X ⇒⇒ Y holds in BIN (r). According to this result, and due to the fact that the operator ΓDM V D as defined in [2] provides a complete characterization of a set of DMVDs, the lattice that characterizes the MVD clauses that hold in BIN (r) must be the same. However, given a theory T whose lattice we wish to construct, we would need to start by going back into a relation r such that BIN (r) = T and operate on the DMVDs of r; and care must be exercised to make sure that BIN (r) does not produce additional models. We deem useful to provide a construction of the lattice that goes on directly from T , by means of a new closure operator appropriate to work on models. We indeed provide now an operator that, given a theory, calculates a lattice that characterizes the set of all MVD clauses that hold in that theory, which is the following: Definition 10. Fix a theory T . Let ΓM V Dcl : ℘ 7→ [P ART (T )]∧ a map such that given a partition of attributes P , returns a partition of attributes P 0 such that P ≤ P 0 and for all P 00 ∈ [P ART (T )]∧ , if P ≤ P 00 ≤ P 0 implies that P 00 = P 0 . Note that it is well-defined since P ≤ P 0 and P ≤ P 00 implies that P ≤ P ∧ P 00 , and the codomain is closed under meet. 0 Proposition 2. ΓM V Dcl is a closure operator. Proof. We prove the three axioms for closure operators. 1. P ≤ ΓM V Dcl (P ). Obvious since ΓM V Dcl always returns a P 0 ∈ [P ART (T )]∧ such that P ≤ P 0 . 2. ΓM V Dcl (P ) = ΓM V Dcl (ΓM V Dcl (P )). Let P 0 = ΓM V Dcl (P ) and, besides, Q = ΓM V Dcl (P 0 ) so that P 0 ≤ Q. For all P 00 ∈ [P ART (T )]∧ , P 0 ≤ P 00 ≤ Q implies that P 00 = Q, and since P 0 ∈ [P ART (T )]∧ and P 0 ≤ P 0 ≤ Q, we have that Q = P 0 . 3. P ≤ P 0 ⇒ ΓM V Dcl (P ) ≤ ΓM V Dcl (P 0 ). Since P ≤ P 0 , we have that P ≤ ΓM V Dcl (P 0 ), and also P ≤ ΓM V Dcl (P ). We then have that P ≤ ΓM V Dcl (P ) ∧ ΓM V Dcl (P 0 ) ≤ ΓM V Dcl (P ) By the closure under meet, ΓM V Dcl (P ) ∧ ΓM V Dcl (P 0 ) ∈ [P ART (T )]∧ ; by the last condition of the definition, we get ΓM V Dcl (P ) ∧ ΓM V Dcl (P 0 ) = ΓM V Dcl (P ) or, equivalently, ΓM V Dcl (P ) ≤ ΓM V Dcl (P 0 ) as desired. Note that P ART (mi ) can only express a partition that has only one class that is not a singleton. Then, all the partitions that belong to P ART (T ) will have only one class that is not a singleton. Partitions with more than one nonsingleton can be obtained through the meet operation. The next theorem proves that ΓM V Dcl characterizes all the MVD clauses that hold in a theory T : 28 Jaume Baixeries, José Luis Balcázar New Closure Operators and Lattices for MVD’s and Related Expressions 7 Theorem 1. A generalized MVD clause X ⇒⇒ Y1 | . . . |Ym holds in T if and only if for P = {X1 | . . . |Xn |Y1 | . . . |Ym } and P 0 = {X1 | . . . |Xn |Y1 . . . Ym } we have that ΓM V Dcl (P 0 ) = P , where the Xj are the attributes in X. Proof. (⇒) We first note that the fact that a generalized MVD clause X ⇒⇒ Y1 | . . . |Ym holds in T means that these Yi constitute the finest possible right hand side for a generalized MVD clause that has X in its left hand side. For each i, each model in T must satisfy X ⇒⇒ Yi ∨ Y 0 , where Y 0 stands for the union of the remaining classes of attributes. This implies that all the zeros of the model are in the same class Yi , since otherwise it is easy to check that this last condition becomes violated. Conversely, it is immediate to see that, if all the models satisfying X have all their zeros together in one single Yi , then T satisfies the dependency clause. We must prove that P 0 = {X1 | . . . |Xn |Y1 | . . . |Ym } belongs to [P ART (T )]∧ , and that no other partition in [P ART (T )]∧ exists in 0 V between P and P . Consider 0 the theory T = {m ∈ T m |= X}, and let Q = m∈T 0 P ART (m). Clearly all Xi are singletons in all P ART (m) form m ∈ T 0 , and, as just described, all of them have their zeros (the only nonsingleton in P ART (m)) included in one of the Yi ’s, so that for allVsuch m we have P 0 ≤ P ART (m), and the properties of the meet ensure P 0 ≤ m∈T 0 P ART (m) = Q. We argue that actually P 0 = Q so that indeed P 0 ∈ [P ART (T )]∧ : assume that P 0 < Q, that is, some class Yi of P 0 is split in Q into more than one class. Fix two attributes A and B that belong to Yi that are in different classes of Q, say ZA and ZB respectively; from P 0 ≤ Q as just proved, ZA ∪ ZB ⊆ Yi . Noting the definition of Q, we see that no model in T 0 has zeros both in ZA and in ZB , since otherwise the meet operation would have joined them into a single class. Then, the observation we made in the previous paragraph tells us that the multivalued dependency clause X ⇒⇒ ZA is true in T , and ZA should be obtained as a union of Yi ’s, which is not the case. Therefore P 0 = Q, so that P 0 ∈ [P ART (T )]∧ . It remains to prove that there is no other partition P 00 ∈ [P ART (T )]∧ with P ≤ P 00 ≤ P 0 , except, of course, for P 0 itself, and this will imply that ΓM V Dcl (P )V= P 0 , as we wish. Assume P 00 ∈ [P ART (T )]∧ , and let T 00 ⊆ T such that P 00 = m∈T 00 P ART (m). Again all the Xi must be singletons in P 00 due to P ≤ P 00 , which means that the models 00 V in T satisfy X, or,V equivalently, T 00 ⊆ T 0 . It is immediate to see, then, that m∈T 0 P ART (m) ≤ m∈T 00 P ART (m) be- cause, due to the way the meet operation works, having at least as many tuples in T 0 makes the partition at least as coarse as the one corresponding to T 00 . That is, we see that P 0 ≤ P 00 ; and from the assumption P 00 ≤ P 0 we get P 00 = P 0 . Taken together, all our consequents prove that ΓM V Dcl (P ) = P 0 . (⇐) We prove that if ΓM V Dcl (P 0 ) = P , then X ⇒⇒ Y1 | . . . |Yn holds in T . We use the same simple observation as in the first paragraph of the converse direction. Pick any model m ∈ T that satisfies X: we will prove that all the zeros of m are in the same Yi , and this implies that the MVD clause is true in T . This is not difficult now, since whenever m satisfies X all the singletons of P are also singletons of m, that is, P ≤ P ART (m), and because P ART (m) belongs obviously to the lattice, the closure of P , namely P 0 under our assumption, Multivalued Dependencies and Related Expressions 29 8 J. Baixeries and J.L. Balcázar must also obey P 0 ≤ P ART (m). The only way for this to hold is that the only nonsingleton class of P ART (m) is fully included in one of the classes of P 0 , that is, all the zeros of m are in one single Yi , and this implies that the MVD clause is true. In fact, completely analogous theorems for DMVDs and MVDs, using the corresponding closure operators instead, were proved in [2] and [3] respectively. We define LM V Dcl (T ) as the set of closed partitions of attributes that are closed under ΓM V Dcl for T . Since ΓM V Dcl is a closure operator, the set LM V Dcl (T ) is closed under the operator meet. We have seen how to create a lattice to repre- sent the MVD clauses that hold in a theory T and, as we previously mentioned, since the equivalence of a DMVD that holds in a relation and a MVDcl that holds in the binarization of that relation is direct, then we can conclude that the lattices generated by ΓM V Dcl and ΓDM V D must be the same. Theorem 1 could have been proved instead by the following argumentation: By Proposition 1, we have that ΓM V Dcl generates the same set of closed parti- tions in BIN (r) as ΓDM V D in r. Then, to prove Theorem 1 we may also derive it from the following theorem, whose proof resorts heavily to our previous work: Theorem 2. Let P = {P1 | . . . |Pn } be a partition of attributes. P = ΓM V Dcl (P ) ⇔ P = ΓDM V D (P ). Proof. For this proof, we will use the Armstrong relation that can be derived from a set of closed partitions as described in [2]. ⇒) P = ΓM V Dcl (P ) implies that in the Armstrong relation for each Pi ∈ P there will be a pair of tuples that will only differ in Pi , and each pair will be different one from each other. It implies that if BIN is performed between the two tuples of each pair (otherwise it will result in a all zeros model), it will result in a binary model such that all the attributes in Pi will be set to zero, and the rest will be set to one. It induces a set of partitions such that all the attributes not in Pi will be singleton, and those in Pi will be in the same partition. Since it happens for all Pi ∈ P , the meet for all those partitions will be P , which proves that it will be closed under ΓDM V D . ⇐) P = ΓDM V D (P ) implies that for each Pi ∈ P that is not a singleton, there is a P 0 such that all the attributes are singleton except those that are in Pi . For all those models, it would imply that there is a pair of tuples that disagree only in Pi , which constitues a set of tuples that imply that P = ΓM V Dcl (P ). Theorem 1 and the correspondence between MVD clauses and DMVD’s gives us an alternative way to construct a lattice and derive the DMVDs of a relation r: binarizing the tuples in r, and closing under meet the resulting partitions of attributes. This process parallels that of the construction of Armstrong relations for a set of FD’s [15] and that of constructing an Armstrong relation for a set of DMVDs in [2]. Corollary 1. [P ART (BIN (r))]∧ = LM V Dcl (BIN (r)) = LDM V D (r). 30 Jaume Baixeries, José Luis Balcázar New Closure Operators and Lattices for MVD’s and Related Expressions 9 4 Multivalued Dependency Clauses and Multivalued Dependencies We just proved that [P ART (BIN (r))]∧ = LDM V D (r), it is, that binarizing a relation, then transforming these models into partitions of attributes and then closing the resulting models under meet, the set of partitions of attributes so constructed characterizes the DMVDs that hold in r. Our previous paper [2] pro- vided such a construction by a Galois connection operating on partitions and re- lying strongly on the “matching” condition defined before: now we see that bina- rization offers a different, more natural way for both obtaining the set of DMVDs that hold in a relation and performing the implication test for DMVDs. Equipped with these facts, we move now to the most relevant notion, MVDs proper: in this section we compare LM V D (r) and [P ART (BIN (r))]∧ . First consider the possibility of having an equivalence of the form LM V D (r) = [P ART (BIN (r))]∧ . This correspondence holds in relations were the sets of attributes that have some values in common are reduced to sets of two tuples (a two-tuples world) as it is proved in [19]. Proposition 3 next states that the MVD lattice is actually in- cluded in the DMVD lattice; therefore, if the inclusion is an equality, then the equality [P ART (BIN (r))]∧ = LM V D holds, otherwise, it does not. Proposition 3. Let r be a relation, and let LDM V D (r) and LM V D (r) the lat- tices that characterize the DMVDs and MVDs respectively that hold in r. Then, LM V D (r) ⊆ LDM V D (r). Proof. Fixed r, we prove that if partition P is not in LDM V D (r) then P is not in LM V D (r) either. We use the facts, proved in [2] and [3] respectively, that par- allel theorem 1 for DMVDs and MVDs: each application of the corresponding closure operator to a partition P that gives a different partition corresponds to a DMVD, respectively MVD, that holds in the relation given. If P ∈ / LDM V D (r) then a nontrivial degenerate multivalued dependency with singletons from P as left hand side holds in r; as indicated just after the definition, if the degenerate multivalued dependency holds then its syntactically equivalent multivalued de- pendency holds as well; the syntactical equivalence implies that its left hand side is the same and therefore singletons of P , so that the closure operator for MVDs also maps P into a different partition and thus P is not closed in LM V D (r) either. Our next result provides a postprocessing method that eliminates the appro- priate models in BIN (r) to obtain the MVD lattice. The method consists of traversing the DMVD lattice top-down, and testing, for each node, the condi- tions stated in the theorem below; at that point, it is necessary to assume that all the nodes above the current node have been already tested, since we need for the proof the meet of all the nodes in the MVD lattice that sit above the node undergoing the test. In fact, it is immediate to see that, upon considering a node P , if we take the meet of all the nodes above it in the MVD lattice, we obtain a node Q ≥ P Multivalued Dependencies and Related Expressions 31 10 J. Baixeries and J.L. Balcázar which is, in fact, one of them due to the closure under meet; and, of course, to maintain the closure property, in case Q = P then P also belongs to the MVD lattice. Thus, assume from now on that P < Q; since all nodes above P are already correctly placed in the MVD lattice, we know that either P belongs to it, or its closure is exactly Q. Thus, we consider a generalized dependency on the basis of P and Q, and we know from [3] that P is in the lattice if and only if the dependency does not hold. Namely, assuming that P = {X1 | . . . |Xn |Y } with X = X1 . . . Xn and Q = {X1 | . . . |Xn |Y1 | . . . |Ym }, P must be removed from the lattice if and only if r |= X →→ Yi for each i. We prove that considering information local to P and Q suffices. Theorem 3. Let m ∈ BIN (r) and P = P ART (m), where P = {X1 | . . . |Xn |Y } with X = X1 . . . Xn , and Q = {X1 | . . . |Xn |Y1 | . . . |Ym } is the meet of all the nodes above P in LM V D (r); assume P < Q. For each Yi , consider the related models m1 and m2 : m1 has true the true variables of m and those corresponding to Yi , and m2 has all variables true except those in Yi . It holds that P ∈ LM V D (r) if and only if there is a Yi (with associated m1 and m2 ) and a pair of tuples t1 and t2 whose comparison results in m, for which there are no pairs t3 and t4 so that the comparisons of t1 with t3 and of t2 with t4 yield m1 and the comparisons of t1 with t4 and of t2 with t3 yield m2 . Proof. One direction is easy. If indeed there is Yi and the pair of tuples, then it can be readily checked from the properties of m1 and m2 that this pair witness that r 6|= X → → Yi , so the generalized dependency constructed from P and Q does not hold; it follows from the results in [3] that Q cannot be the closure of P . But ΓM V D (P ) is in the MVD lattice and, if it was above P , the definition of Q would imply P < Q < ΓM V D (P ) which contradicts the properties of the closure operator. Thus the only possibility is ΓM V D (P ) = P so that P ∈ LM V D (r). For the converse, in fact all the steps in the previous paragraph are “if and only if”, up to the point that r 6|= X →→ Yi However, this does not complete the proof since there could be pairs of tuples violating the dependency somewhere else: it remains to argue that t1 and t2 violating the implication can be picked so that their comparison is m. In fact we prove more: we prove that all these pairs are there. Fix any pair of tuples whose comparison gives m0 6= m: we will prove that they satisfy the conditions for X →→ Yi . Note that what we want to prove, that the X → → Yi are satisfied for all Yi in this case, is the same as proving that the generalized dependency X → → Y1 | . . . |Ym is satisfied: it is the one that would correspond to P in case its closure was Q. Since P = P ART (m) and X is the union of the singletons of P , that is, the ones of m, the result is immediate whenever m0 has a zero among these, because the pair of tuples would not satisfy the antecedent. Thus, we only need to consider m0 > m. Let P 0 = P ART (m0 ), which implies that P 0 has at most one class that is not a singleton and that P ≤ P 0 , and suppose first that Q ≤ P 0 . Then, Q being coarser, the zeros of m0 must be all in the same class in Q and the pair of tuples differ only in one class of Q, which implies that they trivially satisfy the generalized dependency. 32 Jaume Baixeries, José Luis Balcázar New Closure Operators and Lattices for MVD’s and Related Expressions 11 Therefore, we suppose now that Q 6≤ P 0 , that is, P ≤ P 0 ∧ Q < Q, which implies that P 0 ∈ / LM V D (r) since otherwise, due to the closure under meet, Q would not have been the meet of all the part of the MVD lattice above P . This implies that a dependency holds in r associated to P 0 and ΓM V D (P 0 ). Its left hand side X 0 is formed by the singletons of P 0 , which include those of P , say X1 , . . . , Xk , and some more, say Xk+1 , . . . , Xn ; call the union of this set W = W1 W2 according to whether they come from Yi or from its complement. Regarding its right hand side, made up by the rest of the classes of ΓM V D (P 0 ), we know that P ≤ P 0 ≤ ΓM V D (P 0 ), which surely is in the MVD lattice, so that Q ≤ ΓM V D (P 0 ) by the definition of Q. Thus, we find that Yi is W1 (from the left hand side X 0 ) union a union of classes of the right hand side. A pair of tuples corresponding to m0 must coincide in X1 , . . . , Xn = XW1 W2 , and from the fact that the generalized dependency associated to P 0 holds we easily obtain the tuples needed to prove that the pair fulfills X → → Yi as well. According to this process we can identify one by one, proceeding from the top down, those nodes in LDM V D that must remain in LM V D and distinguish them from those that must be deleted to construct LM V D from LDM V D . References 1. Baixeries J. A Formal Concept Analysis framework to model functional dependen- cies. Mathematical Methods for Learning (2004). 2. Baixeries J., Balcázar, J.L. Characterization and Armstrong relations for Degen- erate Multivalued Dependencies using Formal Concept Analysis. The Third Inter- national Conference on Formal Concept Analysis 2005. 3. Baixeries J., Balcázar, J.L. A Lattice Representation of Relations, Multivalued De- pendencies and Armstrong Relations. 13th International Conference on Conceptual Structures (ICCS’05). 4. Balcázar, J.L. Query learning of Horn formulas revisited.. To appear in: Com- putability in Europe 2005.. 5. Balcázar, J.L., Baixeries J. Discrete Deterministic Data Mining as Knowledge Compilation. Workshop on Discrete Mathematics and Data Mining in SIAM In- ternational Conference on Data Mining (2003). 6. Balcázar, J.L., Baixeries J. Characterizations of multivalued dependencies and re- lated expressions. Discovery Science 2004. 7. Beeri C., Fagin R., Howard J. H. A complete axiomatization for functional and mul- tivalued dependencies in database relations. Jr. Proc. 1977 ACM SIGMOD Sym- posium, ed. D. C. P. Smith, Toronto, pp. 47-61. 8. Beeri C. On the Membership Problem for Functional and Multivalued Dependencies in Relational Databases. ACM Transactions on Database Systems, Vol 5, No. 3, September 1980, pages 241-259. 9. Davey B.A., Priestley H.A. Introduction to Lattices and Order. Second edition. Cambridge University Press, 1990, 2002. 10. Day A. A lattice interpretation of database dependencies. Semantics of Program- ming Languages and Model Theory (M. Droste and Yu. Gurevich, eds), Gordon and Breach, London. Multivalued Dependencies and Related Expressions 33 12 J. Baixeries and J.L. Balcázar 11. Demetrovics J., Hencsey G., Libkin L., Muchnik I. Normal Form Relation Schemes: A New Characterization. Acta Cybernetica. 10(3): 141-164 (1992). 12. Demetrovics J., Libkin L., Muchnik I. Functional Dependencies in Relational Databases: A Lattice Point of View. Discrete Applied Mathematics 40(2): 155-185 (1992). 13. Guigues J. L., Duquenne V. Familles minimales d’implications informatives resul- tant d’un tableau de données binaires. Math. Sci. hum., 24(95):5–18, 1986. 14. Fagin R., Vardi Y. V. The theory of data dependencies: a survey. Mathematics of Information Processing, Proceedings of Symposia in Applied Mathematics, AMS, 1986, vol. 34, pp. 19-72. 15. Fagin R. Armstrong databases. Invited paper, Proc. 7th IBM Symposium on Math- ematical Foundations of Computer Science, Kanagawa, Japan, May 1982. 16. Ganter, B., Wille R. Formal Concept Analysis. Mathematical Foundations. Springer, 1999. 17. Lopes, S., Petit J-M., Lakhal L. Functional and approximate dependency mining: database and FCA points of view. Special issue of Journal of Experimental and Theoretical Artificial Intelligence (JETAI) on Concept Lattices for KDD, 14(2- 3):93-114, Taylor and Francis, 2002. 18. Sagiv Y. An algorithm for inferring multivalued dependencies with an application to propositional logic. Journal of the ACM, 27(2):250-262, April 1980. 19. Sagiv Y., Delobel C., Parker S., Fagin R. An Equivalence Between Relational Database Dependencies and a Fragment of Propositional Logic. Journal of the ACM, Vol 28, No 3, July 1981, 435-453. 20. Thalheim B. Conceptual Treatment of Multivalued Dependencies. 22nd Interna- tional Conference on Conceptual Modeling, Chicago, IL, USA, Lecture Notes in Computer Science Springer 2003, pp 363 - 275. 21. Ullman J.D. Principles of Database and Knowledge-Base Systems. Computer Sci- ence Press, Inc. 1988.