From an implicational system to its corresponding D-basis Estrella Rodrı́guez-Lorenzo1 , Kira Adaricheva2 , Pablo Cordero1 , Manuel Enciso1 , and Angel Mora1 1 University of Málaga, Andalucı́a Tech, Spain, e-mail: {estrellarodlor,amora}@ctima.uma.es, pcordero@uma.es, enciso@lcc.uma.es 2 Nazarbayev University, Kazakhstan e-mail: kira.adaricheva@nu.edu.kz Abstract. Closure system is a fundamental concept appearing in several areas such as databases, formal concept analysis, artificial intelligence, etc. It is well-known that there exists a connection between a closure operator on a set and the lattice of its closed sets. Furthermore, the closure system can be replaced by a set of implications but this set has usually a lot of redundancy inducing non desired properties. In the literature, there is a common interest in the search of the mini- mality of a set of implications because of the importance of bases. The well-known Duquenne-Guigues basis satisfies this minimality condition. However, several authors emphasize the relevance of the optimality in order to reduce the size of implications in the basis. In addition to this, some bases have been defined to improve the computation of closures relying on the directness property. The efficiency of computation with the direct basis is achieved due to the fact that the closure is computed in one traversal. In this work, we focus on the D-basis, which is ordered-direct. An open problem is to obtain it from an arbitrary implicational system, so it is our aim in this paper. We introduce a method to compute the D-basis by means of minimal generators calculated using the Simplification Logic for implications. 1 Introduction Discovering knowledge and information retrieval are currently active issues where Formal Concept Analysis (FCA) provides tools and methods for data analysis. The notions around the concept lattice may be considered as the main attractions in Formal Concept Analysis and they are strongly connected to the notion of closure. Closure system is a fundamental concept appearing in several areas such as database theory, formal concept analysis, artificial intelligence, etc. It is well- known that there exists a connection between a closure operator on a set and the lattice of its closed sets. Furthermore, the closure system can be presented, dually, as a set of attribute implications, namely an implicational system but this set has usually a lot of redundancy inducing non-desired properties. c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 217–228, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 218 Estrella Rodríguez-Lorenzo et al. We can not fail to mention the relevance of the role of the implication notion in different areas. It was the main actor of the normalization theory in database area; it has an outstanding character in Formal Concept Analysis and it was prominently used in Frequent Set Mining and Learning Spaces, see the survey of M. Wild [10]. The latter is devoted to mathematical theory of implications and the different faces of the concept of an implication. Implications linked data represented in several forms going from the relationship between itemsets in transactions (Frequent Set Mining) to the boolean functions (Horn Theory). Nonetheless, as V. Duquenne says in [6] “it is surprising if not hard to ac- knowledge that we did not learn much more on their intimacy in the meantime, despite many interesting papers using or revisiting them”. We believe there is a long way to go, and a deeper theory on properties of implications with automated and efficient methods to manipulate them can be developed. In this paper, we are focused in the Formal Concept Analysis area and the fundamental notions are assumed (see [7]). The task of information retrieval carried out by the tools in FCA conduits to infer concepts from the data set, i.e. to deduce (in an automated way) a set of objects that may be precisely characterized by a set of attributes. Such concepts inherit an order relation induced by attribute set inclusion, providing a lattice structure of the concept set. Here implications are retrieved from a binary table (formal context) representing the relationship between a set of objects and a set of attributes. Implications represent an alternative way for the underlying information contained in the formal context. Many applications must massively compute closures of sets of attributes and any improvement of execution time is relevant. In [9] the author establishes the necessity of obtaining succinct representation of closure operators to achieve an efficient computational usage. In this direction, properties associated to implica- tions are studied to render equivalent sets fulfilling desired properties, directness and optimality. An important matter in FCA is to transform implicational systems in canon- ical forms for special proposals in order to provide an efficient further man- agement. Hence, some alternative definitions have been established: Duquenne- Guigues basis, direct optimal basis, D-basis, etc. In this work we focus on the last one [1], because it combines, in a balanced way, a brief representation (it has a small number of elements) and a efficient computation of closures (it is computed in just one traversal). To this end, D-basis proposes an order in which implications will be attended. The major issue is that the execution of the D-basis in one iteration is more efficient that the execution of a shorter, but un-ordered one, for instance the canonical basis of Duquenne and Guigues. K. Adaricheva et.al prove in [1] that one can extract the D-basis from any direct unit basis Σ in time polynomial of size of Σ, and it takes only linear time of the number of implications of the D-basis to put it into a proper order. In [5] we have proposed a method to calculate all the minimal generators from a set of implications as a way to remove redundancy in the basis. The method From an implicational system to its corresponding D-basis 219 to compute all the minimal generators is based on the Simplification Logic for implications [8]. Using this logic we are able to remove redundancy in the impli- cations [4] and following the same style of application of the Simplification Rule to the set of implications we can obtain all the minimal generators. Currently the retrieval of the D-basis from an arbitrary implicational sys- tem is an open problem, so it becomes our aim in this paper. We introduce a method to compute the D-basis by means of minimal generators calculated us- ing the Simplification Logic for implications. The relationship among minimal generators, covers, minimal covers and D-basis is presented and an algorithm to calculate D-basis from an arbitrary set of implications is proposed. Section 2 presents the main notions necessary to the understanding of the new method: closure operators, the D-basis, Simplification Logic and the method to calculate minimal generators. In Section 3, the relationships between covers and generators are presented. In Section 4, the new method to obtain the D- basis from a set of implications is shown, and some conclusions and extensions are proposed in Section 5. 2 Background 2.1 Closure systems Given a non-empty set M and the set1 2M of all its subsets, a closure operator is a map φ : 2M → 2M that satisfies the following, for all X, Y ∈ 2M : (1) increasing: X ⊆ φ(X); (2) isotone: X ⊆ Y implies φ(X) ⊆ φ(Y ); (3) idempotent: φ(φ(X)) = φ(X). We will refer to the pair hM, φi of a set M and a closure operator on it as a closure system. In the next two subsections we will follow the introduction of the implica- tional system based on the minimal proper covers 2 given in [1], which was named there the D-basis. We will call closure system reduced, if φ({x}) = φ({y}) → x = y, for any x, y ∈ M 3 . If the closure system hM, φi is not reduced, one can modify it to produce an equivalent one that is reduced, see [1] for more details. We will now define a closure operator φ∗ , which is associated with a given operator φ. hM, φi be a closure system. Define φ∗ as a self-map on 2M Definition 1. Let S ∗ such that φ (X) = x∈X φ(x), X ⊆ 2M . It is straightforward to verify that 1 In the FCA framework, that set M can be thought a set of attributes of a context. 2 Although in [1] it was introduced as minimal cover, here we name it minimal proper cover because in this paper we generalize the notion of cover in Section 3. 3 To clarify the notation φ({x}) will be represented as φ(x) if no risk of confusion. 220 Estrella Rodríguez-Lorenzo et al. Lemma 1. φ∗ is a closure operator on M . Given a closure system hM, φi, we introduce several important concepts. Definition 2 ([1]). For x ∈ M we call a subset X ⊆ M a proper cover for x if x ∈ φ(X) \ φ∗ (X). If X is a proper cover for x, it will be denoted as x ∼p X. 2.2 The D-basis In this subsection, we briefly summarize the introduction of the D-basis in [1]. Its definition is strongly based on the notion of a minimal proper cover: Definition 3. A proper cover Y for x is called minimal, if, for any other proper cover Z for x, Z ⊆ φ∗ (Y ) implies Y ⊆ Z. The existence of several proper covers for the same element induces the need to introduce the notion of minimality. Lemma 2. If x ∼p X, then there exists Y such that x ∼p Y , Y ⊆ φ∗ (X) and Y is a minimal proper cover for x. In other words, every proper cover can be reduced to a minimal proper cover under the subset relation added with the φ∗ operator. These ideas bring to the following definition of the implicational system defin- ing the reduced closure system by means of the minimal proper covers. Definition 4. Given a reduced closure system hM, φi, define the D-basis ΣD as a union of two subsets of implications: 1. {y → x : x ∈ φ(y) \ y, y ∈ M } (such implications are called binary); 2. {X → x : X is a minimal proper cover for x}. Note that the D-basis belongs to the family of the unit bases, i.e. implica- tional sets where each implication A → b has a singleton b ∈ M as a consequent. Lemma 3. ΣD generates hM, φi. 2.3 Ordered direct set of implications Here we recall the notion of the ordered direct basis introduced in [1], which is designed for a quick computation of the closures based on some fixed order of implications. First we recall the definition of the ordered iteration of implications. Definition 5. Suppose the set of implications Σ is equipped with some linear order, or equivalently, the implications are indexed as Σ = {s1 , s2 , . . . , sn }. De- fine a mapping ρΣ : 2M → 2M associated with this ordering as follows. For any X ⊆ M , let X0 = X. If Xk is computed and implication sk+1 is A → B, then  Xk ∪ B, if A ⊆ Xk , Xk+1 = Xk , otherwise. Finally, ρΣ (X) = Xn . We will call ρΣ an ordered iteration of Σ. From an implicational system to its corresponding D-basis 221 The concept of the ordered iteration is central for the definition of the ordered direct basis. For any given set of implications Σ on set M , by φΣ we understand the closure operator on M defined by Σ. Equivalently, the fixed points of φΣ are exactly subsets X ⊆ M which are stable for all implications A → B in Σ: if A ⊆ X, then B ⊆ X. Definition 6. The set of implications with some linear ordering on it, hΣ, a , ...Yk... a , ...Yk... .... .... .... .... Implicational System Set of (non trivial) Closed sets Set of Covers Set of Minimal Covers and Minimal Generators Fig. 1. Stages of D-basis algorithm Thus, let a be an attribute and mg be the set of minimal generators such that its closure contains a. We write this association as a pair ha, mgi. Let Φ be a set of such pairs of attributes with their generators. We define the Function Add which builds the set of covers produced in Stage 2 as follows: Add(ha, mgi, Φ) = {ha, {g ∈ mg|a 6∈ g} ∪ {mga }i : ha, mga i ∈ Φ} Then, in stage 3, the algorithm picks up the set of minimal covers from the set obtained in stage 2 using the Function MinimalCovers. The method ends with the Function OrderedComp which applies Composition Rule at the same time that it orders the implications in the following sense: the first implications in the D-basis are the binary ones (those with the left-hand side being a singleton). Algorithm 1: D-basis input : An implicational system Σ on M output: The D-basis ΣD on M begin MinGen:=MinGen0 (M , Σ) C:= ∅ foreach hC, mg(C)i ∈MinGen do foreach a ∈ C do C:=Add(ha, mg(C)i, C) ΣD := ∅ foreach ha, mga i ∈ C do mga :=MinimalCovers(mga ) foreach g ∈ mga do ΣD := ΣD ∪ {g → a} ; OrderedComp(ΣD ) return ΣD Example 5. Algorithm 1 returns the following D-basis from the input implica- tional system of Example 3: ΣD = {a → d, bce → ad, ab → ce, ae → bc, bde → ac, cd → abe} From an implicational system to its corresponding D-basis 227 We emphasize that although ac is a minimal generator, it is not a minimal cover, thus an implication with ac in the left-hand side is redundant (deduced from inference axioms) and hence should not appear in the D-basis. A detailed illustrative example In the conclusion of this section we show the execution of the method, in all its stages, on a set of implications from [3], which was used later to illustrate the D-basis definition in [1]. Σ = {5 → 4, 23 → 4, 24 → 3, 34 → 2, 14 → 235, 25 → 134, 35 → 124, 15 → 24, 123 → 45} As a first step in the algorithm, MinGen0 renders the following set of pairs of closed sets and its non-trivial minimal generators, see Figure 2: {h12345, {123, 14, 15, 25, 35}i, h234, {23, 24, 34}i, h45, {5}i, h∅, ∅i} 5⟶4, 2 3⟶4, 2 4⟶3, 3 4⟶2, 1 4⟶2 3 5, 2 5⟶1 3 4, 3 5⟶1 2 4, 1 5⟶2 4, 1 2 3⟶4 5 ø⟶5 ø⟶2 4 ø⟶1 4 ø⟶3 5 ø⟶1 2 3 4 5 2 3 4 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1⟶2 3, 2⟶1 3, 3⟶1 2 1⟶5, 5⟶1 ø ø ø {1 4} {3 5} {1 2 3} {5} {2 4} ø⟶1 ø⟶5 ø⟶2 5 ø⟶1 5 1 5 1 5 1 2 3 4 5 1 2 3 4 5 ø⟶1 ø⟶2 ø⟶3 ø ø ø ø {1 2 4} {2 4 5} 1 2 3 1 2 3 {2 5} 1 2 3 {1 5} ø ø ø {1 5} {2 5} {3 5} ø⟶2 3 ø⟶3 4 2 3 4 2 3 4 1⟶5, 5⟶1 1⟶5, 5⟶1 {2 3} {3 4} ø⟶1 ø⟶5 ø⟶1 ø⟶5 1 5 1 5 1 5 1 5 ø ø ø ø {1 2 3} {2 3 5} {1 3 4} {3 4 5} Fig. 2. MinGen0 Execution Then, for each closed set and each of its elements, our algorithm renders the following set of pairs of elements and covers: {h1, {25, 35}i, h2, {14, 15, 35, 34}i, h3, {14, 15, 25, 24}i, h4, {123, 15, 25, 35, 5, 23}i, h5, {123, 14}i} For each element, the Function MinimalCovers picks up its minimal covers: {h1, {25, 35}i, h2, {14, 34}i, h3, {14, 24}i, h4, {5, 23}i, h5, {14, 123}i} Finally, at the last step, the algorithm turns these pairs into implications and applies ordered composition resulting in the D-basis. ΣD = {5 → 4, 23 → 4, 24 → 3, 34 → 2, 14 → 235, 25 → 1, 35 → 1, 123 → 5} 228 Estrella Rodríguez-Lorenzo et al. 5 Conclusion and future works In this work we have presented a way to obtain the D-basis from any implica- tional system. In [1] the algorithm was proposed to compute the D-basis from any direct basis, but the computation from any implicational system was left open. There exists also an efficient algorithm for the computation of the D-basis from the context using the method of finding the minimal transversals of the associated hypergraphs [2], but this assumes the different input for the closure system which is outside the scope of this paper. The Function MinimalCovers renders the D-basis within the framework of the closure systems without the need of any transformation. A key point of our work is the connection between covers and generators. Using minimal gener- ators, the D-basis is obtained by reducing the set of minimal generators and transforming it into a set of minimal covers. As future work, we propose to develop an algorithm which computes the D-basis with better integration of the minimal generator computation to render the minimal covers in a more direct way. In addition, we are planning to design an empirical study and to make a comparison between this algorithm and other techniques proposed in previous papers. Acknowledgment Supported by Grants TIN2011-28084 and TIN2014-59471-P of the Science and Innovation Ministry of Spain. References 1. K. Adaricheva and J. B. Nation and R. Rand, Ordered direct implicational basis of a finite closure system, Discrete Applied Mathematics, 161 (6): 707–723, 2013. 2. K. Adaricheva and J.B. Nation, Discovery of the D-basis in binary tables based on hypergraph dualization, http://arxiv.org/abs/1504.02875, 2015. 3. K. Bertet, B. Monjardet, The multiple facets of the canonical direct unit implica- tional basis, Theor. Comput. Sci., 411(22-24): 2155–2166, 2010. 4. P. Cordero, A Mora, M. Enciso, I.Pérez de Guzmán, SLFD Logic: Elimination of Data Redundancy in Knowledge Representation, LNCS, 2527: 141–150, 2002. 5. P. Cordero, M. Enciso, A Mora, M. Ojeda-Aciego, Computing Minimal Generators from Implications: a Logic-guided Approach, CLA 2012: 187–198, 2012. 6. V. Duquenne, Some variations on Alan Day’s Algorithm for Calculating Canonical Basis of Implications, CLA 2007: 192–207, 2007. 7. B. Ganter, Two basic algorithms in concept analysis, Technische Hochschule, Darmstadt, 1984. 8. A. Mora, M. Enciso, P. Cordero, I. Fortes, Closure via functional dependence sim- plification, International Journal of Computer Mathematics, 89(4): 510–526, 2012. 9. S. Rudolph, Some Notes on Managing Closure Operators, LNCS, 7278: 278–291, 2012. 10. M. Wild, The joy of implications, aka pure Horn functions: mainly a survey, http: //arxiv.org/abs/1411.6432, 2014.