-

Characterizing Covers of Functional Dependencies using FCA

Victor Codocedo

victor.codocedo@inf.utfsm.cl 3 5

Jaume Baixeries

4 5

Mehdi Kaytoue

1 5

Amedeo Napoli

2 5 0 290, Department of Computer Science, Palacky University Olomouc , 2018. Copying permitted only for private and academic purposes 1 Infologic , 99 Avenue de Lyon, F-26500, Bourg-l`es-Valence , France 2 LORIA (CNRS - Inria Nancy Grand Est - Universit ́e de Lorraine) , B.P. 239, F-54506, Vandoeuvre-l`es-Nancy 3 Universidad T ́ecnica Federico Santa Mar ́ıa. Campus San Joaqu ́ın , Santiago de Chile 4 Universitat Polit`ecnica de Catalunya.

08032, Barcelona. Catalonia

5 c paper author(s), 2018. Proceedings volume published and copyrighted by its editors. Paper published in Dmitry I. Ignatov , Lhouari Nourine (Eds.): CLA 2018, pp. 279

279 290

Functional dependencies (FDs) can be used for various important operations on data, for instance, checking the consistency and the quality of a database (including databases that contain complex data). Consequently, a generic framework that allows mining a sound, complete, non-redundant and yet compact set of FDs is an important tool for many different applications. There are different definitions of such sets of FDs (usually called covers). In this paper, we present the characterization of two different kinds of covers for FDs in terms of pattern structures. The convenience of such a characterization is that it allows for an easy implementation of efficient mining algorithms which can later be easily adapted to other kinds of similar dependencies. Finally, we present empirical evidence that the proposed approach can perform better than a state-of-the-art FD miner in large databases.

Functional Dependencies (FDs) are a keystone of the relational database model, since they allow checking the consistency, maintaining the quality of a database [ 8, 10, 9 ], and guiding its design [ 20 ]. In addition, they have been used to study information integration in the Web of data with varying degrees of quality [ 24, 25 ], or to check the data completeness in DBpedia [ 1 ]. Therefore, the computation of a succinct representation of a set of FDs (usually refered to as a cover), is of interest to various fields of knowledge discovery and representation, specially, if this computation can easily be extended to other kinds of dependencies (e.g. relaxed versions of FDs [ 5 ]).

The computation of FD covers is a popular topic in the database literature. As a reference, in [ 21 ], seven algorithms to mine FDs and compute their covers are reviewed and grouped into three families: lattice transversal algorithms, difference/agree sets algorithms, and dependency induction algorithms. The characterization of FDs with FCA and pattern structures is also presented in [ 2 ] and a generalization to other types of FDs is given in [ 6 ]. For a detailed review on the characterization of FDs and FCA, see [ 3 ].

In this paper, we characterize the representations of FD covers using pattern structures, an extension of FCA dealing with complex object representations [ 18 ].

On the one hand, we adapt the definition of a canonical direct basis of implications with proper premises [ 4, 22 ] to the formalism of pattern structures, in order to prove that this basis is equivalent to a reduced non-redundant set of FDs, better known as the canonical cover (Section 3.1). We show that the canonical cover can be characterized using the arrow relation (Ö) between an attribute and a pattern defined over a partition pattern structures (PPS). On the other hand, we discuss on the relation between the Stem Base of implications [ 12 ] with the Minimal Cover of dependencies, a sound, complete, non-redundant set of FDs that has minimum cardinality w.r.t. any other equivalent cover (Section 3.2). We show that the latter can be characterized by pseudo-extents of a PPS.

Finally, in Section 4 we empirically compare these two ways of computing FD covers with the algorithm TANE [ 16 ]. This algorithm is one of the most efficient FD mining algorithms and according to [ 21 ], it is the base for the family of “lattice transversal algorithms” serving as the baseline to validate our approach with a state-of-the-art FD miner. 2

Theoretical Background

Let U be an ordered set of attributes, and let Dom be a set of values (a domain). For the sake of simplicity, we assume that Dom is a numerical set. A tuple t is a function t : U Ñ Dom, and a table T is a set of tuples. Usually a table is represented as a matrix, as in Table 1, where the set of tuples (or objects) is T “ t t1, t2, . . . , t7 u with attributes U “ t a, b, c, d, e u. We use table, dataset, set of tuples as equivalent terms. We overload the functional notation of a tuple in such a way that, given a tuple t P T , we say that tpX Ď U ) is a tuple with the values of t in the ordered attributes xi P X defined as tpXq “ xtpx1q, tpx2q, . . . , tpxnqy. For example, we have that t2pt a, c uq “ xt2paq, t2pcqy “ x2, 1y. In this article, the set notation is dropped: instead of t a, b u we use ab. 2.1

Functional Dependencies and their Covers Definition 1 ([23]). Let T be a set of tuples, and X, Y Ď U . A functional

dependency (FD) X Ñ Y holds in T if:

For instance, the functional dependency d Ñ e holds in T (Table 1), whereas the functional dependency e Ñ d does not hold since t4peq “ t5peq but t4pdq ‰ t5pdq.

For a given set of functional dependencies F , we can use the three Armstrong’s axioms (reflexivity, augmentation and transitivity) to derive a larger set of FDs [ 20 ]. We will call F ˚ the set of FDs derived from F by reflexivity and augmentation, and F ` the set of FDs derived by reflexivity, augmentation and transitivity. Two sets of FDs F and H are said to be equivalent F ” H ðñ F ` “ H`.

Let F be a set of FDs from a database T , F is said to be sound if all FDs in F hold in T . In addition, F is said to be complete if all FDs that hold in T can be derived from F . Let X Ñ Y be any FD in F , then F is said to be non-redundant if F ztX Ñ Y u ı F , and non-redundant w.r.t. augmentation iff pF ztX Ñ Y uq˚ ‰ F ˚

A set F is said to be left-reduced if for all X Ñ Y P F and Z Ĺ X we have that pF ztX Ñ Y uq Y tZ Ñ Y u ı F . Dually, it is said to be right-reduced if for all X Ñ Y P F and Z Ĺ Y we have that pF ztX Ñ Y uq Y tX Ñ Zu ı F . F is said to be reduced if it is simultaneously left and right-reduced.

Let F be a reduced set of FDs, then G “ tX Ñ y | X Ñ Y P F, y P Y u is the splitting of F and G ” F . Let F be a reduced set, its splitting is called a canonical cover. A canonical cover is a left-reduced, non-redundant w.r.t. augmentation set of FDs with a single element in their right hand side (a split set) [ 19 ]. A different definition presents the canonical cover in a saturated version requiring uniqueness in their left hand side [ 17 ] losing the single element condition in the right hand side. For the sake of compatibility with FCA implication covers we will favor the first definition. Notice that canonical covers can be redundant w.r.t. transitivity. For example the canonical cover of Table 1 contains tc Ñ b, c Ñ e, d Ñ e, bd Ñ c, be Ñ cu where bd Ñ c would be redundant w.r.t. transitivity as bd Ñ bde Ñ c.

Finally, a set F is said to be a minimum cover if it has as few FDs as any other equivalent set of FDs. For example, the minimum cover of Table 1 contains FDs tc Ñ be, d Ñ e, be Ñ cu. Notice that the minimum cover is not restricted to be reduced, so it is not presented with split sets. Secondly, the minimum cover contains exactly one fewer FD than the canonical cover, namely bd Ñ c. 2.2

Formal Concept Analysis, Implication Systems and FDs

For the sake of brevity we do not provide a description of the Formal Concept Analysis (FCA) framework. The notation used in this article follows [ 13 ] where K “ pG, M, Iq is a formal context of objects G, attributes M and incidence relation I, with formal concepts pA, Bq where A1 “ B and B1 “ A.

Implications are relations established between attribute sets from a formal context K. Implications are analogous to FDs and they can be used to characterize them [ 2 ]. Because of this, we will notate an implication similarly to an FD. Implication systems (sets of implications) can also characterize FD covers [ 4 ].

An implication X Ñ Y holds in K for X, Y Ď M if Y Ď X2. Let T be a set of tuples and U , a set of attributes in a table (such as the one in Table 1). We define the set P airpT q “ `T2 ˘ set of all subsets T with exactly two elements, and the incidence set I such that ppti, tj q, xq P I ðñ tirxs “ tj rxs, @x P U , @ti, tj P T . It can be shown that an FD X Ñ Y holds in the database if and only if X Ñ Y is an implication of the formal context K “ pP airpT q, U , Iq [ 13 ]. K is called the binary codification of table T . For example, Table 3 contains the binary codification of Table 1. In Table 3 we can observe the implication d Ñ e which can be verified as an FD in Table 1.

The previous statement entails that FDs and implications in K are in 1-1 correspondence. Moreover, the corresponding definition of a canonical cover of FDs is equivalent to that of a canonical-direct unitary basis of implications as shown in [ 4 ] where the equivalent left-minimal basis is described as containing minimal functional dependencies, i.e. those in a canonical cover. δpaq δpbq ˆ δpeq ˆ δpaq [ δpbq ˆ ˆ Ö Ö δpaq [ δpeq ˆ ˆ δpdq [ δpeq ˆ ˆ δpaq [ δpdq [ δpeq ˆ Ö Ö ˆ ˆ δpbq [ δpcq [ δpeq ˆ ˆ ˆ δpaq [ δpbq [ δpcq [ δpeq ˆ ˆ ˆ Ö ˆ δpbq [ δpcq [ δpdq [ δpeq Ö ˆ ˆ ˆ ˆ δpaq [ δpbq [ δpcq [ δpdq [ δpeq ˆ ˆ ˆ ˆ ˆ A Partition Pattern Structure (PPS) is a type of pattern structure [ 14 ] that deals with, as the name suggests, object representations in the form of set partitions. PPS have shown to be useful for mining biclusters [ 7 ] and, more importantly, relations between partition pattern concepts have been used to characterize FDs of different kinds [ 3 ].

The formalization of a database T with attributes U as a PPS is as follows. A partition d of T is a set d Ď ℘pT q (powerset of T ) of disjoint non-empty subsets of T such that for any two different elements Ki, Kj P d we have that Ki X Kj “ H and ŤKPd K “ T . Let D be the set of all possible partitions of T , then any two partitions d1, d2 P D can be ordered by a coarser/finer relation denoted d1 Ď d2 (d1 is finer than d2) iff p@ Ki P d1qpD Kj P d2q such that Ki Ď Kj . The similarity operator is defined as d1 [ d2 “ tKi X Kj | Ki P d1, Kj P d2u. From this, it follows that pD, Ďq is a complete lattice with supremum and infimum defined respectively as J “ ttT uu and K “ tttu | t P T u.

The set of attributes U is mapped onto D through a function δ which for a given attribute x P U yields a partition δpxq P D representing the equivalence relations over T for values of attribute x. With this, we can configure the PPS pU , pD, Ďq, δq with derivation operators X˝ “ d δpxq denoting the equivalence xPX relations implied by attributes in X, and d˝ “ tx P U | δpxq Ď du denoting all attributes with associated equivalence relations finer than d. pX, dq is a partition pattern concept when X˝ “ d and d˝ “ X.

Theorem 1 (Proposition 2 in [3]). Let pP airpT q, U , Iq be the binary codifi

cation of a table within a database, and pU , pD, Ďq, δq its PPS representation: pW, Xq P pP airpT q, U , Iq ðñ pX, dq P pU , pD, Ďq, δq

The proof of Theorem 1 can be found in [ 3 ]. Theorem 1 presents an important property of the PPS that states that X Ď U is an extent in pU , pD, Ďq, δq if and only if it is also an intent in pP airpT q, U , Iq (the relation between the set of tuple pairs W Ď P airpT q and the partition d P D can be formalized as well but it is of no interest to our development). Theorem 1 is very important since it entails that the lattices derived from pP airpT q, U , Iq and pU , pD, Ďq, δq are isomorphic. Consequently, implication X Ñ Y in pP airpT q, U , Iq can be found as a relation between extents in pU , pD, Ďq, δq (extent implication) such that Y Ď X˝˝. 3

Covers and Pattern Structures

In this section we present two different kinds of covers for FDs: canonical covers (Section 3.1) and minimal covers (Section 3.2), as well as their characterization in terms of Pattern Structures. This section uses existing well-known results in FCA, which are reviewed here for the sake of readability.

Section 3.1 presents how a canonical cover for FDs can be computed using PPS, according to the results in [ 4, 22 ]. Section 3.2 introduces a novel characterization of the minimum cover of FDs by means of pseudo-extents of PPS [ 13 ]. The interest of these results is not limited to computing FD covers, but also for generalizations of FDs [ 3, 6 ]. 3.1

Characterizing a Canonical Cover of FDs

The characterization of a canonical cover of FDs using FCA is straightforward. In a nutshell, a canonical cover of FDs is analogous to a canonical direct unitary basis of implications [ 4 ] as presented in Section 2.2. In this section we recall some of these ideas and show how they can be simply adapted to the framework of PPS.

Firstly, let pU , pD, Ďq, δq be a PPS, we define D “ td P D | d˝˝ “ du as the set of all closed partition patterns in D. The formal context pU , D, J q with pd, xq P J ðñ d Ď δpxq is called a representation context of pU , pD, Ďq, δq and their corresponding concept lattices are isomorphic [ 18 ]. For the sake of readability of the following definitions, we define the representation context as pD, U , J q instead of pU , D, J q (Table 2). By transitivity of equivalence, it is clear that pD, U , J q is isomorphic to pP airpT q, U , Iq as defined in Section 2.3 and as such, implications in pD, U , J q correspond to FDs. For example, Table 2 (not considering elements Ö) contains the representation context pD, U , J q of the PPS derived from Table 1. Notice that objects are closed intersections of object representations, e.g. δpdq does not appear since δpdq˝˝ “ δpdq [ δpeq. With this, the canonical direct basis of implications in pD, U , J q (and thus, canonical cover of FDs) is determined by the set of proper premises of elements in U .

Theorem 2 (Corollary 1 in [22]). X Ď U ztyu is a premise of y P U iff X is

a hypergraph transversal of HyÖ defined as :

HyÖ “ tppU ztyuqqzd1 | d P D, d Ö yu The set of all proper premises of y is the minimum hypergraph transversal T rpHyÖq.

A detailed description on the development of Theorem 2 can be found in [ 22 ]. Providing a formal definition of hypergraph transversals is out of the scope of this article, however we briefly mention that this formalization can also be made considering set collections (instead of hypergraphs) and minimum hitting sets (instead of minimum hypergraph transversals) [ 11 ]. This problem is also analogous to the vertex cover problem [ 12 ].

Theorem 2 provides a formal description for the proper premises of a given attribute y P U that in turn yields the canonical cover of functional dependencies. However this approach requires the creation of the representation context which is a middle step in the overall calculation process. Actually, by analyzing the arrow relation between d and y we can observe that the representation context is not necessary. Consider that in pD, U , J q, d1 “ tx P U | pd, xq P J p ðñ d Ď δpxqqu and thus d1 is equivalent to d˝ for any d P D. Moreover, in pU , pD, Ďq, δq, d˝ Ĺ h˝ ðñ h Ĺ d since h “ h˝˝ and d “ d˝˝. With this, we can rewrite the arrow definition as follows.

d Ö y ðñ pd, yq R J and if d1 Ĺ h1 then ph, yq P J ðñ d Ę δpyq and if d˝ Ĺ h˝ then h Ď δpyq ðñ d Ę δpyq and if h Ĺ d then h Ď δpyq The last result shows that d Ö y in pD, U , J q can be defined directly over the PPS. Intuitively, this definition corresponds to y Õ d in pU , D, J q and thus, in pU , pD, Ďq, δq. With these elements we can finally propose a characterization for the canonical cover of functional dependencies in pU , pD, Ďq, δq as follows. Corollary 1. Let pU , pD, Ďq, δq be a partition pattern structure and T rpHq denote the hypergraph transversal of H, then with

Lcc “ tX Ñ y | y P U , X P T rpHyÕqu

HyÕ “ tppU ztyuqqzd1 | d P D, y Õ du y Õ d ðñ d Ę δpyq and if h Ĺ d then h Ď δpyq Lcc is a canonical cover of functional dependencies.

For the running example, let us calculate the proper premises of attribute c using the arrow relations in Table 3. We have c Õ pδpaq [ δpbqq and c Õ pδpaq [ δpdq [ δpeqq. With this, we have the hypergraph HcÕ “ ttd, eu, tbuu for which the minimum transversal hypergraph is T rpHcÕq “ ttb, du, tb, euu. Correspondingly, we have the FDs bd Ñ c and be Ñ c which are included in the canonical cover. 3.2

Characterizing a Minimal Cover of FDs

We introduce a novel characterization of a minimal cover of FDs by means of pseudo-intents, and its generalization using pseudo-extents of a PPS.

The stem base of implications, or Duquenne-Guigues basis [ 15 ], is a sound, complete and non-redundant basis which also has minimum cardinality among the sets of implications for a given formal context. We show how this can be used to characterize a minimal cover of FDs in a rather simple manner. Prior to introducing the stem base, let us define pseudo-closed sets [ 13 ].

Definition 2. (Pseudo-closed sets) Let P ÞÑ P 2 be a closure operator over a set M, then P is a pseudo-closed set if and only if:

P ‰ P 2 Q Ĺ P and Q is a pseudo-closed set ùñ Q2 Ď P

Given a formal context pG, M, Iq, pseudo-closed sets A Ď G are called pseudo-extents, while pseudo-closed sets B Ď M are called pseudo-intents. A stem base of implications, or Duquenne-Guigues basis, can be defined as follows: Theorem 3 ([ 13 ]). (Duquenne-Guigues Basis) The set of implications:

L “ tP Ñ P 2 | P is a pseudo-intentu is sound, complete and non-redundant.

Theorem 4 ([ 13 ]). Every complete set of implications contains an implication X Ñ Y with X2 “ P 2 for every pseudo-intent P of pG, M, Iq

Theorem 4 entails that the stem base of implications has minimum cardinality with respect to any equivalent set of implications of pG, M, Iq. With this and the previous observation that FDs are in 1-1 correspondence with implications in pP airpT q, U , Iq, we can derive the following corollary.

Corollary 2. Let pP airpT q, U , Iq be the binary codification of a table with tuples T and attributes U , the set of FDs L “ tP Ñ P 2 | P is a pseudo-intentu is a minimal cover.

Corollary 2 provides a novel characterization of the minimal cover of FDs through pseudo-intents of a formal context. Given the existing relation between pP airpT q, U , Iq and pU , pD, Ďq, δq, we can generalize the characterization of the minimal cover over the latter. Observe that in the PPS pU , pD, Ďq, δq, we maintain the notion of pseudo-extents for a pseudo-closed set X Ď U with X ÞÑ X˝˝. (1) (2) (3) Proposition 1. Let pU , pD, Ďq, δq be the PPS representation of a database, then the set of functional dependencies Lmc defined below is a minimal cover.

Lmc “ tX Ñ X˝˝ | X is a pseudo-extentu (4)

The proof of Proposition 1 follows from Theorem 1 and the fact that for a set X P U the closure operator X ÞÑ X2 is exactly equivalent to X ÞÑ X˝˝ and consequently, the set of pseudo-intents in pP airpT q, U , Iq is the same as the set of pseudo-extents in pU , pD, Ďq, δq. Thus, because of Corollary 2, Proposition 1 holds.

Experimental Evaluation

In this section we present a brief experimental comparison of both introduced approaches versus TANE [ 16 ], a state-of-the-art FD miner. TANE is a highly optimized apriori -based algorithm that generates a canonical cover of FDs. The goal of this evaluation is to study the comparative benefits of using FCA versus a traditional approach such as TANE. TANE was re-implemented for our experiments5. For the sake of repeatibility, the code used to run the experiments was made available at GitHub. Both the minimal cover miner (MCM) 6, and the canonical cover miner (CCM)7 were implemented using Python’s pipy FCA library fca8.

Experiments were performed over 12 datasets extracted from the UCI Machine Learning repository9, the JASA Data Archive10 and Metanome’s repeatability Web page11. Details on the number of rows and columns for each dataset are provided in the first two columns of Table 4. In addition to these datasets, we created synthetic versions by multiplying the rows or the columns of a given dataset. Experiments were performed on a virtual machine with 4 cores running at 2.7 Ghz and equipped with 32 GB of RAM memory. 4.1

Results & Discussion

5 https://github.com/codocedo/tane/tree/cla18 6 https://github.com/codocedo/fd_miner/tree/cla18 7 https://github.com/codocedo/uis_miner/tree/cla18 8 https://pypi.org/project/fca/ 9 https://archive.ics.uci.edu/ml/index.php 10 http://lib.stat.cmu.edu/jasadata/ 11 https://hpi.de/naumann/projects/repeatability/data-profiling/fds.html the lower-left region of the figure, representing small datasets. FCA-based approaches perform better in datasets in all other regions, including the upper-right which contains datasets with many rows and columns.

Synthetic datasets in Table 4 show evidence that FCA scales better when duplicating the dataset. When duplicating attributes the difference is particularly dramatic since TANE is over 13 times slower while our approach, only 3. To study this further, we created two sets of synthetic datasets. The first set (vertical set) was created by incrementally multiplying vertically the Diagnostics datasets (with 8 attributes and 120 tuples) generating 19 versions of 240, 360, 480 tuples, up to a dataset containing 2400 tuples. The second set (horizontal set) was created using the same idea but in a horizontal manner generating 19 versions of 16, 24, 32, up to 160 attributes. Since most of the datasets of the second set were too big for TANE, they were trunked to 40 tuples.

Figure 2 depicts the increasing time for CCM, MCM and TANE on the vertical set, i.e. when increasing the number of tuples. We can observe that all three approaches scale linearly w.r.t. the number of tuples, even when CCM and MCM seem to have a more stable behavior. Vertical multiplication of datasets yield the same number of FDs than the original set, since the relation between attributes remains unchanged. Thus, we can assume that other algorithms based on TANE could achieve a similar performance to CCM or MCM provided some optimizations.

On the other hand, this do not seem to be the case for the horizontal set. Figure 3 shows that CCM and MCM remain very stable when varying the number of attributes, while TANE’s execution time grows exponentially. Indeed, this great difference in performance is due to the way in which we use FCA to find FDs which differs from TANE’s strategy. Using FCA we calculate closures which quickly group attribute copies avoiding unnecessary intersections. Instead, TANE computes each attribute combination rendering the exponential growth in the computation time. We stress that this is not simply an extreme case from which our approach takes advantage, but actually a very good illustration of the benefits of using a closure operator to navigate the space of FDs. Closures enable CCM and MCM to avoid unnecessary computations not only when we have redundant attributes, but also whenever possible in the lattice of the powerset of attributes. Finally, we do not discuss on the differences between CCM and MCM strategies as these are detailed in [ 22 ]. 5

Conclusions

We have presented a new characterization of a minimal cover of functional dependencies (FDs) by means of the stem base (or Duquenne-Guigues basis) of a partition pattern structure. We have presented the mechanisms through which this characterization can be exploited to efficiently mine the minimal cover. Furthermore, we have described the algorithms that implement these mechanisms and show empirical evidence that our characterization performs better than a state-of-the-art FD miner, namely TANE, in larger databases containing many rows and columns.

Acknowledgments. This research work has been supported by the SGR2014-890 (MACDA) project of the Generalitat de Catalunya, and MINECO project APCOM (TIN2014-57226-P).

1. Alam , M. , Buzmakov , A. , Codocedo , V. , Napoli , A. : Mining definitions from RDF annotations using formal concept analysis . In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015 ,

Buenos

Aires , Argentina, July 25-31 , 2015 . pp. 823 - 829 . AAAI Press ( 2015 )

2. Baixeries , J. , Kaytoue , M. , Napoli , A. : Computing Functional Dependencies with Pattern Structures . In: Concept Lattices and their Applications . pp. 175 - 186 ( 2012 )

3. Baixeries , J. , Kaytoue , M. , Napoli , A. : Characterizing Functional Dependencies in Formal Concept Analysis with Pattern Structures . Annals of Mathematics and Artificial Intelligence 72 ( 1-2 ), 129 - 149 ( Oct 2014 )

4. Bertet , K. , Monjardet , B. : The multiple facets of the canonical direct unit implicational basis . Theoretical Computer Science 411 ( 22 - 24 ), 2155 - 2166 ( 2010 )

5. Caruccio , L. , Deufemia , V. , Polese , G.: Relaxed Functional Dependencies - A Survey of Approaches . IEEE Transactions on Knowledge and Data Engineering 28 ( 1 ), 147 - 165 ( 2016 )

6. Codocedo , V. , Baixeries , J. , Kaytoue , M. , Napoli , A. : Characterization of Orderlike Dependencies with Formal Concept Analysis . In: Concept Lattices and Their Applications . vol. 1624 , pp. 123 - 134 ( 2016 )

7. Codocedo , V. , Napoli , A. : Lattice-based biclustering using Partition Pattern Structures . In: Proceedings of ECAI 2014. Frontiers in Artificial Intelligence and Applications , vol. 263 , pp. 213 - 218 . IOS Press ( 2014 )

8. Fan , W.: Dependencies revisited for improving data quality . In: Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2008 , June 9-11, 2008 , Vancouver, BC, Canada. pp. 159 - 170 ( 2008 )

9. Fan , W. : Data quality: From theory to practice . SIGMOD Record 44 ( 3 ), 7 - 18 ( 2015 )

10. Fan , W. , Geerts , F. : Foundations of Data Quality Management . Synthesis Lectures on Data Management , Morgan & Claypool Publishers ( 2012 )

11. Gainer-Dewar , A. , Vera-Licona , P.: The minimal hitting set generation problem: Algorithms and computation . SIAM Journal on Discrete Mathematics 31 ( 1 ), 63 - 100 ( 2017 )

12. Ganter , B. , Obiedkov , S. : Conceptual Exploration. Springer, Berlin ( 2016 )

13. Ganter , B. , Wille , R.: Formal Concept Analysis . Springer, Berlin ( 1999 )

14. Ganter , B. , Kuznetsov , S.O. : Pattern Structures and their projections . In: Delugach, H.S. , Stumme , G . (eds.) Conceptual Structures: Broadening the Base . LNCS , vol. 2120 , pp. 129 - 142 . Springer ( 2001 )

15. Guigues , J.L. , Duquenne , V. : Familles minimales d'implications informatives r´esultant d'un tableau de donn´ees binaires . Math´ematiques et Sciences Humaines 95 , 5 - 18 ( 1986 )

16. Huhtala , Y. , Ka¨rkka¨inen, J., Porkka , P. , Toivonen , H.: TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies . Computer Journal 42 ( 2 ), 100 - 111 ( 1999 )

17. Korth , H.F. , Silberschatz , A. : Database System Concepts . McGraw-Hill , Inc., New York, NY, USA ( 1986 )

18. Kuznetsov , S.O. : Galois Connections in Data Analysis: Contributions from the Soviet Era and Modern Russian Research . In: Formal Concept Analysis, Foundations and Applications . pp. 196 - 225 . Lecture Notes in Computer Science 3626, Springer ( 2005 )

19. Maier , D. : Theory of Relational Databases . Computer Science Press ( 1983 )

20. Mannila , H. , Ra¨ih¨a, K .J.: The Design of Relational Databases . Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA ( 1992 )

21. Papenbrock , T. , Ehrlich , J. , Marten , J. , Neubert , T. , Rudolph , J.P. , Scho¨nberg, M. , Zwiener , J. , Naumann , F. : Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms . Proceedings of VLDB Endowment 8 ( 10 ), 1082 - 1093 ( 2015 )

22. Ryssel , U. , Distel , F. , Borchmann , D. : Fast algorithms for implication bases and attribute exploration using proper premises . Annals of Mathematics and Artificial Intelligence 70 ( 1 ), 25 - 53 ( 2014 )

23. Ullman , J.: Principles of Database Systems and Knowledge-Based Systems, volumes 1-2 . Computer Science Press, Rockville (MD), USA ( 1989 )

24. Yu , Y. , Heflin , J.: Extending functional dependency to detect abnormal data in RDF graphs . In: The Semantic Web - ISWC 2011 - 10th International Semantic Web Conference , Bonn, Germany, October 23-27 , 2011 , Proceedings, Part I. pp. 794 - 809 ( 2011 )

25. Yu , Y. , Li , Y. , Heflin , J.: Detecting abnormal semantic web data using semantic dependency . In: Proceedings of the 5th IEEE International Conference on Semantic Computing (ICSC 2011 ), Palo Alto, CA, USA, September 18 - 21 , 2011 . pp. 154 - 157 . IEEE Computer Society ( 2011 )