
Distributed Closed Pattern Mining in Multi-Relational Data based on Iceberg Query Lattices: Some Preliminary Results

Hirohisa Seki⋆

seki@nitech.ac.jp

Sho-ich Tanimoto

Dept. of Computer Science, Nagoya Inst. of Technology, Showa-ku, Nagoya 466-8555, Japan

We study the problem of mining frequent closed patterns in multi-relational databases in a distributed environment. In multi-relational data mining (MRDM), relational patterns involve multiple relations from a relational database, and they are typically represented in Datalog (a class of first-order logic). Our approach is based on the notion of iceberg query lattices, a formulation of MRDM in terms of formal concept analysis (FCA), and we apply it to a distributed mining setting. We assume that the database under consideration contains a special predicate called key, which determines the entities of interest and what is to be counted, and that each datalog query contains an atom key, where the variables in a query are linked to a given target object corresponding to the key. We show that the iceberg query lattice in this case can be defined in a way similar to that in the literature. Next, given two local databases (horizontal partitions) and their sets of closed patterns (concepts), we show that the subposition operator studied in the literature, which constructs a global Galois (concept) lattice from the direct product of two lattices, can be utilized to generate the set of closed patterns in the global database. The correctness of our algorithm is shown, and some preliminary experimental results using a MapReduce framework are also given.

Multi-relational data mining (MRDM) has been extensively studied for more than a decade (e.g., [ 5, 6 ] and references therein). The research topics discussed in conventional data mining have been considered in this more expressive framework of MRDM, where data and patterns (or queries) are represented in the form of logical formulae such as Datalog (a class of first-order logic). In contrast to traditional data mining, which deals with rather simple patterns such as itemsets, the expressive formalism of MRDM allows us to use more complex and structured data in a uniform way, including trees and graphs in particular, and multi-relational patterns in general.

© 2012 by the paper authors. CLA 2012, pp. 115–126. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–84–695–5252–0, Universidad de Málaga (Dept. Matemática Aplicada), Spain.

On the other hand, Formal Concept Analysis (FCA) has been developed as a field of applied mathematics based on a clear mathematization of the notions of concept and conceptual hierarchy [ 7 ]. It has attracted much interest from various application areas including, among others, data mining, knowledge acquisition and software engineering (e.g., [ 8 ]).

Stumme [ 20 ] has proposed the notion of iceberg query lattices, which combines the notions of the above two fields, i.e., MRDM and FCA: iceberg query lattices combine the notion of frequent datalog queries in MRDM with iceberg concept lattices (or frequent closed itemsets) in FCA. It has then been shown that we can apply the “full arsenal” of FCA-based methods to frequent queries, thereby allowing us to mine and visualize relational association rules. Condensed representations such as closed patterns and free patterns in MRDM have also been studied in c-armr [ 4 ] and in RelLCM2 [ 9 ].

In this paper, we study the problem of mining closed queries in multi-relational data based on these precursors. We apply the notion of iceberg query lattices to a distributed mining setting. The assumption that a given dataset is distributed and stored at different sites is reasonable, because we may not be able to move local datasets to a centralized site owing to the sheer size of the data and/or privacy concerns. We also assume that the database under consideration contains a special predicate called key (e.g., [ 3, 4 ]), and that each datalog query contains an atom key, where the variables in a query are linked to a given target object corresponding to the key. Using a key atom, we can define the notion of the frequency of a datalog query, since the key atom determines the entities of interest and what is to be counted. We show that the iceberg query lattice in this case can be defined in a way similar to that in the literature. Next, given two local databases (horizontal partitions) and their sets of closed queries (concepts), we show that we can construct the set of closed queries in the global database by using the subposition operator [ 7, 23 ], which constructs a global Galois (concept) lattice from the direct product of two lattices. We also present some preliminary experimental results using the distributed framework MapReduce [ 2 ].

The organization of the rest of this paper is as follows. After summarizing some basic notations and definitions in FCA in Sect. 2, we reconsider the notion of iceberg query lattices with key in Sect. 3. We then explain our approach to distributed closed query mining in MRDB in Sect. 4. In Sect. 5, we show the effectiveness of our method by some preliminary experimental results. Finally, we give a summary of this work in Sect. 6.

2 Preliminaries: Formal Concept Analysis

We assume that the reader is familiar with the basic notions of Formal Concept Analysis (FCA), which are found in [ 7 ]. However, we recall some of the important definitions and notations.

Definition 1. A (formal) context K = (O, A, I) consists of a set O of objects, a set A of attributes, and a binary relation I ⊆ O × A.

The mapping f : P(O) → P(A) is given by f (X) = {a ∈ A | ∀o ∈ X : (o, a) ∈ I}. The mapping g : P(A) → P(O) is given by g(Y ) = {o ∈ O | ∀a ∈ Y : (o, a) ∈ I}.

If it is clear from the context whether f or g is meant, then we abbreviate both f (·) and g(·) just by ′. In particular, Y ′′ stands for f (g(Y )).

A (formal) concept is a pair (X, Y ) with X ⊆ O, Y ⊆ A, X′ = Y, and Y ′ = X. X is called the extent, and Y is called the intent of the concept. The set CK of all concepts of K together with the partial order (X1, Y1) ≤ (X2, Y2) ↔ X1 ⊆ X2 (which is equivalent to Y1 ⊇ Y2) is called the concept lattice of K. ✷
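As a small illustration of Definition 1, the following Python sketch implements the two derivation operators and enumerates all concepts of a context by brute force. The context is a hypothetical one chosen only for illustration (it is not taken from the paper's figures), and the names `CONTEXT`, `f`, `g` and `concepts` are ours.

```python
from itertools import combinations

# Hypothetical toy context (an assumption for illustration): object -> attributes.
CONTEXT = {
    1: {"a", "b", "d"},
    2: {"b", "c", "d"},
    3: {"a", "c", "d"},
    4: {"a", "b", "c"},
}
ATTRIBUTES = {"a", "b", "c", "d"}


def f(objects, context=CONTEXT, attributes=ATTRIBUTES):
    """f(X): attributes shared by every object in X (all attributes if X is empty)."""
    shared = set(attributes)
    for o in objects:
        shared &= context[o]
    return frozenset(shared)


def g(attrs, context=CONTEXT):
    """g(Y): objects that have every attribute in Y."""
    return frozenset(o for o, row in context.items() if set(attrs) <= row)


def concepts(context=CONTEXT, attributes=ATTRIBUTES):
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    found = set()
    attrs = sorted(attributes)
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            extent = g(subset, context)
            found.add((extent, f(extent, context, attributes)))
    return found
```

The enumeration is exponential in the number of attributes, so it is meant only to make the definitions concrete, not as a mining algorithm.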

In FCA [ 7 ], a set of context-oriented operators has been studied, including the apposition/subposition operators, and they are extensively studied by Valtchev and Missaoui [ 23, 24 ]. The following definitions and lemma are due to [ 23 ].

Definition 2. Let K1 = (O1, A, I1) and K2 = (O2, A, I2) be two contexts with the same set of attributes A. Then the context K = (O1 ∪· O2, A, I1 ∪· I2) is called the subposition of K1 and K2, denoted by K = K1/K2 (with K1 stacked over K2).

Usually, the extent of K is set to the disjoint union (denoted by ∪· ) of the involved context extents, and this constraint is suitable for our current study.

Let Ki (i = 1, 2) be a context, and Li the corresponding lattice. The direct product of a pair of lattices L1 and L2, denoted by L× = L1 × L2, is itself a lattice L× = ⟨CK×, ≤×⟩, where CK× = CK1 × CK2, and (c1, c2) ≤× (d1, d2) ⇔ c1 ≤L1 d1 and c2 ≤L2 d2.

Any concept of L can be projected upon the concept lattice L1 (L2) by restricting its extent to the set of “visible” objects, i.e., those in O1 (O2), respectively. The resulting mapping constitutes an order homomorphism between L and the direct product [ 7 ].

Definition 3. The function ϕ : CK → CK× maps a concept from the global lattice into a pair of concepts of the partial lattices by splitting its extent over the partial context object sets O1 and O2:

ϕ((X, Y )) = ((X ∩ O1, (X ∩ O1)′), (X ∩ O2, (X ∩ O2)′)).

From the above definitions, we have the following property [ 23 ]:

Lemma 1. [ 23 ] For any global concept c = (X, Y ) and its image ϕ(c) = ((X1, Y1), (X2, Y2)), it holds that X = X1 ∪ X2 and Y = Y1 ∩ Y2. ✷

Example 1. Consider a context K in Fig. 1 (upper left). The concept lattice CK derived from K is shown on the right. Let K1, K2 (lower right of the figure) be a horizontal decomposition of K, where O1 = {1, 2} and O2 = {3, 4}. Then, K is the subposition of K1 and K2, i.e., K = K1/K2. The concept lattices CK1 (CK2 ) derived from K1 (K2) are shown on the right, respectively.

Consider a global concept c = (123, d) in CK. Then, ϕ(c) = ((12, bd), (3, acd)), and we have from Lemma 1 that {123} = {12} ∪ {3} and {d} = {bd} ∩ {acd}. ✷
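The mapping ϕ and Lemma 1 can be checked mechanically on a small context. The sketch below assumes a hypothetical context whose rows were chosen to agree with the values quoted in Example 1 (Fig. 1 itself is not reproduced here, so the rows are an assumption); the names `K`, `intent` and `phi` are ours.

```python
# Hypothetical context consistent with the values quoted in Example 1
# (the actual rows of Fig. 1 are not available here, so these are an assumption).
K = {1: {"a", "b", "d"}, 2: {"b", "c", "d"}, 3: {"a", "c", "d"}, 4: {"a", "b", "c"}}
A = {"a", "b", "c", "d"}
O1, O2 = {1, 2}, {3, 4}


def intent(objects, context):
    """Attributes common to all given objects within one (partial) context."""
    shared = set(A)
    for o in objects:
        shared &= context[o]
    return frozenset(shared)


def phi(concept):
    """Split a global concept (X, Y) over O1 and O2 as in Definition 3."""
    extent, _ = concept
    k1 = {o: K[o] for o in O1}
    k2 = {o: K[o] for o in O2}
    x1, x2 = frozenset(extent & O1), frozenset(extent & O2)
    return (x1, intent(x1, k1)), (x2, intent(x2, k2))


c = (frozenset({1, 2, 3}), frozenset({"d"}))
(x1, y1), (x2, y2) = phi(c)
# Lemma 1: the extents are joined by union, the intents by intersection.
lemma_holds = (x1 | x2 == c[0]) and (y1 & y2 == c[1])
```

With these rows, the two local concepts are (12, bd) and (3, acd), as in Example 1.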

FCA provides a framework for frequent itemset mining (FIM), where the intent of a concept corresponds to a closed itemset. The subposition operator can be readily used for mining frequent closed itemsets (FCIs) in a global transaction database D from the local FCIs of two disjoint (horizontal) partitions D1 and D2, provided that we mine all the partitions with an (absolute) support set to 1, i.e., when we consider as frequent any itemset which occurs at least once in D. In fact, Lucchese et al. [ 15 ] show the following property:

Theorem 1 (Lucchese et al. [ 15 ]). Let D be a transaction database, and D1, D2 two disjoint (horizontal) partitions of D. Let C be the set of FCIs of D, and C1 (C2) the set of local FCIs of D1 (D2), respectively. Then, C is computed from C1 and C2 as C = (C1 ∪ C2) ∪ {c1 ∩ c2 | (c1, c2) ∈ C1 × C2}. ✷
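Theorem 1 can be tried out on a toy example. The sketch below is an illustration under stated assumptions: the transactions are hypothetical, the local closed itemsets are computed by brute force (as intersections of non-empty groups of transactions, which is what closedness amounts to with support count 1), and the function names are ours rather than those of [ 15 ].

```python
from itertools import combinations


def closed_itemsets(transactions):
    """Closed itemsets with support count >= 1, by brute force:
    every intersection of a non-empty group of transactions."""
    rows = [frozenset(t) for t in transactions]
    result = set()
    for k in range(1, len(rows) + 1):
        for group in combinations(rows, k):
            result.add(frozenset.intersection(*group))
    return result


def merge(c1, c2):
    """Theorem 1: union of the local closed itemsets plus all pairwise intersections."""
    return c1 | c2 | {x & y for x in c1 for y in c2}


# Hypothetical transaction database, split horizontally into D1 and D2.
D1 = [{"a", "b", "d"}, {"b", "c", "d"}]
D2 = [{"a", "c", "d"}, {"a", "b", "c"}]
global_closed = merge(closed_itemsets(D1), closed_itemsets(D2))
```

For instance, the itemset {d} is closed in neither partition, but it arises as the intersection of the local closed itemsets {b, d} and {a, c, d}.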

Namely, C is obtained by collecting the closed itemsets contained in C1 and C2, and intersecting them to obtain further ones. It is easy to see that this exactly corresponds to Lemma 1 based on the subposition operator. In the following, we will apply the subposition operator to a more expressive framework of MRDM.

3 Iceberg Query Lattices in Multi-Relational DM

3.1 Multi-Relational Data Mining

In the task of frequent pattern mining in multi-relational databases, we assume that we have a given database r, a language of patterns, and a notion of frequency which measures how often a pattern occurs in the database. We use Datalog to represent data and patterns. We assume some familiarity with the notions of logic programming (e.g., [ 14, 16 ]), although we introduce some notions and terminology in the following.

Fig. 2. The multi-relational database r, with relations:
Customer (key): allen, carol, diana, fred
Parent (SR., JR.): (allen, bill), (allen, jim), (carol, bill), (diana, eve), (fred, eve), (fred, hera)
Buys (key, item): (allen, pizza), (carol, pizza), (diana, cake), (fred, cake)
Male (person): bill, jim
Female (person): eve, hera

An atom (or literal) is an expression of the form p(t1, . . . , tn), where p is a predicate (or relation) of arity n, denoted by p/n, and each ti is a term, i.e., a constant or a variable.

A substitution θ = {X1/t1, . . . , Xn/tn} is an assignment of terms to variables. The result of applying a substitution θ to an expression E is the expression Eθ, where all occurrences of the variables Xi have been simultaneously replaced by the corresponding terms ti in θ. The set of variables occurring in E is denoted by Var (E).

A pattern is expressed as a conjunction of atoms (literals) l1 ∧· · ·∧ln, denoted simply by l1, . . . , ln. A pattern is sometimes called a query. Let C be a pattern (i.e., a conjunction) and θ a substitution of Var (C). When Cθ is logically entailed by a database r, we write it by r |= Cθ. Let answerset (C, r) be the set of substitutions satisfying r |= Cθ. We will represent conjunctions in list notation, i.e., [l1, . . . , ln]. For a conjunction C and an atom p, we denote by [C, p] the conjunction that results from adding p after the last element of C.

In multi-relational data mining, one of the predicates is often specified as a key (or target), which determines the entities of interest and what is to be counted.

Example 2. Let r be the multi-relational DB in Fig. 2, which consists of five relations: Customer, Parent, Buys, Male and Female. For each relation, we introduce a corresponding predicate, e.g., customer for relation Customer.

Let P be a pattern of the form: customer (X), parent (X, Y ), buys(X, pizza). P θ is logically entailed by r, if there exists a tuple (a1, a2) such that a1 ∈ Customer, (a1, a2) ∈ Parent, and (a1, pizza) ∈ Buys. Then, answerset (P, r) = {{X/allen, Y /bill }, {X/allen, Y /jim}, {X/carol , Y /bill }}. ✷
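The answer set in Example 2 can be reproduced with a few lines of Python. The sketch below is only an illustration: the encoding of atoms as `(predicate, arguments)` pairs, the convention that variables start with an upper-case letter, and the function names are our assumptions, while the facts are those of Fig. 2.

```python
# Relations of Fig. 2 as sets of tuples (predicate names as in Example 2).
DB = {
    "customer": {("allen",), ("carol",), ("diana",), ("fred",)},
    "parent": {("allen", "bill"), ("allen", "jim"), ("carol", "bill"),
               ("diana", "eve"), ("fred", "eve"), ("fred", "hera")},
    "buys": {("allen", "pizza"), ("carol", "pizza"),
             ("diana", "cake"), ("fred", "cake")},
    "male": {("bill",), ("jim",)},
    "female": {("eve",), ("hera",)},
}


def is_var(term):
    """Convention assumed here: variables start with an upper-case letter."""
    return term[0].isupper()


def answerset(query, db):
    """All substitutions theta (as dicts) such that db entails query*theta."""
    def extend(atoms, theta):
        if not atoms:
            yield dict(theta)
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for fact in db.get(pred, ()):
            new = dict(theta)
            ok = True
            for term, value in zip(args, fact):
                if is_var(term):
                    # Bind the variable, or check an existing binding.
                    if new.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok:
                yield from extend(rest, new)

    # Deduplicate and sort so that the result is deterministic.
    unique = {tuple(sorted(t.items())) for t in extend(list(query), {})}
    return [dict(t) for t in sorted(unique)]


P = [("customer", ("X",)), ("parent", ("X", "Y")), ("buys", ("X", "pizza"))]
answers = answerset(P, DB)
```

Here `answers` contains the three substitutions listed in Example 2.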

As explained in Sect. 1, in a typical task of MRDM, a user is usually expected to specify a special predicate key (or target ) (e.g., [ 3, 4 ]). The key is an atom which determines the entities of interest and what is to be counted. The key (target) is thus to be present in all patterns considered. In Example 2, the key is predicate customer .

A pattern containing a key is not always meaningful to be mined. For example, let C = [customer (X), parent (X, Y ), buys(Z, pizza)] be a conjunction in Example 2. Variable Z in C is not linked to variable X in key atom customer (X); an object represented by Z will have nothing to do with key object X. It will be inappropriate to consider such a conjunction as an intended pattern to mine. In ILP, the following notion of linked literals [ 10 ] is a standard one to specify the so-called language bias.

Definition 4 (Linked Literal). [ 10 ] Let key(X) be a key atom and l a literal. l is said to be linked to key(X), if either X ∈ Var (l) or there exists a literal l1 such that l1 is linked to key(X) and Var (l1) ∩ Var (l) ≠ ∅. ✷

Given a database r and a key atom key(X), we assume that there are predefined finite sets of predicates, variables and constant symbols, and that each literal l in a conjunction C is constructed using these predefined sets. Moreover, each pattern (conjunction) C to be mined satisfies the following conditions: key(X) ∈ C and, for each l ∈ C, l is linked to key(X). In the following, we denote by Q the set of queries (or patterns) satisfying the above bias condition.
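The bias condition can be tested by a simple fixpoint computation over variables. The following sketch is illustrative only; the atom encoding (pairs of a predicate name and an argument tuple, with upper-case initial letters marking variables) and the function names are our assumptions.

```python
def variables(atom):
    """Variables of an atom (pred, args); variables start with an upper-case letter by assumption."""
    _, args = atom
    return {t for t in args if t[0].isupper()}


def linked_atoms(query, key_var="X"):
    """Atoms of the query that are linked to key(key_var) in the sense of Definition 4."""
    reached = {key_var}  # variables known to be connected to the key variable
    linked = []
    changed = True
    while changed:
        changed = False
        for atom in query:
            if atom not in linked and variables(atom) & reached:
                linked.append(atom)
                reached |= variables(atom)
                changed = True
    # Preserve the original order of the literals.
    return [a for a in query if a in linked]


def satisfies_bias(query, key_pred="customer", key_var="X"):
    """Bias condition: the key atom is present and every literal is linked to it."""
    return (key_pred, (key_var,)) in query and linked_atoms(query, key_var) == list(query)


# The conjunction C discussed above: buys(Z, pizza) is not linked to customer(X).
C = [("customer", ("X",)), ("parent", ("X", "Y")), ("buys", ("Z", "pizza"))]
P = [("customer", ("X",)), ("parent", ("X", "Y")), ("buys", ("X", "pizza"))]
```

On these inputs, `satisfies_bias(C)` is false and `satisfies_bias(P)` is true.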

Let r be a database and Q be a query containing a key atom key(X). Then, the support (or frequency) of Q, denoted by supp(Q, r, key), is defined as: supp(Q, r, key) = |{θkey | θ ∈ answerset (Q, r)}| / |answerset (key(X), r)|, where θkey is the restriction of θ = {X/t, . . . } w.r.t. key(X), defined by θkey = {X/t} for some term t. The numerator in the above formula is called the support count (or absolute support). Q is said to be frequent, if supp(Q, r, key) is no less than some user-defined threshold min_sup.
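As a worked instance of this definition, the sketch below computes the support of the pattern P of Example 2 on the database of Fig. 2. The data encoding, the upper-case convention for variables, and the function names are assumptions made for illustration.

```python
from fractions import Fraction

# Relations of Fig. 2 (predicate names as in Example 2).
DB = {
    "customer": {("allen",), ("carol",), ("diana",), ("fred",)},
    "parent": {("allen", "bill"), ("allen", "jim"), ("carol", "bill"),
               ("diana", "eve"), ("fred", "eve"), ("fred", "hera")},
    "buys": {("allen", "pizza"), ("carol", "pizza"),
             ("diana", "cake"), ("fred", "cake")},
    "male": {("bill",), ("jim",)},
    "female": {("eve",), ("hera",)},
}


def answerset(query, db, theta=None):
    """Substitutions (dicts) under which every atom of the query is a fact of db."""
    theta = theta or {}
    if not query:
        return [theta]
    (pred, args), rest = query[0], query[1:]
    results = []
    for fact in db[pred]:
        new = dict(theta)
        # Variables (upper-case initial, by assumption) are bound or checked;
        # constants must match the fact exactly.
        if all(new.setdefault(t, v) == v if t[0].isupper() else t == v
               for t, v in zip(args, fact)):
            results.extend(answerset(rest, db, new))
    return results


def support(query, db, key_pred="customer", key_var="X"):
    """supp(Q, r, key): distinct key bindings among the answers of Q,
    divided by the number of answers of the key atom alone."""
    covered = {theta[key_var] for theta in answerset(query, db)}
    total = {theta[key_var] for theta in answerset([(key_pred, (key_var,))], db)}
    return Fraction(len(covered), len(total))


P = [("customer", ("X",)), ("parent", ("X", "Y")), ("buys", ("X", "pizza"))]
```

Although P has three answer substitutions, only the two customers allen and carol are counted, so its support count is 2 and its support is 2/4.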

3.2 Iceberg Query Lattices with Key

We now consider the notion of a formal context in MRDM, following [ 20 ].

Definition 5. [ 20 ] Let r be a datalog database and Q a set of datalog queries. The formal context associated to r and Q is defined by Kr,Q = (Or,Q, Ar,Q, Ir,Q), where Or,Q = {θ | θ is a grounding substitution for all Q ∈ Q}, Ar,Q = Q, and (θ, Q) ∈ Ir,Q if and only if θ ∈ answerset (Q, r). ✷

Each θ ∈ answerset (Q, r) is often called an occurrence of Q in r. We denote by O(Q; r) the set of the occurrences of Q in r, namely, O(Q; r) = answerset (Q, r).

From this formal context, we can define the concept lattice in the same way as in [ 20 ]. We first introduce an equivalence relation ∼r on the set of queries: two queries Q1 and Q2 are said to be equivalent with respect to database r if and only if answerset (Q1, r) = answerset (Q2, r).

Definition 6 (Closed Query). Let r be a datalog database and ∼r the equivalence relation on a set of datalog queries Q. A query (or pattern) Q is said to be closed (w.r.t. r and Q), iff Q is the most specific query among the equivalence class to which it belongs: {Q1 ∈ Q | Q ∼r Q1}. ✷

For any query Q1, its closure is a closed query Q such that Q is the most specific query among {Q ∈ Q | Q ∼r Q1}. Since it uniquely exists, we denote it by Clo(Q1; r). Note that Var (Q1) = Var (Clo(Q1; r)) by definition. We refer to this as the range-restricted condition here.
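The closure can be made concrete for a small, finite pattern language. The sketch below is an illustration under assumptions of our own: candidate literals are built only from the variables already in the query (so the range-restricted condition holds by construction), the second argument of buys is a constant, and the closure is obtained by adding every candidate literal that is satisfied by all answers of the query. The function names and the candidate language are hypothetical.

```python
from itertools import product

# Relations of Fig. 2 (predicate names as in Example 2).
DB = {
    "customer": {("allen",), ("carol",), ("diana",), ("fred",)},
    "parent": {("allen", "bill"), ("allen", "jim"), ("carol", "bill"),
               ("diana", "eve"), ("fred", "eve"), ("fred", "hera")},
    "buys": {("allen", "pizza"), ("carol", "pizza"),
             ("diana", "cake"), ("fred", "cake")},
    "male": {("bill",), ("jim",)},
    "female": {("eve",), ("hera",)},
}


def answerset(query, db, theta=None):
    """Substitutions (dicts) under which every atom of the query is a fact of db."""
    theta = theta or {}
    if not query:
        return [theta]
    (pred, args), rest = query[0], query[1:]
    results = []
    for fact in db[pred]:
        new = dict(theta)
        if all(new.setdefault(t, v) == v if t[0].isupper() else t == v
               for t, v in zip(args, fact)):
            results.extend(answerset(rest, db, new))
    return results


def candidate_literals(variables, constants=("pizza", "cake")):
    """Hypothetical pattern language: literals over the given variables only."""
    lits = [("parent", (u, v)) for u, v in product(variables, repeat=2)]
    lits += [("buys", (v, c)) for v in variables for c in constants]
    lits += [(p, (v,)) for p in ("male", "female") for v in variables]
    return lits


def holds(literal, theta, db):
    """True if the literal, instantiated by theta, is a fact of db."""
    pred, args = literal
    return tuple(theta.get(t, t) for t in args) in db[pred]


def closure(query, db):
    """Add every candidate literal over Var(Q) that all answers of Q satisfy."""
    answers = answerset(query, db)
    variables = sorted({t for _, args in query for t in args if t[0].isupper()})
    extra = [lit for lit in candidate_literals(variables)
             if lit not in query and all(holds(lit, th, db) for th in answers)]
    return list(query) + extra


Q1 = [("customer", ("X",)), ("parent", ("X", "Y")), ("male", ("Y",))]
```

For `Q1`, every answer binds X to a customer who buys pizza, so `closure(Q1, DB)` appends buys(X, pizza); the query customer(X), parent(X, Y) is already closed in r.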

Stumme [ 20 ] showed that the set of frequent closed queries forms a lattice. In our framework, it is necessary to take our bias condition into consideration. To do that, we employ the well-known notion of the most specific generalization (or least generalization) [ 18, 16 ].

For queries Q1 and Q2, we denote by lg(Q1, Q2) the least generalization of Q1 and Q2. Moreover, the join of Q1 and Q2, denoted by Q1 ∨ Q2, is defined as: Q1 ∨ Q2 = lg(Q1, Q2)|Q, where, for a query Q, Q|Q is the restriction of Q to Q, defined by a conjunction consisting of every literal l in Q which is linked to key(X), i.e., deleting every literal in Q not linked to key(X).

Definition 7. [ 20 ] Let r be a datalog database and Q a set of datalog queries. The iceberg query lattice associated to r and Q for minsupp ∈ [ 0, 1 ] is defined as: Cr,Q = ({Q ∈ Q | Q is closed w.r.t. r and Q, and Q is frequent}, |=), where |= is the usual logical implication. ✷

Theorem 2. Let r be a datalog database and Q a set of datalog queries where all queries contain an atom key and they are linked. Then, Cr,Q is a ∨-semilattice.

Proof. (Sketch) Let Q1, Q2 be frequent closed queries in Q. Then, it is easy to see that their least generalization lg(Q1, Q2) is closed and frequent. However, it might not be linked to key(X). For example, consider that Q1 (Q2) is of the form: Q1 = key(X), p(X, Y ), m(Y ) (Q2 = key(X), q(X, Y ), m(Y )), respectively. Then, lg(Q1, Q2) = key(X), m(Y ), which is not linked to key(X), although it is a closed query. In this case, Q1 ∨ Q2 = lg(Q1, Q2)|Q = key(X), which satisfies the bias condition from the definition. We can show that the resulting Q1 ∨ Q2 is in fact a closed query in the sense of Def. 6. ✷

Example 3. Continued from Example 2. Fig. 3 shows the iceberg query lattice associated to r in Ex. 2 and Q with the support count 1, where each query Q ∈ Q has customer (X) as a key atom, denoted by key(X) for short, Var (Q) ⊆ {X, Y } and the 2nd argument of predicate buys is a constant. ✷

4 Distributed Closed Pattern Mining in MRDB

Our purpose in this work is to mine global concepts in a distributed setting, where a global database is supposed to be horizontally partitioned appropriately, and stored possibly at different sites. Our approach is to first perform the computations of local concepts on each partition of the global DB, and then combine the local concepts by using the subposition operator. We first consider the notion of a horizontal decomposition of a multi-relational DB. Since a multi-relational DB consists of multiple relations, its horizontal decomposition is not immediately clear.

Definition 8. Let r be a multi-relational datalog database with a key predicate key. We call a pair r1, r2 a horizontal decomposition of r, if
1. keyr = keyr1 ∪· keyr2 , i.e., the key relation keyr in r is disjointly decomposed into keyr1 and keyr2 in r1 and r2, respectively, and
2. for any query Q, answerset (Q, r) = answerset (Q, r1) ∪ answerset (Q, r2). ✷

The second condition in the above states that the relations other than keyr are decomposed so that any answer substitution in answerset (Q, r) is computed either in r1 or r2, thereby being preserved in this horizontal decomposition.

Given a horizontal decomposition of a multi-relational DB, we can utilize any preferable concept (or closed pattern) mining algorithm for computing local concepts on each partition, as long as the mining algorithm is applicable to MRDM and its resulting patterns satisfy our bias condition. For example, Stumme [ 20 ] discussed the algorithm called Titanic [ 21 ], which is based on a level-wise approach. We use here an algorithm called ffLCM [ 19 ], which is based on the notion of closure extension due to Pasquier et al. [ 17 ] in FIM, and then elaborated by Uno et al. [ 22 ].

Subposition Operator in MRDM

We now present the counterpart to Lemma 1 in closed pattern mining in MRDB. We first modify the mapping ϕ in Def. 3 suitably for our purpose.

Definition 9. Let r be a datalog database, and r1, r2 a horizontal decomposition of r. Let (O(Q; r), Q) be a concept in r, i.e., Q is a closed query and O(Q; r) = answerset (Q, r). Then,

ϕ˜((O(Q; r), Q)) = ((O(Q; r1), Clo(Q; r1)), (O(Q; r2), Clo(Q; r2))).

To give the counterpart to Lemma 1 in MRDM, we need another definition of join. Let Q1 and Q2 be queries which contain the same set V of variables, i.e., Var (Q1) = Var (Q2) = V. We define Q1 ∨RR Q2 = lg(Q1, Q2)|V,Q, where, for a query Q, Q|V,Q is the restriction of Q to V and Q, defined by a conjunction consisting of every literal l in Q such that Var (l) ⊆ V and l is linked to key, i.e., Q|V,Q is constructed from Q by deleting every literal in Q which contains a variable not in V, then deleting every remaining literal not linked to key.

Theorem 3. Let r be a datalog database, and r1, r2 a horizontal decomposition of r. For any global concept c = (O(Q; r), Q) in r, and its image ϕ˜(c) = ((O(Q1; r1), Q1), (O(Q2; r2), Q2)), it holds that

O(Q; r) = O(Q1; r1) ∪ O(Q2; r2) and Q = Q1 ∨RR Q2.

Remark 1. We omit the proof here, since we can prove the theorem similarly to [ 23 ]. Instead, we give an example which will be helpful to understand why we need an extra provision for considering the least generalization in this case.

Let Q be a query of the form: key(A), p(A, B). Suppose that Q1 (Q2) is a query of the form: Q1 = key(A), p(A, B), q(A, B) (Q2 = key(A), p(A, B), q(B, A)), respectively. Then, lg(Q1, Q2) = key(A), p(A, B), q(C, D), where C and D are newly introduced variables in the least generalization. In this case, since Var (Q) = Var (Q1) = Var (Q2) = {A, B}, Q1 ∨RR Q2 is key(A), p(A, B), which coincides with Q.

Finally, we note that, in the case of transaction databases, the above theorem coincides with Theorem 1 in Sect. 2. ✷

Example 4. Continued from Example 3. We consider a horizontal decomposition r1, r2 of r such that the key relation keyr (i.e., Customer) in r is decomposed into keyr1 = {allen, carol} and keyr2 = {diana, fred}, and the relations other than Customer are decomposed so that they satisfy the second condition of Def. 8.

Let Q be a pattern of the form: [key(X), parent (X, Y )] in Fig. 3. We have that Q1 = Clo(Q; r1) = [Q, buys(X, pizza), male(Y )], and Q2 = Clo(Q; r2) = [Q, buys(X, cake), female(Y )]. Then, it holds that Q = Q1 ∨RR Q2. ✷
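The join ∨RR of Example 4 and of Remark 1 can be reproduced with a short sketch of the least generalization of conjunctions in the style of Plotkin [ 18 ]. This is an illustration, not the implementation used in our experiments: the atom encoding, the upper-case convention for variables, and the fresh variable names `G0`, `G1`, ... (assumed not to occur in the input queries) are our assumptions.

```python
def lgg(q1, q2):
    """Least generalization of two conjunctions, given as lists of (pred, args).

    Each pair of differing terms (s, t) is replaced by one fresh variable,
    which is reused whenever the same pair occurs again.
    """
    fresh = {}

    def generalize(s, t):
        if s == t:
            return s
        return fresh.setdefault((s, t), f"G{len(fresh)}")

    result = []
    for p1, args1 in q1:
        for p2, args2 in q2:
            if p1 == p2 and len(args1) == len(args2):
                lit = (p1, tuple(generalize(s, t) for s, t in zip(args1, args2)))
                if lit not in result:
                    result.append(lit)
    return result


def join_rr(q1, q2, key_var="X"):
    """Q1 v_RR Q2: restrict lgg(Q1, Q2) to the variables V of Q1 (= those of Q2),
    then keep only the literals linked to the key variable."""
    def vars_of(args):
        return {t for t in args if t[0].isupper()}

    shared = set().union(*(vars_of(a) for _, a in q1))
    kept = [(p, a) for p, a in lgg(q1, q2) if vars_of(a) <= shared]
    reached, linked, changed = {key_var}, [], True
    while changed:  # fixpoint computation of the linked literals
        changed = False
        for lit in kept:
            if lit not in linked and vars_of(lit[1]) & reached:
                linked.append(lit)
                reached |= vars_of(lit[1])
                changed = True
    return [lit for lit in kept if lit in linked]


# Example 4: the closures of Q in r1 and r2.
Q1 = [("key", ("X",)), ("parent", ("X", "Y")), ("buys", ("X", "pizza")), ("male", ("Y",))]
Q2 = [("key", ("X",)), ("parent", ("X", "Y")), ("buys", ("X", "cake")), ("female", ("Y",))]
# Remark 1: q(A, B) and q(B, A) generalize to q(G0, G1) with two fresh variables.
R1 = [("key", ("A",)), ("p", ("A", "B")), ("q", ("A", "B"))]
R2 = [("key", ("A",)), ("p", ("A", "B")), ("q", ("B", "A"))]
```

In Example 4, the literal buys(X, G0) produced by the least generalization is dropped because G0 is not in V = {X, Y}, which leaves key(X), parent(X, Y).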

Distributed Mining Using MapReduce Framework

Since the computation of local concepts can be done independently, it is expected that our algorithm is amenable to data-parallelism. We have therefore implemented our algorithm using the MapReduce framework [ 2 ], although any framework supporting data-parallelism will do for our purpose.

In the MapReduce framework, the user expresses the computation in terms of two functions: map and reduce. The map function takes an input key/value pair and produces a set of intermediate key/value pairs. Then, the set of intermediate key/value pairs is passed to the reduce function. The reduce function accepts an intermediate key and a set of values for that key, and it then merges (or aggregates) these values together to form a possibly smaller set of values.

However, our use of the MapReduce framework is very simple. We apply a map operation to each local DB to compute the set of its local concepts. An intermediate key/value pair simply consists of (DB id, Cid), where Cid is the set of local concepts of DB id. We then apply a reduce operation which simply combines the derived results to form an input to the subsequent subposition operator. We thus simply exploit map for computing local concepts independently. We employed Hadoop (see footnote 1), an open source implementation of MapReduce.
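The shape of this computation can be mimicked in plain Python. The sketch below is not Hadoop code and not our Java implementation: it uses the built-in `map` and `functools.reduce` as stand-ins for the framework, works on hypothetical itemset partitions (the transaction case of Theorem 1) instead of datalog queries, and uses function names of our own.

```python
from functools import reduce
from itertools import combinations


def local_concepts(partition):
    """Map step: closed itemsets (support count >= 1) of one partition, by brute force."""
    db_id, transactions = partition
    rows = [frozenset(t) for t in transactions]
    closed = {frozenset.intersection(*group)
              for k in range(1, len(rows) + 1)
              for group in combinations(rows, k)}
    return db_id, closed


def subposition(c1, c2):
    """Merge step: combine two sets of closed itemsets as in Theorem 1."""
    return c1 | c2 | {x & y for x in c1 for y in c2}


# Hypothetical partitions, each identified by a DB id.
partitions = [
    ("db1", [{"a", "b", "d"}, {"b", "c", "d"}]),
    ("db2", [{"a", "c", "d"}]),
    ("db3", [{"a", "b", "c"}]),
]

# Intermediate (DB id, C_id) pairs, computed independently for each partition.
intermediate = list(map(local_concepts, partitions))
# The local results are then folded pairwise with the subposition-style merge.
global_concepts = reduce(subposition, (c for _, c in intermediate))
```

With three partitions the merge is applied twice, i.e., p − 1 times for p partitions.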

We now present some preliminary results of our experiments. We implemented our algorithm in Java 1.6.0_22. Experiments were performed on 6 PCs with Intel Core i5 processors running at 2.8GHz, 8GB of main memory, and 8MB of L2 cache, working under Ubuntu 11.04. We ran Hadoop 0.20.2 on the 6 PCs, with 2 mappers working on each PC.

Fig. 4 summarizes the results of the execution time for test data on mutagenicity prediction (see footnote 2), containing 30 chemical compounds. Each compound is represented by a set of facts using predicates such as atom and bond. The size of the set of predicate symbols is 12. The size of key atom (active(X )) is 230, and minimum support min_sup = 2310 . We assume that patterns contain at most 4 variables and that they contain no constant symbols. The number of concepts mined is 4,831.

Fig. 4 shows that the execution times t1 for mining local concepts are reduced almost linearly with the number of partitions from 1 (i.e., no partitioning) to 8. When the number of partitions is 16, the speed-up did not scale well compared to the other cases. This is a reasonable result: due to the restriction of our current experiment environment, we used 6 PCs, and therefore at most 12 mappers are simultaneously available. On the other hand, the execution times t2 for merging local concepts to obtain global concepts increase almost linearly with the number p of partitions from 1 (i.e., no partitioning) to 16. This is also reasonable; the number of subposition operators applied is (p − 1) when we have p partitions.

1 Hadoop: Open source implementation of MapReduce. http://lucene.apache.org/hadoop/

2 http://www.comlab.ox.ac.uk/activities/machinelearning/mutagenesis.html

Fig. 4. Execution Time (vertical axis: execution time [s]; series: Time for Mining Local DBs: t1, Time for Merging: t2, Total Execution Time: t1 + t2)

6 Concluding Remarks

We have studied the problem of mining frequent closed patterns in multi-relational databases in a distributed environment. To do that, we have first reconsidered the notion of iceberg query lattices, where each datalog query contains an atom key, and the variables in a query are linked to the key. We have then proposed the notion of a horizontal decomposition of a given MRDB, and explained how the subposition operator can be utilized to generate the set of closed queries in the global database from the two sets of local closed queries in the two partitions. We have exemplified the effectiveness of our method by some preliminary experimental results using Hadoop.

As discussed in [ 1 ], efficiency and scalability have been major concerns in MRDM. Krajca et al. [ 11, 12 ] have proposed algorithms which allow us to compute search trees for concepts simultaneously either in parallel or in a distributed manner. Since their approaches are orthogonal to ours, it would be beneficial to employ their algorithms for computing local concepts in our method.

In this work, we have confined ourselves to horizontal partitions of a global context. It will be interesting to study vertical partitioning and their mixture in MRDM, where the apposition operator studied by Valtchev et al. [ 24 ] will play an important role. Our future work includes developing an efficient algorithm for handling such a general case, as well as accumulating more experimental results on different MRDBs to confirm the effectiveness of our subposition operator.

Acknowledgement. The authors would like to thank anonymous reviewers for their useful comments on our paper. The authors are grateful to Seiji Yamazaki for preparing the experiments in this paper.

References

1. Blockeel, H., Sebag, M.: Scalability and efficiency in multi-relational data mining. SIGKDD Explorations Newsletter 2003, Vol. 4, Issue 2, pp. 1-14 (2003)

2. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM, Vol. 51, No. 1, pp. 107-113, 2008.

3. Dehaspe, L.: Frequent pattern discovery in first-order logic, PhD thesis, Dept. Computer Science, Katholieke Universiteit Leuven, 1998.

4. De Raedt, L., Ramon, J.: Condensed representations for Inductive Logic Programming. In: Proc. KR'04, pp. 438-446 (2004)

5. Dzeroski, S.: Multi-Relational Data Mining: An Introduction. SIGKDD Explorations Newsletter 2003, Vol. 5, Issue 1, pp. 1-16 (2003)

6. Dzeroski, S., Lavrač, N. (eds.): Relational Data Mining. Springer-Verlag, Inc., 2001.

7. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, 1999.

8. Ganter, B., Stumme, G., Wille, R.: Formal Concept Analysis, Foundations and Applications. LNCS 3626, Springer, 2005.

9. Garriga, G. C., Khardon, R., De Raedt, L.: On Mining Closed Sets in Multi-Relational Data. IJCAI 2007, pp. 804-809 (2007)

10. Helft, N.: Induction as nonmonotonic inference. In Proc. KR'89, pp. 149-156, 1989.

11. Krajca, P., Vychodil, V.: Distributed Algorithm for Computing Formal Concepts Using Map-Reduce Framework, IDA '09, Springer-Verlag, pp. 333-344, 2009.

12. Krajca, P., Outrata, J., Vychodil, V.: Parallel algorithm for computing fixpoints of Galois connections, AMAI, Vol. 59, No. 2, pp. 257-272, Kluwer Academic Pub., 2010.

13. Kuznetsov, S. O., Obiedkov, S. A.: Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intell., 14(2-3): 189-216, 2002.

14. Lloyd, J. W.: Foundations of Logic Programming, Springer, 1987, Second edition.

15. Lucchese, C., Orlando, S., Perego, R.: Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results. International Workshop on High Performance and Distributed Mining (2005).

16. Nienhuys-Cheng, S-H., de Wolf, R.: Foundations of Inductive Logic Programming, LNAI 1228, Springer, 1997.

17. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering Frequent Closed Itemsets for Association Rules. Proc. ICDT'99, LNAI 3245, pp. 398-416 (1999)

18. Plotkin, G. D.: A Note on Inductive Generalization. Machine Intelligence, Vol. 5, pp. 153-163, 1970.

19. Seki, H., Honda, Y., Nagano, S.: On Enumerating Frequent Closed Patterns with Key in Multi-relational Data. LNAI 6332, pp. 72-86 (2010)

20. Stumme, G.: Iceberg Query Lattices for Datalog. In Conceptual Structures at Work, LNCS 3127, Springer-Verlag, pp. 109-125, 2004.

21. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing Iceberg Concept Lattices with Titanic. J. KDE 42(2), 2002, pp. 189-222.

22. Uno, T., Asai, T., Uchida, Y., Arimura, H.: An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases. Proc. DS'04, LNAI 3245, pp. 16-31 (2004)

23. Valtchev, P., Missaoui, R.: Building Concept (Galois) Lattices from Parts: Generalizing the Incremental Methods. In Proc. of the 9th Int'l. Conf. on Conceptual Structures: Broadening the Base (ICCS '01), Springer-Verlag, London, UK, pp. 290-303.

24. Valtchev, P., Missaoui, R., Lebrun, P.: A Partition-based Approach towards Constructing Galois (Concept) Lattices. Discrete Mathematics 256(3): 801-829 (2002)