<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Distributed Closed Pattern Mining in Multi-Relational Data based on Iceberg Query Lattices: Some Preliminary Results</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hirohisa Seki⋆</string-name>
          <email>seki@nitech.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sho-ich Tanimoto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, Nagoya Inst. of Technology</institution>
          ,
          <addr-line>Showa-ku, Nagoya 466-8555</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>115</fpage>
      <lpage>126</lpage>
      <abstract>
        <p>We study the problem of mining frequent closed patterns in multi-relational databases in a distributed environment. In multirelational data mining (MRDM), relational patterns involve multiple relations from a relational database, and they are typically represented in datalog language (a class of first order logic). Our approach is based on the notion of iceberg query lattices, a formulation of MRDM in terms of formal concept analysis (FCA), and we apply it to a distributed mining setting. We assume that a database considered contains a special predicate called key , which determines the entities of interest and what is to be counted, and that each datalog query contains an atom key, where variables in a query are linked to a given target object corresponding to the key. We show that the iceberg query lattice in this case can be defined similarly in the literature. Next, given two local databases (horizontal partitions) and their sets of closed patterns (concepts), we show that the subposition operator, which constructs a global Galois (concept) lattice from the direct product of two lattices studied in the literature, can be utilized to generate the set of closed patterns in the global database. The correctness of our algorithm is shown, and some preliminary experimental results using a MapReduce framework are also given.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Multi-relational data mining (MRDM) has been extensively studied for more
than a decade (e.g., [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] and references therein). The research topics discussed
in the conventional data mining have been considered in this more expressive
framework of MRDM, where data and patterns (or queries) are represented in
the form of logical formulae such as Datalog (a class of first order logic). In
contrast to the traditional data mining dealing with rather simple patterns such
as itemsets, the expressive formalism of MRDM allows us to use more complex
and structured data in a uniform way, including trees and graphs in particular,
and multi-relational patterns in general.
c 2012 by the paper authors. CLA 2012, pp. 115–126. Copying permitted only for
private and academic purposes. Volume published and copyrighted by its editors.
Local Proceedings in ISBN 978–84–695–5252–0,
      </p>
    </sec>
    <sec id="sec-2">
      <title>Universidad de M´alaga (Dept. Matem´atica Aplicada), Spain. 116 Hirohisa Seki and Sho-ich Tanimoto</title>
      <p>
        On the other hand, Formal Concept Analysis (FCA) has been developed as a
field of applied mathematics based on a clear mathematization of the notions of
concept and conceptual hierarchy [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It has attracted much interest from various
application areas including, among others, data mining, knowledge acquisition
and software engineering (e.g., [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]).
      </p>
      <p>
        Stumme [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] has proposed the notion of iceberg query lattices, which
combines the notions of the above two fields, i.e., MRDM and FCA; Iceberg query
lattices combine the notion of frequent datalog queries in MRDM with iceberg
concept lattices (or frequent closed itemsets) in FCA. Then, it has been shown
that we can apply the “full arsenal” of FCA-based methods to frequent queries,
thereby allowing us to mine and visualize relational association rules. Condensed
representations such as closed patterns and free patterns in MRDM have been
also studied in c-armr [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and in RelLCM2 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        In this paper, we study the problem of mining closed queries in
multirelational data based on these precursors. We apply the notion of iceberg query
lattices to a distributed mining setting. The assumption that a given dataset
is distributed and stored in different sites is reasonable, because we will not be
able to move local datasets into a centralized site due to too much data size
and/or privacy concerns. We also assume that a database considered contains
a special predicate called key (e.g., [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]), and that each datalog query is
supposed to contain an atom key, where variables in a query are linked to a given
target object corresponding to the key. Using an key atom, we can define the
notion of the frequency of a datalog query, since the key atom determines the
entities of interest and what is to be counted. We show that the iceberg query
lattice in this case can be defined similarly in the literature. Next, given two local
databases (horizontal partitions) and their sets of closed queries (concepts), we
show that the we can construct the set of closed queries in the global database,
by using subposition operator [
        <xref ref-type="bibr" rid="ref23 ref7">7, 23</xref>
        ], which constructs a global Galois (concept)
lattice from the direct product of two lattices. We also present some preliminary
experimental results using a distributed framework of MapReduce [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>The organization of the rest of this paper is as follows. After summarizing
some basic notations and definitions in FCA in Sect. 2, we reconsider the notion
of iceberg query lattices with key in Sect. 3. We then explain our approach to
distributed closed query mining in MRDB in Sect. 4. In Section 5, we show the
effectiveness of our method by some preliminary experimental results. Finally,
we give a summary of this work in Section 6.
2</p>
      <sec id="sec-2-1">
        <title>Preliminaries: Formal Concept Analysis</title>
        <p>
          We assume that the reader is familiar with the basic notions of Formal Concept
Analysis (FCA), which are found in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. However, we recall some of the important
definitions and notations.
        </p>
        <p>Definition 1. A (formal) context K = (O, A, I) consists of a set O of objects,
a set A of attributes, and a binary relation I ⊆ O × A.</p>
        <p>The mapping f : P(O) → P(A) is given by f (X) = {a ∈ A | ∀o ∈ X :
(o, a) ∈ I}. The mapping g : P(A) → P(O) is given by g(Y ) = {o ∈ O | ∀a ∈
Y : (o, a) ∈ I}.</p>
        <p>If it is clear from the context whether f or g is meant, then we abbreviate
both f (·) and g(·) just by ′. In particular, Y ′′ stands for f (g(Y )).</p>
        <p>A (formal) concept is a pair (X, Y ) with X ⊆ O, Y ⊆ A, X′ = Y, and Y ′ =
X. X is called extent , and Y is called intent of the concept. The set CK of all
concepts of K together with the partial order (X1, Y1) ≤ (X2, Y2) ↔ X1 ⊆ X2
(which is equivalent to Y1 ⊇ Y2) is called the concept lattice of K. ✷</p>
        <p>
          In FCA [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], a set of context-oriented operators has been studied, including
apposition/subposition operators, and they are extensively studied by Valtchev
and Missaoui [
          <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
          ]. The following definitions and lemma are due to [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
Definition 2. Let K1 = (O1, A, I1) and K2 = (O2, A, I2) be two contexts with
the same set of attributes A. Then the context K = (O1∪· O2, A, I1 ∪·I2) is called
the subposition of K1 and K2, denoted by K = KK21 .
        </p>
        <p>Usually, the extent of K is set to the disjoint union (denoted by ∪· ) of the
involved context extents, and this constraint is suitable for our current study.</p>
        <p>Let Ki (i = 1, 2) be a context, and Li the corresponding lattice. The direct
product of a pair of lattices L1 and L2, denoted by L× = L1 × L2, is itself a
lattice L× = hCK× , ≤×i, where CK× = CK1 × CK2 , and (c1, c2) ≤× (c1, c2) ⇔
c1 ≤L1 c1 and c2 ≤L2 c2.</p>
        <p>
          Any concept of L can be projected upon the concept lattice, L1 (L2) by
restricting its extent to the set of “visible” objects, e.g., those in O1 (O2),
respectively. The resulting mapping constitutes an order homomorphism between
L and the direct product [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>Definition 3. The function ϕ : CK → CK× maps a concept from the global
lattice into a pair of concepts of the partial lattices by splitting its extent over
the partial context object sets O1 and O2:</p>
        <p>ϕ((X, Y )) = ((X ∩ O1, (X ∩ O1)′), (X ∩ O2, (X ∩ O2)′)).</p>
        <p>
          From the above definitions, we have the following property [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]:
Lemma 1. [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] For any global concept c = (X, Y ) and its image ϕ(c) =
((X1, Y1), (X2, Y2)), it holds that X = X1 ∪ X2 and Y = Y1 ∩ Y2. ✷
Example 1. Consider a context K in Fig. 1 (upper left). The concept lattice CK
derived from K is shown in the right. Let K1, K2 (lower right of the figure) be
a horizontal decomposition of K, where O1 = {1, 2} and O2 = {3, 4}. Then, K
is the subposition of K1 and K2, i.e., K = KK21 . The concept lattices CK1 (CK2 )
derived from K1 (K2) are shown in the right, respectively.
        </p>
        <p>Consider a global concept c = (123, d) in CK. Then, ϕ(c) = ((12, bd), (3, acd)),
and we have from Lemma 1 that {123} = {12} ∪ {3} and {d} = {bd} ∩ {acd}. ✷</p>
        <p>
          FCA provides a framework for frequent itemset mining (FIM), where the
intent of a concept corresponds to a closed itemset. The subposition operator will
be readily used for mining frequent closed itemsets (FCIs) in a global transaction
database D from the local FCIs from two disjoint (horizontal) partitions D1 and
D2, provided that we mine all the partitions with an (absolute) support being
set to 1, i.e. when we consider as frequent any itemset which occur at least once
in D. In fact, Lucchese et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] show the following property:
Theorem 1 (Lucchese et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]). Let D be transaction database, and D1,
D2 two disjoint (horizontal) partitions of D. Let C be the set of FCIs of D, and
C1 (C2) the set of local FCIs of D1 (D2), respectively. Then, C is computed from
C1 and C2 as C = (C1 ∪ C2) ∪ {C1 ∩ C2 | (C1, C2) ∈ (C1 × C2)}. ✷
        </p>
        <p>Namely, C is obtained by collecting the closed itemsets contained in C1 and
C2, and intersecting them to obtain further ones. It is easy to see that this exactly
corresponds to Lemma 1 based on the subposition operator. In the following, we
will apply the subposition operator to a more expressive framework of MRDM.
3
3.1</p>
        <p>Iceberg Query Lattices in Multi-Relational DM</p>
        <sec id="sec-2-1-1">
          <title>Multi-Relational Data Mining</title>
          <p>In the task of frequent pattern mining in multi-relational databases, we assume
that we have a given database r, a language of patterns, and a notion of frequency
which measures how often a pattern occurs in the database. We use Datalog</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Customer</title>
      <p>key
allen
carol
diana
fred</p>
    </sec>
    <sec id="sec-4">
      <title>Parent</title>
      <p>SR.
allen
allen
carol
diana
fred
fred
JR.
bill
jim
bill
eve
eve
hera
Buys
key
allen
carol
diana
fred
item
pizza
pizza
cake
cake</p>
    </sec>
    <sec id="sec-5">
      <title>Male</title>
      <p>person
bill
jim</p>
    </sec>
    <sec id="sec-6">
      <title>Female</title>
      <p>
        person
eve
hera
to represent data and patterns. We assume some familiarity with the notions
of logic programming (e.g., [
        <xref ref-type="bibr" rid="ref14 ref16">14, 16</xref>
        ]), although we introduce some notions and
terminology in the following.
      </p>
      <p>An atom (or literal ) is an expression of the form p(t1, . . . .tn), where p is a
predicate (or relation) of arity n, denoted by p/n, and each ti is a term, i.e., a
constant or a variable.</p>
      <p>A substitution θ = {X1/t1, . . . , Xn/tn} is an assignment of terms to variables.
The result of applying a substitution θ to an expression E is the expression Eθ,
where all occurrences of variables Vi have been simultaneously replaced by the
corresponding terms ti in θ. The set of variables occurring in E is denoted by
Var (E).</p>
      <p>A pattern is expressed as a conjunction of atoms (literals) l1 ∧· · ·∧ln, denoted
simply by l1, . . . , ln. A pattern is sometimes called a query. Let C be a pattern
(i.e., a conjunction) and θ a substitution of Var (C). When Cθ is logically entailed
by a database r, we write it by r |= Cθ. Let answerset (C, r) be the set of
substitutions satisfying r |= Cθ. We will represent conjunctions in list notation,
i.e., [l1, . . . , ln]. For a conjunction C and an atom p, we denote by [C, p] the
conjunction that results from adding p after the last element of C.</p>
      <p>In multi-relational data mining, one of predicates is often specified as a key
(or target ), which determines the entities of interest and what is to be counted.
Example 2. Let r be a multi-relational DB in Fig. 2, which consists of five
relations, including Customer, Parent, Buys and so on. For each relation, we
introduce a corresponding predicate, e.g., customer for relation Customer.</p>
      <p>Let P be a pattern of the form: customer (X), parent (X, Y ), buys(X, pizza).
P θ is logically entailed by r, if there exists a tuple (a1, a2) such that a1 ∈
Customer, (a1, a2) ∈ Parent, and (a1, pizza) ∈ Buys. Then, answerset (P, r) =
{{X/allen, Y /bill }, {X/allen, Y /jim}, {X/carol , Y /bill }}. ✷</p>
      <p>
        As explained in Sect. 1, in a typical task of MRDM, a user is usually expected
to specify a special predicate key (or target ) (e.g., [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]). The key is an atom
which determines the entities of interest and what is to be counted. The key
(target) is thus to be present in all patterns considered. In Example 2, the key
is predicate customer .
      </p>
      <p>
        A pattern containing a key is not always meaningful to be mined. For
example, let C = [customer (X), parent (X, Y ), buys(Z, pizza)] be a conjunction in
Example 2. Variable Z in C is not linked to variable X in key atom customer (X);
an object represented by Z will have nothing to do with key object X. It will be
inappropriate to consider such a conjunction as an intended pattern to mine. In
ILP, the following notion of linked literals [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is a standard one to specify the
so-called language bias.
      </p>
      <p>
        Definition 4 (Linked Literal). [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] Let key(X) be a key atom and l a literal.
l is said to be linked to key(X), if either X ∈ Var (l) or there exists a literal l1
such that l is linked to key(X) and Var (l1) ∩ Var (l) 6= ∅. ✷
      </p>
      <p>Given a database r and a key atom key(X), we assume that there are
predefined finite sets of predicate (resp. variables; resp. constant symbols), and that,
for each literal l in a conjunction C, it is constructed using the predefined sets.
Moreover, each pattern C of conjunctions to be mined satisfies the following
conditions: key(X) ∈ C and, for each l ∈ C, l is linked to key(X). In the
following, we denote by Q the set of queries (or patterns) satisfying the above bias
condition.</p>
      <p>Let r be a database and Q be a query containing a key atom key(X). Then,
the support (or frequency) of C, denoted by supp(Q, r, key), is defined as:
supp(Q, r, key) =
|{θkey | θ ∈ answerset (Q, r)}|
|answerset (key(X), r)|
where θkey is the restriction of θ = {X/t, . . . } w. r. t. key(X), defined by θkey =
{X/t} for some term t. The numerator in the above formula is called the support
count (or absolute support ). Q is said to be frequent , if supp(Q, r, key) is no less
than some user defined threshold min sup.
3.2</p>
      <sec id="sec-6-1">
        <title>Iceberg Query Lattices with Key</title>
        <p>
          We now consider the notion of a formal context in MRDM, following [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
Definition 5. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] Let r be a datalog database and Q a set of datalog queries.
The formal context associated to r and Q is defined by Kr, Q = (Or, Q, Ar, Q, Ir, Q),
where Or, Q = {θ | θ is a grounding substitution for all Q ∈ Q}, and Ar, Q = Q,
and (θ, Q) ∈ Ir, Q if and only if θ ∈ answerset (Q, r). ✷
        </p>
        <p>Each θ ∈ answerset (Q, r) is often called an occurrence of Q in r. We denote by
O(Q; r) the set of the occurrences of Q in r, namely, O(Q; r) = answerset (Q, r).</p>
        <p>
          From this formal context, we can define the concept lattice the same way as
in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. We first introduce an equivalence relation ∼r on the set of queries: Two
queries Q1 and Q2 are said to be equivalent with respect to database r if and
only if answerset (Q1, r) = answerset (Q2, r).
Definition 6 (Closed Query). Let r be a datalog database and ∼r the
equivalence relation on a set of datalog queries Q. A query (or pattern) Q is said to be
closed (w.r. t. r and Q), iff Q is the most specific query among the equivalence
class to which it belongs: {Q1 ∈ Q | Q ∼r Q1}. ✷
        </p>
        <p>For any query Q1, its closure is a closed query Q such that Q is the most
specific query among {Q ∈ Q | Q ∼r Q1}. Since it uniquely exists, we denote it
by Clo(Q1; r). Note that Var (Q1) = Var (Clo(Q1; r)) by definition. We refer to
this as the range-restricted condition here.</p>
        <p>
          Stumme [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] showed that the set of frequent closed queries forms a lattice.
In our framework, it is necessary to take our bias condition into consideration.
To do that, we employ the well-known notion of the most specific generalization
(or least generalization) [
          <xref ref-type="bibr" rid="ref16 ref18">18, 16</xref>
          ].
        </p>
        <p>For queries Q1 and Q2, we denote by lg(Q1, Q2) the least generalization of
Q1 and Q2. Moreover, the join of Q1 and Q2, denoted by Q1 ∨ Q2, is defined
as: Q1 ∨ Q2 = lg(Q1, Q2)|Q, where, for a query Q, Q|Q is the restriction of Q to
Q, defined by a conjunction consisting of every literal l in Q which is linked to
key(X), i.e., deleting every literal in Q not linked to key(X).</p>
        <p>
          Definition 7. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] Let r be a datalog database and Q a set of datalog queries.
The iceberg query lattice associated to r and Q for minsupp ∈ [
          <xref ref-type="bibr" rid="ref1">0, 1</xref>
          ] is defined as:
Cr, Q = ({Q ∈ Q | Q is closed w.r.t. r and Q, and Q is frequent}, |=), where |=
is the usual logical implication. ✷
Theorem 2. Let r be a datalog database and Q a set of datalog queries where
all queries contain an atom key and they are linked. Then, Cr, Q is a
∨-semilattice.
        </p>
        <p>Proof. (Sketch) Let Q1, Q2 be frequent closed queries in Q. Then, it is easy to
see that their least generalization lg(Q1, Q2) is closed and frequent. However,
it might not be linked to key(X). For example, consider that Q1 (Q2) is of the
form: Q1 = key(X), p(X, Y ), m(Y ) (Q2 = key(X), q(X, Y ), m(Y )), respectively.
Then, lg(Q1, Q2) = key(X), m(Y ), which is not linked to key(X), although it is
a closed query. In this case, Q1 ∨ Q2 = lg(Q1, Q2)|Q = key(X), which satisfies
the bias condition from the definition. We can show that the resulting Q1 ∨ Q2
is in fact a closed query in the sense of Def. 6. ✷
Example 3. Continued from Example 2. Fig. 3 shows the iceberg query lattice
associated to r in Ex. 2 and Q with the support count 1, where each query Q ∈ Q
has customer (X) as a key atom, denoted by key(X) for short, Var (Q) ⊆ {X, Y }
and the 2nd argument of predicate buys is a constant. ✷
4</p>
        <p>Distributed Closed Pattern Mining in MRDB
Our purpose in this work is to mine global concepts in a distributed setting,
where a global database is supposed to be horizontally partitioned
appropriately, and stored possibly in different sites. Our approach is to first perform the
computations of local concepts on each partition of the global DB, and then
combine the local concepts by using the subposition operator.
We first consider the notion of a horizontal decomposition of a multi-relational
DB. Since a multi-relational DB consists of multiple relations, its horizontal
decomposition is not immediately clear.</p>
        <p>Definition 8. Let r be a multi-relational datalog database with a key predicate
key. We call a pair r1, r2 a horizontal decomposition of r, if
1. keyr = keyr1 ∪· keyr2 , i.e., the key relation keyr in r is disjointly decomposed
into keyr1 and keyr2 in r1 and r2, respectively, and
2. for any query Q, answerset (Q, r) = answerset (Q, r1) ∪ answerset (Q, r2). ✷
The second condition in the above states that the relations other than keyr
are decomposed so that any answer substitution in answerset (Q, r) is computed
either in r1 or r2, thereby being preserved in this horizontal decomposition.</p>
        <p>
          Given a horizontal decomposition of a multi-relational DB, we can utilize
any preferable concept (or closed pattern) mining algorithm for computing
local concepts on each partition, as long as the mining algorithm is applicable
to MRDM and its resulting patterns satisfy our bias condition. For example,
Stumme [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] discussed the algorithm called Titanic [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], which is based on a
level-wise approach. We use here an algorithm called ffLCM [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], which is based
on the notion of closure extension due to Pasquier et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] in FIM, and then
elaborated by Uno et al. [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>Subposition Operator in MRDM</title>
        <p>We now present the counterpart to Lemma 1 in closed pattern mining in MRDB.
We first modify the mapping ϕ in Def. 3 suitably for our purpose.
Definition 9. Let r be a datalog database, and r1, r2 a horizontal
decomposition of r. Let (O(Q; r), Q) be a concept in r, i.e., Q is a closed query and
O(Q; r) = answerset (Q, r). Then,</p>
        <p>ϕ˜((O(Q; r), Q)) = ((O(Q; r1), Clo(Q; r1)), (O(Q; r2), Clo(Q, r2))).</p>
        <p>To give the counterpart to Lemma 1 in MRDM, we need another definition
of join. Let Q1 and Q2 be queries which contain the same set V of variables, i.e.,
Var (Q1) = Var (Q2) = V. We define Q1 ∨RR Q2 = lg(Q1, Q2)|V,Q, where, for
a query Q, Q|V,Q is the restriction of Q to V and Q, defined by a conjunction
consisting of every literal l in Q such that Var (l) ⊆ V and l is linked to key,
i.e., Q|V,Q is constructed from Q by deleting every literal in Q which contains a
variable not in V, then deleting every remaining literal not linked to key.
Theorem 3. Let r be a datalog database, and r1, r2 a horizontal
decomposition of r. For any global concept c = (O(Q; r), Q) in r, and its image ϕ˜(c) =
((O(Q1; r1), Q1), (O(Q2; r2), Q2)), it holds that</p>
        <p>O(Q; r) = O(Q1; r1) ∪ O(Q2; r2) and Q = Q1 ∨RR Q2.</p>
        <p>
          Remark 1. We omit the proof here, since we can prove the theorem similarly to
[
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Instead, we give an example which will be helpful to understand why we
need an extra provision for considering the least generalization in this case.
        </p>
        <p>Let Q be a query of the form: key(A), p(A, B). Suppose that Q1 (Q2) is a
query of the form: Q1 = key(A), p(A, B), q(A, B) (Q2 = key(A), p(A, B), q(B, A)),
respectively. Then, lg(Q1, Q2) = key(A), p(A, B), q(C, D), where C and D are
newly introduced variables in the least generalization. In this case, since Var (Q) =
Var (Q1) = Var (Q2) = {A, B}, Q1 ∨RR Q2 is key(A), p(A, B), which coincides
with Q.</p>
        <p>Finally, we note that, in the case of transaction databases, the above theorem
coincides with Theorem 1 in Sect. 2. ✷
Example 4. Continued from Example 3. We consider a horizontal decomposition
r1, r2 of r such that the key relation keyr (i.e., Customer) in r is decomposed
into keyr1 = {allen, carol} and keyr2 = {dian, fred}, and the other relations than
Customer are decomposed so that they satisfy the second condition of Def. 8.</p>
        <p>Let Q be a pattern of the form: [key(X), parent (X, Y )] in Fig. 3. We have
that Q1 = Clo(Q; r1) = [Q, buys(X, pizza), male(Y )], and Q2 = Clo(Q; r2) =
[Q, buys(X, cake), female(Y )]. Then, it holds that Q = Q1 ∨RR Q2. ✷</p>
        <p>
          Distributed Mining Using MapReduce Framework
Since the computation of local concepts can be done independently, it is expected
that our algorithm is amenable to data-parallelism. We have therefore
implemented our algorithm using MapReduce framework [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], although any framework
supporting data-parallelism will do for our purpose.
        </p>
        <p>In MapReduce framework, the user expresses the computation in terms of
two functions: map and reduce. The map function takes an input key/value pair
and produces a set of intermediate key/value pairs. Then, the set of intermediate
key/value pairs are passed to the reduce function. The reduce function accepts
an intermediate key and a set of values for that key, and it then merges (or
aggregate) these values together to form a possibly smaller set of values.</p>
        <p>However, our use of MapReduce framework is very simple; We use map
operation to each local DB to compute a set of its local concepts. An intermediate
key/value pair simply consists of (DB id , Cid), where Cid is the set of local
concepts of DB id . We then apply a reduce operation which simply combines the
derived results to form an input to the subsequent subposition operator. We thus
simply exploited map for computing local concepts independently. We employed
Hadoop1,an open source implementation of MapReduce.</p>
        <p>We now present some preliminary results of our experiments. We
implemented our algorithm by using Java 1.6.0 22. Experiments were performed on 6
PCs with Intel Core i5 processors running at 2.8GHz, 8GB of main memory, and
8MB of L2 cache, working under Ubuntu 11.04. We used Hadoop 0.20.2 using 6
PCs, and 2 mappers working on each PC.</p>
        <p>Fig. 4 summarizes the results of the execution time for a test data on the
mutagenicity prediction,2 containing 30 chemical compounds. Each compound
is represented by a set of facts using predicates such as atom, bond , for example.
The size of the set of predicate symbols is 12. The size of key atom (active(X ))
is 230, and minimum support min sup = 2310 . We assume that patterns contain
at most 4 variables and they contain no constant symbols. The number of the
concepts mined is 4, 831.</p>
        <p>Fig. 4 shows that the execution times t1 for mining local concepts are reduced
almost linearly with the number of partitions from 1 (i.e., no partitioning) to 8.
When the number of partitions is 16, the speed-up did not scale well compared to
the other cases. This is a reasonable result; Due to the restriction of our current
experiment environment, we used 6 PCs. Therefore, at most 12 mappers are
simultaneously available. On the other hand, the execution times t2 for merging
local concepts to obtain global concepts increase almost linearly with the number
p of partitions from 1 (i.e., no partitioning) to 16. This is also reasonable; the
number of subposition operators applied is (p − 1) when we have p partitions.
1 Hadoop: Open source implementation of MapReduce.</p>
        <p>lucene.apache.org/hadoop/.
2 http://www.comlab.ox.ac.uk/activities/machinelearning/mutagenesis.html
http://
12,000
10,000
]s 8,000
[
e
m
i
T
n6,000
o
it
u
c
e
xE4,000
2,000
0</p>
        <p>Time for Mining Local DBs: t1
Time for Merging: t2</p>
        <p>Total Execution Time: t1 + t2</p>
        <p>Fig. 4. Execution Time
6</p>
        <sec id="sec-6-2-1">
          <title>Concluding Remarks</title>
          <p>We have studied the problem of mining frequent closed patterns in multi-relational
databases in a distributed environment. To do that, we have first reconsidered
the notion of iceberg query lattices, where each datalog query contains an atom
key, and the variables in a query are linked to the key. We have then proposed
the notion of a horizontal decomposition of a given MRDB, and explained how
the subposition operator can be utilized to generate the set of closed queries in
the global database from the two sets of local closed queries in the two
partitions. We have exemplified the effectiveness of our method by some preliminary
experimental results using Hadoop.</p>
          <p>
            As discussed in [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], efficiency and scalability have been major concerns in
MRDM. Krajca et al. [
            <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
            ] have proposed algorithms which allow us to
compute search trees for concepts simultaneously either in parallel or in a distributed
manner. Since their approaches are orthogonal to ours, it would be beneficial to
employ their algorithms for computing local concepts in our method.
          </p>
          <p>
            In this work, we have confined ourselves to horizontal partitions of a global
context. It will be interesting to study vertical partitioning and their mixture in
MRDM, where the apposition operator studied by Valtchev et al. [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ] will play
an important role. Our future work includes developing an efficient algorithm for
handling such a general case, as well as accumulating more experimental results
on different MRDBs to confirm the effectiveness of our subposition operator.
Acknowledgement The authors would like to thank anonymous reviewers for
their useful comments on our paper. The authors are grateful to Seiji Yamazaki
for preparing the experiments in this paper.
          </p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Blockeel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebag</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Scalability and efficiency in multi-relational data mining</article-title>
          .
          <source>SIGKDD Explorations Newsletter</source>
          <year>2003</year>
          , Vol.
          <volume>4</volume>
          , Issue 2, pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghemawat</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>MapReduce: simplified data processing on large clusters</article-title>
          .
          <source>Commun. ACM</source>
          , Vol.
          <volume>51</volume>
          , No.
          <issue>1</issue>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>113</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dehaspe</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Frequent pattern discovery in first-order logic</article-title>
          ,
          <source>PhD thesis</source>
          , Dept. Computer Science, Katholieke Universiteit Leuven,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>De Raedt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramon</surname>
          </string-name>
          , J.:
          <article-title>Condensed representations for Inductive Logic Programming</article-title>
          .
          <source>In: Proc. KR'04</source>
          , pp.
          <fpage>438</fpage>
          -
          <lpage>446</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Dzeroski, S.:
          <string-name>
            <surname>Multi-Relational Data</surname>
          </string-name>
          <article-title>Mining: An Introduction</article-title>
          .
          <source>SIGKDD Explorations Newsletter</source>
          <year>2003</year>
          , Vol.
          <volume>5</volume>
          , Issue 1, pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Dzeroski, S., Lavraˇc, N. (eds.):
          <source>Relational Data Mining</source>
          . Springer-Verlag,
          <year>Inc</year>
          .
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          . Springer,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <source>Formal Concept Analysis, Foundations and Applications. LNCS 3626</source>
          , Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Garriga</surname>
            ,
            <given-names>G. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khardon</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Raedt</surname>
          </string-name>
          , L.:
          <article-title>On Mining Closed Sets in MultiRelational Data</article-title>
          .
          <source>IJCAI</source>
          <year>2007</year>
          , pp.
          <fpage>804</fpage>
          -
          <lpage>809</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Helft</surname>
          </string-name>
          , N.:
          <article-title>Induction as nonmonotonic inference</article-title>
          .
          <source>In Proc. KR'89</source>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>156</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Krajca</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vychodil</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Distributed Algorithm for Computing Formal Concepts Using Map-Reduce Framework</article-title>
          ,
          <source>IDA '09</source>
          , Springer-Verlag, pp.
          <fpage>333</fpage>
          -
          <lpage>344</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Krajca</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Outrata</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vychodil</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Parallel algorithm for computing fixpoints of Galois connections</article-title>
          ,
          <source>AMAI</source>
          , Vol.
          <volume>59</volume>
          , No.
          <issue>2</issue>
          , pp.
          <fpage>257</fpage>
          -
          <lpage>272</lpage>
          , Kluwer Academic Pub.,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S. A.:</given-names>
          </string-name>
          <article-title>Comparing performance of algorithms for generating concept lattices</article-title>
          .
          <source>J. Exp. Theor. Artif. Intell.</source>
          ,
          <volume>14</volume>
          (
          <issue>2-3</issue>
          ):
          <fpage>189</fpage>
          .216,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>J. W.</given-names>
          </string-name>
          :
          <source>Foundations of Logic Programming</source>
          , Springer,
          <year>1987</year>
          , Second edition.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lucchese</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orlando</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rergo</surname>
          </string-name>
          , R.:
          <article-title>Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results</article-title>
          .
          <source>International Workshop on High Performance and Distributed Mining</source>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Nienhuys-Cheng, S-H., de Wolf, R.:
          <source>Foundations of Inductive Logic Programming, LNAI 1228</source>
          , Springer,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Discovering Frequent Closed Itemsets for Association Rules</article-title>
          .
          <source>Proc. ICDT'99, LNAI 3245</source>
          , pp.
          <fpage>398</fpage>
          -
          <lpage>416</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Plotkin</surname>
          </string-name>
          , G.D.:
          <source>A Note on Inductive Generalization. Machine Intelligence</source>
          , Vol.
          <volume>5</volume>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>163</lpage>
          ,
          <year>1970</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Seki</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Honda</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nagano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>On Enumerating Frequent Closed Patterns with Key in Muti-relational Data</article-title>
          .
          <source>LNAI 6332</source>
          , pp.
          <fpage>72</fpage>
          -
          <lpage>86</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Stumme</surname>
          </string-name>
          , G.:
          <article-title>Iceberg Query Lattices for Datalog</article-title>
          .
          <source>In Conceptual Structures at Work, LNCS 3127</source>
          , Springer-Verlag, pp.
          <fpage>109</fpage>
          -
          <lpage>125</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Computing Iceberg Concept Lattices with Titanic</article-title>
          .
          <source>J. KDE</source>
          <volume>42</volume>
          (
          <issue>2</issue>
          ),
          <year>2002</year>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Uno</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asai</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Uchida</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arimura</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases</article-title>
          .
          <source>Proc. DS'04, LNAI 3245</source>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>31</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Missaoui</surname>
          </string-name>
          , R.:
          <article-title>Building Concept (Galois) Lattices from Parts: Generalizing the Incremental Methods</article-title>
          .
          <source>In Proc. of the 9th Int'l. Conf. on Conceptual Structures: Broadening the Base (ICCS '01)</source>
          , Springer-Verlag, London, UK,
          <fpage>290</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Missaoui</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pierre</surname>
            <given-names>Lebrun</given-names>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>A Partition-based Approach towards Constructing Galois (Concept) Lattices</article-title>
          .
          <source>Discrete Mathematics</source>
          <volume>256</volume>
          (
          <issue>3</issue>
          ):
          <fpage>801</fpage>
          -
          <lpage>829</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>