<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fast Mining of Iceberg Lattices: A Approach Using Generators Modular</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Laszlo Szathmary</string-name>
          <email>Szathmary.L@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petko Valtchev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amedeo Napoli</string-name>
          <email>napoli@loria.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Godin</string-name>
          <email>godin.robertg@uqam.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alix Boc</string-name>
          <email>boc.alixg@uqam.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladimir Makarenkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. d'Informatique UQAM</institution>
          ,
          <addr-line>C.P. 8888</addr-line>
          ,
          <institution>Succ. Centre-Ville</institution>
          ,
          <addr-line>Montreal H3C 3P8</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LORIA UMR 7503</institution>
          ,
          <addr-line>B.P. 239, 54506 Vand uvre-les-Nancy Cedex</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Beside its central place in FCA, the task of constructing the concept lattice, i.e., concepts plus Hasse diagram, has attracted some interest within the data mining (DM) eld, primarily to support the mining of association rule bases. Yet most FCA algorithms do not pass the scalability test fundamental in DM. We are interested in the iceberg part of the lattice, alias the frequent closed itemsets (FCIs) plus precedence, augmented with the respective generators (FGs) as these provide the starting point for nearly all known bases. Here, we investigate a modular approach that follows a work ow of individual tasks that diverges from what is currently practiced. A straightforward instantiation thereof, Snow-Touch, is presented that combines past contributions of ours, Touch for FCIs/FGs and Snow for precedence. A performance comparison of Snow-Touch to its closest competitor, Charm-L, indicates that in the speci c case of dense data, the modularity overhead is o set by the speed gain of the new task order. To demonstrate our method's usefulness, we report rst results of a genome data analysis application.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Association discovery [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] in data mining (DM) is aimed at pinpointing the most
frequent patterns of items, or itemsets, and the strongest associations between
items dug in a large transaction database. The main challenge here is the
potentially huge size of the output. A typical way out is to focus on a basis, i.e. a
reduced yet lossless representation of the target family (see a list in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). Many
popular bases are either formulated in terms of FCA or involve structures that
do. For instance, the minimal non-redundant association rules [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] require the
computation of the frequent closed itemsets (FCI) and their respective frequent
generators (FGs), while the informative basis involves the inclusion-induced
precedence links between FCIs.
      </p>
      <p>
        We investigate the computation of iceberg lattices, i.e., FCIs plus
precedence, together with the FGs. In the DM literature, several methods exist that
target FCIs by rst discovering the associated FGs (e.g. the levelwise FCI miners
A-Close [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and Titanic [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). More recently, a number of extensions of the
popular FCI miner Charm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have been published that output two or all three of
the above components. The basic one, Charm-L [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], produces FCIs with
precedence links (and could be regarded as a lattice construction procedure). Further
extensions to Charm-L produce the FGs as well (see [
        <xref ref-type="bibr" rid="ref8 ref9">8,9</xref>
        ]).
      </p>
      <p>
        In the FCA eld, the frequency aspect of concepts has been mostly ignored
whereas generators have rarely been explicitly targeted. Historically, the rst
method whose output combines closures, generators and precedence has been
presented in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] yet this fact is covered by a di erent terminology and a
somewhat incomplete result (see explanations below). The earliest method to
explicitly target all three components is to be found in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] while an improvement was
published in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Yet all these FCA-centered methods have a common drawback:
They scale poorly on large datasets due to repeated scans of the entire database
(either for closure computations or as an incremental restructuring technique).
In contrast, Charm-L exploits a vertical encoding of the database that helps
mitigate the cost of the impact of the large object (a.k.a. transaction) set.
      </p>
      <p>Despite a diverging modus operandi, both FCA and data mining methods
follow the same overall algorithmic schema: they rst compute the set of
concepts/FCIs and the precedence links between them and then use these as input
in generator/FG calculation.</p>
      <p>However e cient Charm-L is, its design is far from optimal: For instance,
FCI precedence is computed at FCI discovery, bene ting from no particular
insight. Thus, many FCIs from distant parts of the search space are compared.
We therefore felt that there is space for improvement, e.g., by bringing in
techniques operating locally. An appealing track seemed to lay in the exploration of
an important duality from hypergraph theory to inverse the computation
dependencies between individual tasks (and thus de ne a new overall work ow). To
clarify this point, we chose to investigate a less intertwined algorithmic schema,
i.e. by a modular design so that each individual task could be targeted by the
best among a pool of alternative methods.</p>
      <p>
        Here, we describe a rst step in our study, Snow-Touch, which has been
assembled from existing methods by wiring them w.r.t. our new schema. Indeed,
our method relies on Charm for mining FCIs and on the vertical FG miner
Talky-G , which are put together into a combined miner, Touch [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], by means
of an FGs-to-FCIs matching mechanism. The Snow method [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] extracts the
precedence links from FCIs and FGs.
      </p>
      <p>The pleasant surprise with Snow-Touch was that, when a Java
implementation thereof was experimentally compared to Charm-L (authors' version in
C++) on a wide range of data, our method prevailed on all dense datasets. This
was not readily anticipated as the modular design brought a computational
overhead, e.g. the extra matching step. Moreover, Snow-Touch proved to work well
with real-world data, as the rst steps of a large-scale analysis of genomic data
indicate.</p>
      <p>In summary, we contribute here a novel computation schema for iceberg
lattices with generators (hence a new lattice construction approach).
Moreover, we derive an e cient FCI/FG/precedence miner (especially on dense sets).
We also demonstrate the practical usefulness of Snow-Touch as well as of the
global approach for association mining based on generic rules.</p>
      <p>The remainder of the paper is as follows: Background on pattern mining,
hypergraphs, and vertical pattern mining is provided in Section 2. In Section 3
we present the di erent modules of the Snow-Touch algorithm. Experimental
evaluations are provided in Section 4 and conclusions are drawn in Section 5.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>In the following, we summarize knowledge about relevant structures from
frequent pattern mining and hypergraph theory (with parallels drawn with similar
notions from FCA) as well as about e cient methods for mining them.
2.1</p>
      <sec id="sec-2-1">
        <title>Basic facts from pattern mining and concept analysis</title>
        <p>In pattern mining, the input is a database (comparable to an FCA formal
context). Records in the database are called transactions (alias objects), denoted
here O = fo1; o2; : : : ; omg. A transaction is basically subsets of a given total
set of items (alias attributes ), denoted here A = fa1; a2; : : : ; ang. Except for its
itemset, a transaction is explicitly identi ed through a unique identi er, a tid (a
set of identi ers is thus called a tidset ). Throughout this paper, we shall use the
following database as a running example (the \dataset D"): D = f(1; ACDE),
(2; ABCDE), (3; AB), (4; D), (5; B)g.</p>
        <p>The standard 0 derivation operators from FCA are denoted di erently in this
context. Thus, given an itemset X, the tidset of all transactions comprising X in
their itemsets is the image of X, denoted t(X) (e.g. t(AB) = 23). We recall that
an itemset of length k is called a k-itemset. Moreover, the (absolute) support of
an itemset X, supp : }(A) ! N, is supp(X) = jt(X)j. An itemset X is called
frequent, if its support is not less than a user-provided minimum support (denoted
by min supp). Recall as well that, in [X], the equivalence class of X induced
by t(), the extremal elements w.r.t. set-theoretic inclusion are, respectively, the
unique maximum X00 (a.k.a. closed itemset or the concept intent ), and its set
of minimums, a.k.a. the generator itemsets. In data mining, an alternative de
nition is traditionally used stating that an itemset X is closed (a generator ) if
it has no proper superset (subset) with the same support. For instance, in our
dataset D, B and C are generators, whose respective closures are B and ACDE.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], a subsumption relation is de ned as well: X subsumes Z, i X Z and
supp(X) = supp(Z). Obviously, if Z subsumes X, then Z cannot be a generator.
In other words, if X is a generator, then all its subsets Y are generators as well3.
Formally speaking, the generator family forms a downset within the Boolean
lattice of all itemsets h}(A); i.
        </p>
        <sec id="sec-2-1-1">
          <title>3 Please notice that the dual property holds for non generators.</title>
          <p>
            The FCI and FG families losslessly represent the family of all frequent
itemsets (FIs) [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. They jointly compose various non-redundant bases of valid
association rules, e.g. the generic basis [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]. Further bases require the inclusion order
between FCIs or its transitive reduction , i.e. the precedence relation.
          </p>
          <p>
            In Fig. 1 (adapted from [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]), views (a) and (b) depict the concept lattice
of dataset D and its iceberg part, respectively. Here we investigate the e cient
computation of the three components of an association rule basis, or what could
be spelled as the generator-decorated iceberg (see Fig. 1 (c)).
2.2
          </p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>E ective mining methods for FCIs, FGs, and precedence links</title>
        <p>
          Historically, the rst algorithm computing all closures with their generators and
precedence links can be found in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] (although under a di erent name in a
somewhat incomplete manner). Yet the individual tasks have been addressed
separately or in various combinations by a large variety of methods.
        </p>
        <p>
          First, the construction of all concepts is a classical FCA task and a large
number of algorithms exist for it using a wide range of computing strategies.
Yet they scale poorly as FCI miners due to their reliance on object-wise
computations (e.g. the incremental acquisition of objects as in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]). These involve
to a large number of what is called data scans in data mining that are known
to seriously deteriorate the performances. In fact, the overwhelming majority of
FCA algorithms would su er on the same drawback as they have been designed
under the assumption that the number of objects and the number of attributes
remain in the same order of magnitude. Yet in data mining, there is usually a
much larger number of transactions than there are items.
        </p>
        <p>
          As to generators, they have attracted signi cantly less attraction in FCA as
a standalone structure. Precedence links, in turn, are sometimes computed by
concept mining FCA algorithms beside the concept set. Here again, objects are
heavily involved in the computation hence the poor scaling capacity of these
methods. The only notable exception to this rule is the method described in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]
which was designed to deliberately avoid referring to objects by relying
exclusively on concept intents.
        </p>
        <p>
          When all three structures are considered together, after [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], e cient methods
for the combined task have been proposed, among others, in [
          <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
          ].
        </p>
        <p>
          In data mining, mining FCIs is also a popular task [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Many FCI miners
exist and a good proportion thereof would output FGs as a byproduct. For
instance, levelwise miners such as Titanic [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and A-Close [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], use FGs as entry
points into the equivalence classes of their respective FCIs. In this eld, the
FGs, under the name of free-sets [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], have been targeted by dedicated miners.
Precedence links do not seem to play a major role in pattern mining since few
miners would consider them. In fact, to the best of our knowledge, the only
mainstream FCI miner that would also output the Hasse diagram of the iceberg
lattice is Charm-L [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. In order to avoid multiple data scans, Charm-L relies
on a speci c encoding of the transaction database, called vertical, that takes
advantage of the aforementioned asymmetry between the number of transactions
and the number of items. Moreover, two ulterior extensions thereof [
          <xref ref-type="bibr" rid="ref7 ref9">7,9</xref>
          ] would
also cover the FGs for each FCI, making them the primary competitors for our
own approach.
        </p>
        <p>
          Despite the clear discrepancies in their modus operandi, both FCA-centered
algorithms and FCI/FG miners share their overall algorithmic schema. Indeed,
they rst compute the set of concepts/FCIs and the precedence links between
them and then use these as input in generator/FG calculation. The latter task
can be either performed along a levelwise traversal of the equivalence class of a
given closure, as in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], or shaped as the computation of the minimal
transversals of a dedicated hypergraph4, as in [
          <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
          ] and [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>While such a schema could appear more intuitive from an FCA point of view
( rst comes the lattice, then the generators which are seen as an \extra"), it is
less natural and eventually less pro table for data mining. Indeed, while a good
number of association rule bases would require the precedence links in order to
be constructed, FGs are used in a much larger set of such bases and may even
constitute a target structure of their own (see above). Hence, a more versatile
mining method would only output the precedence relation (and compute it) as
an option, which is not possible with the current design schema. More precisely,
the less rigid order between the steps of the combined task would be: (1) FCIs,
(2) FGs, and (3) precedence. This basically means that precedence needs to be
computed at the end, independently from FG and FCI computations (but may
rely on these structures as input). Moreover, the separation of the three steps
insures a higher degree of modularity in the design of the concrete methods
following our schema: Any combination of methods that solve an individual task
could be used, leaving the user with a vast choice. On the reverse side of the coin,
4 Termed alternatively as (minimal) blockers or hitting sets.
total modularity comes with a price: if FGs and FCIs are computed separately,
an extra step will be necessary to match an FCI to its FGs.</p>
        <p>
          We describe hereafter a method tting the above schema which relies
exclusively on existing algorithmic techniques. These are combined into a single global
procedure, called Snow-Touch in the following manner: The FCI computation
is delegated to the Charm algorithm which is also the basis for Charm-L. FGs
are extracted by our own vertical miner Talky-G . The two methods together
with an FG-to-FCI matching technique form the Touch algorithm [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Finally,
precedence is retrieved from FCIs with FGs by the Snow algorithm [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] using a
ground duality result from hypergraph theory.
        </p>
        <p>In the remainder of this section we summarize the theoretical and the
algorithmic background of the above methods which are themselves brie y presented
and illustrated in the next section.
2.3</p>
      </sec>
      <sec id="sec-2-3">
        <title>Hypergraphs, transversals, and precedence in closure semi-lattices</title>
        <p>
          The generator computation in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] exploits the tight interdependence between
the intent of a concept, its generators and the intents of its immediate predecessor
concepts. Technically speaking, a generator is a minimal blocker for the family
of faces associated to the concept intent and its predecessor intents5.
        </p>
        <p>Example. Consider the closed itemset (CI) lattice in Figure 1 (c). The CI
ABCDE has two faces: F1 = ABCDE n AB = CDE and F2 = ABCDE n
ACDE = B.</p>
        <p>
          It turns out that blocker is a di erent term for the widely known hypergraph
transversal notion. We recall that a hypergraph [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] is a generalization of a graph
where edges can connect arbitrary number of vertices. Formally, it is a pair (V ,E )
made of a basic vocabulary V = fv1; v2; : : : ; vng, the vertices, and a family of
sets E , the hyper-edges, all drawn from V .
        </p>
        <p>
          A set T V is called a transversal of H if it has a non-empty intersection
with all the edges of H. A special case are the minimal transversals that are
exploited in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>Example. In the above example, the minimal transversals of fCDE; Bg are
fBC; BD; BEg, hence these are the generators of ABCDE (see Figure 1 (c)).</p>
        <p>
          The family of all minimal transversals of H constitutes the transversal
hypergraph of H (T r(H)). A duality exists between a simple hypergraph and its
transversal hypergraph [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]: For a simple hypergraph H, T r(T r(H)) = H. Thus,
the faces of a concept intent are exactly the minimal transversals of the
hypergraph composed by its generators.
        </p>
        <p>Example. The bottom node in Figure 1 (c) labelled ABCDE has three
generators: BC, BD, and BE while the transversals of the corresponding
hypergraph are fCDE; Bg.</p>
        <sec id="sec-2-3-1">
          <title>5 A face is the set-theoretic di erence between the intents of two concepts bound by</title>
          <p>
            a precedence link.
Miners from the literature, whether for plain FIs or FCIs, can be roughly split
into breadth- rst and depth- rst ones. Breadth- rst algorithms, more speci
cally the Apriori -like [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] ones, apply levelwise traversal of the pattern space
exploiting the anti-monotony of the frequent status. Depth- rst algorithms, e.g.,
Closet [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ], in contrast, organize the search space into a pre x-tree (see Figure 2)
thus factoring out the e ort to process common pre xes of itemsets. Among
them, the vertical miners use an encoding of the dataset as a set of pairs (item,
tidset), i.e., f(i; t(i))ji 2 Ag, which helps avoid the costly database re-scans.
          </p>
          <p>
            Eclat [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ] is a plain FI miner relying on a vertical encoding at a depth- rst
traversal of a tree structure, called IT-tree, whose nodes are X t(X) pairs. Eclat
traverses the IT-tree in a pre-order way, from left-to-right [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ] (see Figure 2).
Charm adapts the computing schema of Eclat to the rapid construction of the
FCIs [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. It is knowingly one of the fastest FCI-miners, hence its adoption as
a component in Touch as well as the idea to look for similar technique for
FGs. However, a vertical FG miner would be closer to Eclat than to Charm
as it requires no speci c speed-up during the traversal (recall that FGs form a
downset). In contrast, there is a necessary shift in the test focus w.r.t. Eclat :
Instead of supersets, subsets need to be examined to check candidate FGs. This,
in turn, requires that all such subsets are already tested at the moment an
itemset is examined. In other terms, the IT-tree traversal order needs to be a
linear extension of order between itemsets.
3
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>The Snow-Touch Algorithm</title>
      <p>We sketch below the key components of Snow-Touch i.e. Talky-G , Touch, and
Snow.
3.1</p>
      <sec id="sec-3-1">
        <title>Talky-G</title>
        <p>
          Talky-G is a vertical FG miner that constructs an IT-tree in a depth- rst
rightto-left manner [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Traversal Of The Generator Search Space</title>
        <p>Traversing }(A) so that a given set X is processed after all its subsets induces
a -complying traversal order, i.e. a linear extension of .</p>
        <p>
          In FCA, a similar technique is used by the Next-Closure algorithm [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. The
underlying lectic order is rooted in an implicit mapping of }(A) to [0 : : : 2jAj 1],
where a set image is the decimal value of its characteristic vector w.r.t. an
arbitrary ordering rank : A $ [1::jAj]. The sets are then listed in the increasing
order of their mapping values which represents a depth- rst traversal of }(A).
This encoding yields a depth- rst right-to-left traversal (called reverse pre-order
traversal in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]) of the IT-tree representing }(A).
        </p>
        <p>Example. See Figure 2 for a comparison between the traversal strategies in
Eclat (left) and in Talky-G (right). Order-induced ranks of itemsets are drawn
next to their IT-tree nodes.</p>
      </sec>
      <sec id="sec-3-3">
        <title>The Algorithm</title>
        <p>The algorithm works the following way. The IT-tree is initialized by creating the
root node and hanging a node for each frequent item below the root (with its
respective tidset). Next, nodes below the root are examined, starting from the
right-most one. A 1-itemset p in such a node is an FG i supp(p) &lt; 100% in
which case it is saved to a dedicated list. A recursive exploration of the subtree
below the current node then ensues. At the end, all FGs are comprised in the
IT-tree.</p>
        <p>During the recursive exploration, all FGs from the subtree rooted in a node
are mined. First, FGs are generated by \joining" the subtree's root to each of
its sibling nodes laying to the right. A node is created for each of them and
hung below the subtree's root. The resulting node's itemset is the union of its
parents' itemsets while its tidset is the intersection of the tidsets of its parents.
Then, all the direct children of the subtree's root are processed recursively in a
right-to-left order.</p>
        <p>When two FGs are joined to form a candidate node, two cases can occur.
Either we obtain a new FG, or a valid FG cannot be generated from the two
FGs. A candidate FG is the union of the input node itemsets while its tidset
is the intersection of the respective tidsets. It can fail the FG test either by
insu cient support (non frequent) or by a strict FG-subset of the same support
(which means that the candidate is a proper superset of an already found FG
with the same support).</p>
        <p>Example. Figure 3 illustrates Talky-G on an input made of the dataset D and
a min supp = 1 (20%). The node ranks in the traversal-induced order are again
indicated. The IT-tree construction starts with the root node and its children
nodes: Since no universal item exists in D, all items are FGs and get a node
below the root. In the recursive extension step, the node E is examined rst:
Absent right siblings, it is skipped. Node D is next: the candidate itemset DE
fails the FG test since of the same support as E. With C, both candidates CD
and CE are discarded for the same reason. In a similar fashion, the only FGs in
the subtree below the node of B are BC, BD, and BE. In the case of A, these
are AB and AD. ABD fails the FG test because of BD.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Fast Subsumption Checking</title>
        <p>
          During the generation of candidate FGs, a subsumer itemset cannot be a
generator. To speed up the subsumption computation, Talky-G adapts the hash
structure of Charm for storing frequent generators together with their support
values. Thus, as tidsets are used for hashing of FGs, two equivalent itemsets get
the same hash value. Hence, when tracking a potential subsumee for a candidate
X, we check within the corresponding list in the hash table for FGs Y having
(i) the same support as X and, if positive outcome, (ii) proper subsets of X (see
details in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]).
        </p>
        <p>Example. The hash structure of the IT-tree in Figure 3 is drawn in Figure 4
(top right). The hash table has four entries that are lists of itemsets. The hash
function over a tidset is the modulo 4 of the sum of all tids. For instance, to
check whether ABD subsumes a known FG, we take its hash key, 2 mod 4 = 2,
and check the content of the list at index 2. In the list order, B is discarded
for support mismatch, while BE fails the subset test. In contrast, BD succeeds
both the support and the inclusion tests so it invalidates the candidate ABD.
3.2</p>
      </sec>
      <sec id="sec-3-5">
        <title>The Touch Algorithm</title>
        <p>The Touch algorithm has three main features, namely (1) extracting frequent
closed itemsets, (2) extracting frequent generators, and (3) associating frequent
generators to their closures, i.e. identifying frequent equivalence classes.</p>
        <p>Finally, our method matches FGs to their respective FCIs. To that end,
it exploits the shared storage technique in both Talky-G and Charm, i.e. the
hashing on their images (see Figure 4 (top)). The calculation is immediate: as
the hash value of a FG is the same as for its FCI, one only needs to traverse</p>
        <sec id="sec-3-5-1">
          <title>FCI (supp) FGs</title>
          <p>AB (2) AB
ABCDE (1) BE; BD; BC
A (3) A</p>
        </sec>
        <sec id="sec-3-5-2">
          <title>FCI (supp) FGs B (3) B</title>
          <p>ACDE (2) E; C; AD
D (3) D
Fig. 4. Top: hash tables for dataset D with min supp = 1. Top left: hash table of
Charm containing all FCIs. Top right: hash table of Talky-G containing all FGs.
Bottom: output of Touch on dataset D with min supp = 1
the FG hash and for each itemset lookup the list of FCI associated to its own
hash value. Moreover, setting both lists to the same size, further simpli es the
procedure as both lists will then be located at the same o set within their
respective hash tables.</p>
          <p>Example. Figure 4 (top) depicts the hash structures of Charm and Talky-G .
Assume we want to determine the generators of ACDE which is stored at
position 3 in the hash structure of Charm. Its generators are also stored at position
3 in the hash structure of Talky-G . The list comprises three members that are
subsets of ACDE with the same support: E, C, and AD. Hence, these are the
generators of ACDE. The output of Touch is shown in Figure 4 (bottom).
3.3</p>
        </sec>
      </sec>
      <sec id="sec-3-6">
        <title>The Snow Algorithm</title>
        <p>
          Snow computes precedence links on FCIs from associated FGs [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Snow
exploits the duality between hypergraphs made of the generators of an FCI and
of its faces, respectively to compute the latter as the transversals of the former.
Thus, its input is made of FCIs and their associated FGs. Several algorithms
can be used to produce this input, e.g. Titanic [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], A-Close [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], Zart [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], Touch,
etc. Figure 4 (bottom) depicts a sample input of Snow.
        </p>
        <p>On such data, Snow rst computes the faces of a CI as the minimal
transversals of its generator hypergraph. Next, each di erence of the CI X with a face
yields a predecessor of X in the closed itemset lattice.</p>
        <p>
          Example. Consider again ABCDE with its generator family fBC; BD; BEg.
First, we compute its transversal hypergraph: T r(fBC; BD; BEg) = fCDE; Bg.
The two faces F1 = CDE and F2 = B indicate that there are two
predecessors for ABCDE, say Z1 and Z2, where Z1 = ABCDE n CDE = AB, and
Z2 = ABCDE n B = ACDE. Application of this procedure for all CIs yields
the entire precedence relation for the CI lattice.
In this section we discuss practical aspects of our method. First, in order to
demonstrate that our approach is computationally e cient, we compare its
performances on a wide range of datasets to those of Charm-L. Then, we present
an application of Snow-Touch to the analysis of genomic data together with an
excerpt of the most remarkable gene associations that our method helped to
uncover.
We evaluated Snow-Touch against Charm-L [
          <xref ref-type="bibr" rid="ref8 ref9">8,9</xref>
          ]. The experiments were carried
out on a bi-processor Intel Quad Core Xeon 2.33 GHz machine running Ubuntu
GNU/Linux with 4 GB RAM. All times reported are real, wall clock times, as
obtained from the Unix time command between input and output. Snow-Touch was
implemented entirely in Java. For performance comparisons, the authors'
original C++ source of Charm-L was used. Charm-L and Snow-Touch were executed
with these options: ./charm-l -i input -s min_supp -x -L -o COUT -M 0 -n;
./leco.sh input min_supp -order -alg:dtouch -method:snow -nof2. In each case,
the standard output was redirected to a le. The di set optimization
technique [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] was activated in both algorithms.6
        </p>
        <p>Benchmark datasets. For the experiments, we used several real and
synthetic dataset benchmarks (see Table 1). The synthetic dataset T25, using the
IBM Almaden generator, is constructed according to the properties of market
basket data. The Mushrooms database describes mushrooms characteristics.
The Chess and Connect datasets are derived from their respective game
steps. The latter three datasets can be found in the UC Irvine Machine
Learning Database Repository. Typically, real datasets are very dense, while synthetic
data are usually sparse. Response times of the two algorithms on these datasets
are presented in Figure 5.</p>
        <p>
          Charm-L. Charm-L represents a state-of-the-art algorithm for closed
itemset lattice construction [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Charm-L extends Charm to directly compute the
lattice while it generates the CIs. In the experiments, we executed Charm-L with a
switch to compute (minimal) generators too using the minhitset method. In [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ],
Zaki and Ramakrishnan present an e cient method for calculating the
generators, which is actually the generator-computing method of Pfaltz and Taylor [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
6 Charm-L uses di sets by default, thus no explicit parameter was required.
180
160
140
.) 120
c
e
(s 100
e
itm 80
lt
a
to 60
40
20
        </p>
        <p>T25I10D10K</p>
        <p>Snow-Touch
Charm-L[minhitset]</p>
        <p>This way, the two algorithms (Snow-Touch and Charm-L) are comparable since
they produce exactly the same output.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Performance on sparse datasets. On T25, Charm-L performs better than</title>
        <p>Snow-Touch. We have to admit that sparse datasets are a bit problematic for
our algorithm. The reason is that T25 produces long sparse bitvectors, which
gives some overhead to Snow-Touch. In our implementation, we use bitvectors
to store tidsets. However, as can be seen in the next paragraph, our algorithm
outperforms Charm-L on all the dense datasets that were used during our tests.</p>
        <p>Performance on dense datasets. On Mushrooms, Chess and
Connect, we can observe that Charm-L performs well only for high values of
support. Below a certain threshold, Snow-Touch gives lower response times, and
the gap widens as the support is lowered. When the minimum support is set
low enough, Snow-Touch can be several times faster than Charm-L.
Considering that Snow-Touch is implemented in Java, we believe that a good C++
implementation could be several orders of magnitude faster than Charm-L.</p>
        <p>According to our experiments, Snow-Touch can construct the concept lattices
faster than Charm-L in the case of dense datasets. From this, we draw the
hypothesis that our direction towards the construction of FG-decorated concept
lattices is more bene cial than the direction of Charm-L. That is, it is better to
extract rst the FCI/FG-pairs and then determine the order relation between
them than rst extracting the set of FCIs, constructing the order between them,
and then determining the corresponding FGs for each FCI.
4.2</p>
      </sec>
      <sec id="sec-3-8">
        <title>Analysis of Antibiotic Resistant Genes</title>
        <p>We looked at the practical performance of Snow-Touch on real-world genomic
dataset whereby the goal was to discover meaningful associations between genes
in entire genomes seen as items and transactions, respectively.</p>
        <p>The genomic dataset was collected from the website of the National
Center for Biotechnology Information (NCBI) with a focus on genes from microbial
genomes. At the time of writing (June 2011), 1; 518 complete microbial genomes
were available on the NCBI website.7 For each genome, its list of genes was
collected (for instance the genome with ID CP002059.1 has two genes, rnpB and
ssrA). Only 1; 250 genomes out of the 1; 518 proved non empty; we put them
in a binary matrix of 1; 250 rows 125; 139 columns. With an average of 684
genes per genome we got 0:55% density (i.e., large yet sparse dataset with an
imbalance between numbers of rows and of columns).</p>
        <p>
          The initial result of the mining task was the family of minimal
nonredundant association rules (MN R), which are directly available from the
output of Snow-Touch. We sorted them according to the con dence. Among all
strong associations, the bioinformaticians involved in this study found most
appealing the rules describing the behavior of antibiotic resistant genes, in
particular, the mecA gene. mecA is frequently found in bacterial cells. It induces a
resistance to antibiotics such as Methicillin, Penicillin, Erythromycin, etc. [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ].
The most commonly known carrier of the gene mecA is the bacterium known as
MRSA (methicillin-resistant Staphylococcus aureus ).
        </p>
        <p>
          At a second step, we were narrowing the focus on a group of three genes,
mecA plus ampC and vanA [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. ampC is a beta-lactam-resistance gene. AmpC
beta-lactamases are typically encoded on the chromosome of many gram-negative
bacteria; it may also occur on Escherichia coli. AmpC type beta-lactamases
may also be carried on plasmids [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. Finally, the gene vanA is a
vancomycinresistance gene typically encoded on the chromosome of gram-positive bacteria
such as Enterococcus. The idea was to relate the presence of these three genes
to the presence or absence of any other gene or a combination thereof.
        </p>
        <p>Table 2 shows an extract of the most interesting rules found by our algorithm.
These rules were selected from a set of 18,786 rules.</p>
        <p>For instance, rule (1) in Table 2 says that the gene mecA is present in
85:71% of cases when the set of genes fclpX; dnaA; dnaI; dnaK; gyrB; hrcA; pyrFg</p>
        <sec id="sec-3-8-1">
          <title>7 http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi</title>
          <p>is present in a genome. The above rules have a direct practical use. In one such
scenario, they could be used to suggest which antibiotic should be taken by a
patient depending on the presence or absence of certain genes in the infecting
microbe.
5</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We presented a new design schema for the task of mining the iceberg lattice
and the corresponding generators out of a large context. The target structure
directly involved in the construction of a number of association rule bases and
hence is of a certain importance in the data mining eld. While previously
published algorithms follow the same schema, i.e., construction of the iceberg lattice
(FCIs plus precedence links) followed by the extraction of the FGs, our approach
consists in inferring precedence links from the previously mined FCIs with their
FGs.</p>
      <p>We presented an initial and straightforward instanciation of the new
algorithmic schema that reuses existing methods for the three steps: the popular Charm
FCI miner, our own method for FG extraction, Talky-G (plus an FGs-to-FCIs
matching procedure), and the Hasse diagram constructor Snow. The resulting
iceberg plus FGs miner, Snow-Touch, is far from an optimal algorithm, in
particular due to redundancies in the rst two steps. Yet an implementation thereof
within the Coron platform (in Java) has managed to outperform its natural
competitor, Charm-L (in C++) on a wide range of datasets, especially on dense
ones.</p>
      <p>To level the playing ground, we are currently re-implementing Snow-Touch
in C++ and expect the new version to be even more e cient. In a di erent vein,
we have tested the capacity of our approach to support practical mining task by
applying it to the analysis of genomic data. While a large number of associations
usually come out of such datasets, many of the redundant with respect to each
other, by limiting the output to only the generic ones, our method helped focus
the analysts' attention to a smaller number of signi cant rules.</p>
      <p>As a next step, we are studying a more integrated approach for FCI/FG
construction that requires no extra matching step. This should result in substantial
e ciency gains. On the methodological side, our study underlines the duality
between generators and order w.r.t. FCIs: either can be used in combination
with FCIs to yield the other one. It rises the natural question of whether FCIs
alone, which are output by a range of frequent pattern miners, could be used to
e ciently retrieve rst precedence, and then FGs.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srikant</surname>
          </string-name>
          , R.:
          <article-title>Fast Algorithms for Mining Association Rules in Large Databases</article-title>
          .
          <source>In: Proc. of the 20th Intl. Conf. on Very Large Data Bases (VLDB '94)</source>
          , San Francisco, CA, Morgan Kaufmann (
          <year>1994</year>
          )
          <volume>487</volume>
          {
          <fpage>499</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kryszkiewicz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Concise Representations of Association Rules</article-title>
          .
          <source>In: Proc. of the ESF Exploratory Workshop on Pattern Detection and Discovery</source>
          . (
          <year>2002</year>
          )
          <volume>92</volume>
          {
          <fpage>109</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Mining Minimal Non-Redundant Association Rules Using Frequent Closed Itemsets</article-title>
          .
          <source>In: Proc. of the Computational Logic (CL '00)</source>
          .
          <article-title>Volume 1861 of LNAI</article-title>
          ., Springer (
          <year>2000</year>
          )
          <volume>972</volume>
          {
          <fpage>986</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Discovering Frequent Closed Itemsets for Association Rules</article-title>
          .
          <source>In: Proc. of the 7th Intl. Conf. on Database Theory (ICDT '99)</source>
          , Jerusalem, Israel (
          <year>1999</year>
          )
          <volume>398</volume>
          {
          <fpage>416</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Computing Iceberg Concept Lattices with Titanic. Data and Knowl</article-title>
          . Eng.
          <volume>42</volume>
          (
          <issue>2</issue>
          ) (
          <year>2002</year>
          )
          <volume>189</volume>
          {
          <fpage>222</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsiao</surname>
            ,
            <given-names>C.J.: CHARM</given-names>
          </string-name>
          :
          <article-title>An E cient Algorithm for Closed Itemset Mining</article-title>
          .
          <source>In: SIAM Intl. Conf. on Data Mining (SDM' 02)</source>
          .
          <source>(Apr</source>
          <year>2002</year>
          )
          <volume>33</volume>
          {
          <fpage>43</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.J.: Mining</given-names>
          </string-name>
          <string-name>
            <surname>Non-Redundant Association Rules</surname>
          </string-name>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          ) (
          <year>2004</year>
          )
          <volume>223</volume>
          {
          <fpage>248</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsiao</surname>
            ,
            <given-names>C.J.:</given-names>
          </string-name>
          <article-title>E cient Algorithms for Mining Closed Itemsets and Their Lattice Structure</article-title>
          .
          <source>IEEE Trans. on Knowl. and Data Eng</source>
          .
          <volume>17</volume>
          (
          <issue>4</issue>
          ) (
          <year>2005</year>
          )
          <volume>462</volume>
          {
          <fpage>478</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramakrishnan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Reasoning about Sets using Redescription Mining</article-title>
          .
          <source>In: Proc. of the 11th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (KDD '05)</source>
          , Chicago, IL, USA (
          <year>2005</year>
          )
          <volume>364</volume>
          {
          <fpage>373</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Godin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Missaoui</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>An incremental concept formation approach for learning from databases</article-title>
          .
          <source>Theoretical Computer Science Journal (133)</source>
          (
          <year>1994</year>
          )
          <volume>387</volume>
          {
          <fpage>419</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pfaltz</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          :
          <article-title>Incremental Transformation of Lattices: A Key to E ective Knowledge Discovery</article-title>
          .
          <source>In: Proc. of the First Intl. Conf. on Graph Transformation (ICGT '02)</source>
          , Barcelona, Spain (Oct
          <year>2002</year>
          )
          <volume>351</volume>
          {
          <fpage>362</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Le Floc'h</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisette</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Missaoui</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godin</surname>
          </string-name>
          , R.:
          <article-title>JEN : un algorithme e cace de construction de generateurs pour l'identi cation des regles d'association</article-title>
          . Nouvelles Technologies de l'
          <source>Information</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>2003</year>
          )
          <volume>135</volume>
          {
          <fpage>146</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godin</surname>
          </string-name>
          , R.:
          <article-title>E cient Vertical Mining of Frequent Closures and Generators</article-title>
          .
          <source>In: Proc. of the 8th Intl. Symposium on Intelligent Data Analysis (IDA '09)</source>
          . Volume 5772 of LNCS.,
          <string-name>
            <surname>Lyon</surname>
          </string-name>
          , France, Springer (
          <year>2009</year>
          )
          <volume>393</volume>
          {
          <fpage>404</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godin</surname>
          </string-name>
          , R.:
          <article-title>Constructing Iceberg Lattices from Frequent Closures Using Generators</article-title>
          .
          <source>In: Discovery Science</source>
          . Volume
          <volume>5255</volume>
          of LNAI., Budapest, Hungary, Springer (
          <year>2008</year>
          )
          <volume>136</volume>
          {
          <fpage>147</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Calders</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigotti</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulicaut</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>A Survey on Condensed Representations for Frequent Sets</article-title>
          . In Boulicaut,
          <string-name>
            <given-names>J.F.</given-names>
            ,
            <surname>Raedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.D.</given-names>
            ,
            <surname>Mannila</surname>
          </string-name>
          , H., eds.:
          <source>ConstraintBased Mining and Inductive Databases</source>
          . Volume
          <volume>3848</volume>
          of Lecture Notes in Computer Science., Springer (
          <year>2004</year>
          )
          <volume>64</volume>
          {
          <fpage>80</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Baixeries</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valtchev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godin</surname>
          </string-name>
          , R.:
          <article-title>Yet a Faster Algorithm for Building the Hasse Diagram of a Galois Lattice</article-title>
          .
          <source>In: Proc. of the 7th Intl. Conf. on Formal Concept Analysis (ICFCA '09)</source>
          . Volume 5548 of LNAI., Darmstadt, Germany, Springer (May
          <year>2009</year>
          )
          <volume>162</volume>
          {
          <fpage>177</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pasquier</surname>
          </string-name>
          , N.:
          <article-title>Mining association rules using formal concept analysis</article-title>
          .
          <source>In: Proc. of the 8th Intl. Conf. on Conceptual Structures (ICCS '00)</source>
          , Shaker-Verlag (
          <year>Aug 2000</year>
          )
          <volume>259</volume>
          {
          <fpage>264</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Berge</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : Hypergraphs: Combinatorics of Finite Sets. North Holland, Amsterdam (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J</given-names>
            ., Han, J
          </string-name>
          .,
          <string-name>
            <surname>Mao</surname>
          </string-name>
          , R.:
          <article-title>CLOSET: An E cient Algorithm for Mining Frequent Closed Itemsets</article-title>
          .
          <source>In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery</source>
          . (
          <year>2000</year>
          )
          <volume>21</volume>
          {
          <fpage>30</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parthasarathy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ogihara</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>New Algorithms for Fast Discovery of Association Rules</article-title>
          .
          <source>In: Proc. of the 3rd Intl. Conf. on Knowledge Discovery in Databases. (August</source>
          <year>1997</year>
          )
          <volume>283</volume>
          {
          <fpage>286</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <source>Formal concept analysis: mathematical foundations</source>
          . Springer, Berlin/Heidelberg (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Calders</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goethals</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Depth- rst non-derivable itemset mining</article-title>
          .
          <source>In: Proc. of the SIAM Intl. Conf. on Data Mining (SDM '05)</source>
          , Newport Beach, USA. (
          <year>Apr 2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>ZART: A Multifunctional Itemset Mining Algorithm</article-title>
          .
          <source>In: Proc. of the 5th Intl. Conf. on Concept Lattices and Their Applications (CLA '07)</source>
          , Montpellier, France (Oct
          <year>2007</year>
          )
          <volume>26</volume>
          {
          <fpage>37</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gouda</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Fast vertical mining using di sets</article-title>
          .
          <source>In: Proc. of the 9th ACM SIGKDD Intl. Conf. on Knowledge discovery and data mining (KDD '03)</source>
          , New York, NY, USA, ACM Press (
          <year>2003</year>
          )
          <volume>326</volume>
          {
          <fpage>335</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Pfaltz</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
          </string-name>
          , C.M.:
          <article-title>Scienti c Knowledge Discovery through Iterative Transformation of Concept Lattices</article-title>
          .
          <source>In: Proc. of the SIAM Workshop on Data Mining and Discrete Mathematics</source>
          , Arlington,
          <string-name>
            <surname>VA</surname>
          </string-name>
          , USA (
          <year>2002</year>
          )
          <volume>65</volume>
          {
          <fpage>74</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Philippon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arlet</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacoby</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          :
          <string-name>
            <surname>Plasmid-Determined AmpC-Type - Lactamases</surname>
          </string-name>
          .
          <source>Antimicrobial Agents and Chemotherapy</source>
          <volume>46</volume>
          (
          <issue>1</issue>
          ) (
          <year>2002</year>
          )
          <volume>1</volume>
          {
          <fpage>11</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohnen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jansen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obst</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Detection of antibiotic-resistant bacteria and their resistance genes in wastewater, surface water, and drinking water bio lms</article-title>
          .
          <source>Microbiology Ecology</source>
          <volume>43</volume>
          (
          <issue>3</issue>
          ) (
          <year>2003</year>
          )
          <volume>325</volume>
          {
          <fpage>335</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>