<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>DFSP: A new algorithm for a swift computation of formal concept set stability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ilyes DIMASSI</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amira MOUAKHER</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sadok BEN YAHIA</string-name>
          <email>sadok.benyahia@fst.rnu.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Tunis El Manar, LIPAH, Faculty of Sciences of Tunis</institution>
          ,
          <addr-line>Tunis</addr-line>
          ,
          <country country="TN">Tunisia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Concept lattices are very useful for the task of knowledge discovery in databases. However, the overwhelming number of drawn formal concepts was always an actual hamper towards their effective use. In the aim of filtering out, such endless lists of formal concepts, the stability metric is the most worth of mention one. In this respect, the stability computation of large concepts has been shown to be infeasible due to exponential number of object sets to be processed. The literature only witnesses approaches for the stability computation that heavily rely on the existence of the Galois lattice. In this paper, we introduce a new efficient algorithm, called DFSP, for computing the stability of a set of formal concepts without having at hand the underlying partial relation. The main thrust of the introduced algorithm stands in the smart detection of non generators and their pruning owe to their fulfilment of monotony property within a given equivalence class. To the best of our knowledge, DFSP is the first algorithm that tackled such tough issue. Carried out experiments showed that DFSP efficiently computes the scalability of very large formal concepts extracted from benchmark datasets of the Data Mining field.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction and motivations</title>
      <sec id="sec-1-1">
        <title>Concept lattices are very useful for the task of knowledge discovery in databases. How</title>
        <p>ever, this field is hampered by the overwhelming size of formal concept lists drawn from
even moderately sized contexts. In this respect, the stability index has been shown to be
efficient for throwing away ”bad” concepts. Nevertheless, the computation of such
stability index is very consuming and has been shown to be NP-Complete task. Thus, the</p>
      </sec>
      <sec id="sec-1-2">
        <title>FCA community paid much attention to the computation of exact and/or approximative</title>
        <p>of the stability as could witness the recent publications on such issue [1, 2]. At a glance,
the recent proposals of approximations of stability computations unveil the actual
complexity of such a task. Roughly speaking, the computation of the stability of a concept
C = (A, B) comes back to the exploration to a huge space made of the power set of the
extension part. In this space, we have to record all the elements in snugness connection
with the corresponding part. Clearly, even for an extent with dozens of objects, it does
not exit a primitive type for storing the value of a stability1.</p>
      </sec>
      <sec id="sec-1-3">
        <title>At a glance, the related work flags out approaches that compute the stability of a set</title>
        <p>of formal concepts organized through the Galois lattice. Doing so, they start computing</p>
      </sec>
      <sec id="sec-1-4">
        <title>1 The GMP library, https://gmplib.org/, could be of use, in such a case, to store huge values.</title>
        <p>the stability of smallest formal concept (in extent’s size terms) and exploit this result
for the subsumer concepts until reaching the top formal concept.</p>
        <p>In this paper, we introduce a new algorithm, called DFSP, that aims to an efficient
straightforward computation of a set of formal concepts. The main thrust of the
introduced algorithm stands in the smart detection of non generators and their pruning owe
to their fulfilment of monotony property within a given equivalence class. Indeed, we
introduce the notion of saturation of non-generators through the detection of the
maximal set of a non-generators. Given that each subset of a non generator is also a non
generator, the DFSP algorithms sweeps the search space in depth first manner and only
stresses on the generators by avoid squandering its efforts on useless non-generators
subspaces.</p>
      </sec>
      <sec id="sec-1-5">
        <title>The carried out experiments highlight that DFSP easily handles formal concepts having thousands of objects in their extent part. To the best of our knowledge, DFSP is the first algorithm that handles efficiently and straightforwardly formal concepts for the stability computation.</title>
      </sec>
      <sec id="sec-1-6">
        <title>The remainder of the paper is organized as follows: The next section recalls key</title>
        <p>notions used throughout this paper as well as the pioneering approaches of the related
work. Then, we present in section 3 our algorithm for computing the stability of a set
of formal concepts, called DFSP. Section 4 describes the experimental study and the
results we obtained. Section 5 concludes the paper and points out our future work.
2</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Stability computation: Scrutiny of the Related work</title>
      <sec id="sec-2-1">
        <title>Before scrutinizing the related work that paid attention the stability computation, we provide a simplified definition of some concepts used throughout in this paper, by supposing that the the reader is familiar with FCA basic settings.</title>
      </sec>
      <sec id="sec-2-2">
        <title>Definition 1. (MONOTONIC / ANTI-MONOTONIC CONSTRAINT) Let Q be a con</title>
        <p>straint,
• Q is anti-monotonic if ∀I ⊆ I, ∀I1 ⊆ I : I fulfils Q ⇒ I1 fulfils Q.</p>
        <p>• Q is monotonic if ∀ I ⊆ I, ∀ I1 ⊇ I: I fulfils Q ⇒ I1 fulfils Q.</p>
        <p>Definition 2. (EQUIVALENCE CLASS) An equivalence class is a set of itemsets with
same closure (and same image). Let C=(A, B) be a formal concept, for any subset
X ⊆ O, A = X ′′ is the largest tidset w.r.t. set inclusion in its equivalence class.
Precisely, A ⊆ O is closed iff ∄X such as X ⊂ A with X ′ = A′; X ⊆ O is a
generator iff ∄U ⊂ X with U ′ = X ′. GA = {X ⊆ A|X ′ = B} is the set of all
generators in the equivalence class.</p>
        <p>Definition 3. (EXTENT FULL SPACE) Let K = (O, I, R) be a formal context, B(K)
its concept lattice and C = (A, B) a concept from B(K) where |A| = n. P (A) =
{X |X ⊆ A} is the set of all A’s subsets and P (A) = 2|A|.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Stability has been introduced probably for the first time by Kuznetsov [3] and later revisited in [4, 5]. This measure seems to be the most widely used around the FCA community and is applied in numerous applications [6], e.g. biclustering, detection of scientific subcommunities, to cite but a few. Formally, it is defined as follows:</title>
        <p>Definition 4. (STABILITY) The stability index of a given concept describes the
proportion of subsets of its objects whose closure is equal to the intent of this concept.
This metric reflects the dependency of the intent on particular objects of the extent [4].
Let K = (O, I, R) be a formal context and C = (A, B) a formal concept of K. The
stability index, σ, of C is defined as follows:
σ(A, B) =
|{C ⊆ A|C′ = B}|
2|A|</p>
        <p>
          |GA|
= 2|A|
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )
The authors of [7] highlighted that a concept that covers fewer objects is normally less
stable than do a concept covering more objects.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>However, the main hamper towards its intensive use mainly for large datasets, is the</title>
        <p>complexity of its computation. In fact, it’s shown to be a #P-complete problem [3, 5]. In
order to compute it for large concept lattices, several works proposed to use estimates
and approximations however others tried to find an exact solution using the concept
lattice in computing stability.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Roth et al. [8] paid attention to concept’s stability as well as other metrics to reduce</title>
        <p>the size of large concept lattice. They proposed an exact and polynomial algorithm
COMPUTESTABILITY that computes the stability indices for every concept of the
lattice using the covering graph of a concept lattice. The algorithm traverses the covering
graph from the bottom concept upwards. A concept is processed only after the stability
indices of all its sub-concepts have been computed. The main drawback of this
algorithm that it is essentially quadratic in the number of concepts in the lattice, which may
be prohibitively expensive for large lattices. In addition, this algorithm inputs a Galois
lattice and such a requirement could not be easily available for very large datasets may
be impractical.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Kuznetsov [5] introduced a polynomial algorithm for computing stability using var</title>
        <p>ious methods of estimating scientific hypotheses. This algorithm is considered as
optimal in the sense that its time complexity is linear and polynomial in the size of the
context. Nevertheless, this approach only gives an approximate assessment about the
stability index and could not be efficient in exact studies.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Later, Jay et al. [9] applied the concept of stability and iceberg lattices in social</title>
        <p>network analysis. They used the stability metric to reduce the complexity of the lattice,
by filtering out all unstable concepts w.r.t. a given threshold. In this respect, the authors
introduced a new definition of the stability using the equivalence classes. Given a
concept (A, B), the stability metric measures the number of elements of G that are in the
same equivalence class of A where an equivalence class is defined as follows:</p>
      </sec>
      <sec id="sec-2-8">
        <title>Property 1. Using Definition 4, the authors proved the following property:</title>
        <p>σ(A, B) = 1 −
σ(X, X ′)2|X|−|A|
X
X⊂A,X=X′′</p>
      </sec>
      <sec id="sec-2-9">
        <title>So, once the lattice concept is given, it is possible to compute quickly the stability of concepts using an ascending algorithm.</title>
      </sec>
      <sec id="sec-2-10">
        <title>Roth et al. [10] proposed an algorithm, based on a polynomial heuristic for comput</title>
        <p>ing stability index for all concepts using the concept lattice. This algorithm was quite
good in practical applications so far, but in the worst case its complexity is quadratic in
the size of the lattice.</p>
      </sec>
      <sec id="sec-2-11">
        <title>Later, Babin and Kuznetsov [1] also suggested a method for approximating con</title>
        <p>cept stability based on a Monte Carlo approach. Their approximate algorithm can run
in reasonable time. In their approach, they specified a new parameter called stability
threshold to reduce the number of the concepts. The results show that the
approximations are better when stability threshold is low.</p>
      </sec>
      <sec id="sec-2-12">
        <title>Recently, Buzmakov et al. [2] introduced an efficient way for finding a ”good” as</title>
        <p>sessment of concept stability. The authors combined the bounding method [9] as well
as the Monte Carlo method [1] in a complementary way. Once the stability bounds are
computed in the lattice, the method that should be applied is chosen according to the
most tight of each them.</p>
        <p>The main criticism that can be made about the literature’s approaches stands in
the fact that they are unable to compute stability of concepts in absence of order
relation. In fact, the lattice structure is a sine qua non condition to proceed with the
computation. Beside that, such computation of concept’s stability requires visiting all its
sub-concepts, i.e., direct and non direct ones. Clearly, doing so is very greedy in
computations and memory usage. Nevertheless, building concept lattice is very far from
being an easy task [11] and sometimes impossible. Furthermore, the cost of generating
a lattice concept remains high as far as the context is composed of a large number of
objects [12] and/or a dense incidence relation.</p>
      </sec>
      <sec id="sec-2-13">
        <title>Thus, we introduce a new efficient algorithm, called DFSP that allows computing</title>
        <p>the stability for a given set of concepts. The latter doesn’t need any partial relation
between the concepts. The main thrust of the introduced algorithm stands in the smart
detection of non generators and their pruning owe to their fulfilment of monotony
property within a given equivalence class.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>DFSP : Depth First Stability Processor algorithm</title>
      <sec id="sec-3-1">
        <title>Before, thoroughly describing the DFSP algorithm, we start by introducing some useful notations that are used in the remainder.</title>
        <p>Let C = (A, B) be a formal concept for which the stability index needs to be
processed. DFSP algorithm organises P (A) according to a tidset prefix tree. Each
exploration node in the tree is specific to a tidset and represented by the TSNode data
structure. TSNode is a recursive structure that keeps track of useful informations about
its associated tidset such as its suffix, itemset and a set of immediate supersets. As for
the current tidset, its immediate supersets are themselves represented each by a TSNode
instance and so on.</p>
        <p>Definition 5. (SUFFIX OF TIDSET) Let ts = {t1, t2 . . . tk} be an ordered sequence of
objects and nts its associated exploration node. Suf f (ts) = tk is the last object in ts.
TSNode.s is the member property of the data structure TSNode in which Suf f (ts) is
maintained ; nts.s = tk.
Definition 6. (CHILDREN’S NODE) TSNode.ss is a member of the TSNode structure
that holds a set of TSNode instances. TSNode.ss is the set of a TSNode immediate
children.</p>
        <p>Definition 7. (NODE INTENT) Let ts be a tidset and nts its associated node. TSNode.is
is a TSNode member that holds the tidset image. nts.is = ts′.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Unlike nts.s which only holds the tidset suffix Suf f (ts), nts.is integrally main</title>
        <p>tains ts′.</p>
        <p>Example 1. With respect to the formal context depicted in Table 1, we have ts = {123}
and n123 its associated node, then n123.s = 3. In addition, we have : n123.ss={n1234,
n1235, n1236, n1237, n1238, n1239}. Besides, we also have n123.is = ts′ = {123}′ =
{g}.</p>
      </sec>
      <sec id="sec-3-3">
        <title>In the following, we present a detailed description of DFSP algorithm. Let us re</title>
        <p>mind that the main idea of our new approach is to provide a simple and very efficient
strategy for computing stability through generators enumeration. The DFSP algorithm,
whose pseudo-code is sketched by Algorithm 1, operates mainly as follows:</p>
        <p>Initially, the sizes of the extent and the intent are stored respectively into n and m
(lines 2,3). The root node is then built and it’s root.s and root.is members are set to ∅
(line 4). Then, the objects ai of A are scanned in order to build the first level nodes. For
each object ai, a child node is created, its member child.s is set to ai and its member
child.is is set to the intent of ai (line 6, 7). If the size of the intent a′i is different from
m the size of B, then child is a non generator and is added to root.ss the set of the
root node immediate children (c.f lines 8, 9). Then the first level generators count is
determined using the generators counting formula (line 10). After that, the exploration
of space of tidset through the EXPLORETIDSET function updates gc one last time to
obtain the final count of generators (line 11). The stability index is determined when
the generators count gc is divided by the overall tidset count (line 12).</p>
      </sec>
      <sec id="sec-3-4">
        <title>In the following part, we will explain the fundamental step of the algorithm which is the recursive exploration of tidset’s space.</title>
      </sec>
      <sec id="sec-3-5">
        <title>Algorithm 1: Depth First Stability Processor (DFSP)</title>
        <sec id="sec-3-5-1">
          <title>Data: TSNode</title>
          <p>-TSNode.s: the tidset suffix
-TSNode.is: the tidset intent
-TSNode.ss: the set of immediate children nodes.</p>
        </sec>
        <sec id="sec-3-5-2">
          <title>Input:</title>
          <p>-K = (O, I, R): a formal context.
-C = (A, B): a formal concept.</p>
          <p>Results:
-S: the stability of C.
1 Begin
2 n := |A|;
3 m := |B|;
4 root.i := root.is := ∅;
5 For i = 1 . . . n do
6 nchild.s := ai;
7 nchild.is := a′i;
8 If |nchild.is|! = m then
9 root.ss∪ = nchild;
10 gc := Pin=−|1root.ss| 2i;
11 gc := gc + EXPLORETIDSET(root.ss, K, m);
12 S := 2gnc ;
13 Return S;
14 End
3.1</p>
        </sec>
        <sec id="sec-3-5-3">
          <title>Depth First exploration of the Tidset space</title>
        </sec>
      </sec>
      <sec id="sec-3-6">
        <title>The main goal of EXPLORETIDSET which pseudo-code is sketched through Algorithm</title>
      </sec>
      <sec id="sec-3-7">
        <title>2 is counting generators while minimizing as much as possible the visited tidsets. This is achieved by pruning generators and most importantly by detecting ”prunable” non generators. The first invocation for EXPLORETIDSET (c.f line 11 of DFSP) is applied on the root node immediate children.</title>
      </sec>
      <sec id="sec-3-8">
        <title>The tidset space exploration pattern The exploration mechanics are straightforward.</title>
      </sec>
      <sec id="sec-3-9">
        <title>To harness this process, lets ignore any possible optimisation that leads to nodes prun</title>
        <p>ing. On the first invocation of EXPLORETIDSET in DFSP, ss contains the nodes {na1,
. . . , nan} associated to unitary first level tidsets {a1},. . . , {an} for which
EXPLORETIDSET builds immediate children as follows: na1a2 the first immediate child of na1 is
built by adding the suffix of na2 to na1. More generally, naiaj the jth immediate child
of nai (line 12) is obtained by adding to nai the suffix of na(i+j) the jth node
following nai (line 15). For the tidset {aiaj} associated to naiaj , only Suf f ({aiaj})
is stored in naiaj .s (line 16). Since, {aiaj} = {ai}|Suf f ({aj}) then {aiaj}′ =
{ai}′ T Suf f ({aj})′. {aiaj}′ is stored integrally in naiaj .is (line 17). After building
nai.ss from nai followings, EXPLORETIDSET is recursively applied on nai.ss (line 22)
which only makes sense when |nai.ss| has at least 2 elements (line 21). After
processing the nai subspace, EXPLORETIDSET moves to the next node na(i+1) (line 12). The
last node is nan is not processed as it has no followings.</p>
        <p>Counting and pruning generators As described above, the generation process builds
a child node by adding an object to its parent node. A child node is therefore always
a superset of its parent. Otherwise, it is known that a superset of a generator is also
a generator. Therefore, applying the exploration process on a generator node will
inevitably produce generators. The exploration branch starting from that node is said to
be monotonous and since we are able to count the population induced from that branch
we can save processing power by dismissing these nodes (line 8 in DFSP and line 18
in EXPLORETIDSET). Let’s find out how it is possible to count generators that are
inferred from a given generator without the need of exploring them. Let ni be the ith
node in ss and ni is a generator. Building the ni subspace by exploring its immediate
children then its children’s children and so on recursively is equivalent to generating all
possible supersets of the tidset associated to ni using the suffixes of ni following nodes
{n(i+1).s,. . . , n(i+1).s}. The count of all generators in ni branch (including even ni) is
equal 2(n−i−1).</p>
        <p>We have to also consider supersets of ni that are not part of ni branch but rather
in nk branches {k &gt;= 1 and k &lt; i} the branches of all nodes that precedes ni in
ss. To avoid locating these nodes and simplify calculations, let’s virtually move ni
to the start of ss. All supersets of ni are now confined in ni branch and the updated
count of ni supersets is 2(n−1). Let nj be another generator in the same cluster. It
is important to count all nj supersets while avoid including elements that are already
counted as part of the ni branch. By virtually moving nj after ni in ss and counting
all elements in the nj branch, it is possible to fulfil both conditions. nj branch count
is 2(n−2) and the same process is applied to the remaining generators in the cluster.
Doing so leads us to the generalized generators counting formula gc = P|kg=s||ss|−1 2k
where |gs| is the generators count and |ss| is cluster size (line 10 in DFSP and line 20
in EXPLORETIDSET).</p>
      </sec>
      <sec id="sec-3-10">
        <title>Detecting non generators monotony The most significant mop up mechanism in</title>
      </sec>
      <sec id="sec-3-11">
        <title>DFSP is non generators pruning. In order to also eliminate non generators, EXPLORETID</title>
        <p>SET looks for nodes in ss that when combined together, the resulting clique superset
will still be a non generator. Those nodes are said to form a non generator monotonous
clique. Suppose, we’re building the branch of a node from this clique. If we use
exclusively nodes from the clique, all nodes in the branch are guaranteed to be subsets of
the clique superset. Since a subset of a non generator is also a non generator then all
branches in the clique will only contain non generators. Nodes in the clique are pushed
at the end of the ss set to insure that the generation process will only use nodes from
the clique. Nodes outside the clique are moved away to the beginning of ss. Nodes in
the clique are not expanded, since no generator could be found in their branches but are
still used to build branches outside the clique.</p>
      </sec>
      <sec id="sec-3-12">
        <title>Algorithm 2: EXPLORETIDSET</title>
        <sec id="sec-3-12-1">
          <title>Input:</title>
          <p>-K = (O, I, R): a formal context.
-ss=a set of TSNode siblings.
-m: the size of the intent.</p>
          <p>Results:
-gc : the generators count.
1 Begin
2 i := ssc := |ss|;
3 ingpc := gc := 0;
4 ingpi := I;
5 While i &lt; 1 do
6 If |ngpi ∩ ss[i].is| = m then</p>
        </sec>
      </sec>
      <sec id="sec-3-13">
        <title>7 MOVETOHEAD(i, ss);</title>
        <p>8 ingpc := ingpc + 1;</p>
        <sec id="sec-3-13-1">
          <title>Else</title>
          <p>i := i − 1;
ngpi := ngpi ∩ ss[i].is;
23 Return gc;
24 End</p>
        </sec>
        <sec id="sec-3-13-2">
          <title>3.2 Illustrative example</title>
          <p>To illustrate our approach, let us consider the formal concept C1 = (A1, B1) from the
formal context depicted by Table 1 such that A1 = {3, 4, 5, 6, 7, 9} and B1 = {f, g}.</p>
        </sec>
      </sec>
      <sec id="sec-3-14">
        <title>As shown in figure 1, the DFSP algorithm operates as follows:</title>
      </sec>
      <sec id="sec-3-15">
        <title>During the first step (1), the root node is created and initialized though the function</title>
        <p>
          BUILDTREEROOT (gc=0). Initially, root.s = ∅ and nodes n3, n4, n5, n6, n7 will be
created through individual elements of {3, 4, 5, 6, 7, 9} (steps (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ), (
          <xref ref-type="bibr" rid="ref4">3</xref>
          ), (
          <xref ref-type="bibr" rid="ref5">4</xref>
          ), (
          <xref ref-type="bibr" rid="ref6">5</xref>
          ), (
          <xref ref-type="bibr" rid="ref7">6</xref>
          ) and
(
          <xref ref-type="bibr" rid="ref8">7</xref>
          )). These nodes are prospective direct children to the root node. Given that all these
nodes are non generators, they become in step (
          <xref ref-type="bibr" rid="ref9">8</xref>
          ) as effective direct children of root and
are decreasingly sorted with respect to their support value. In step (
          <xref ref-type="bibr" rid="ref10">9</xref>
          ), non generators
forming monotone clique are placed at the end of the list and marked by (*). However,
instable generators are placed at the beginning of the list and marked by (+). After that,
in steps (
          <xref ref-type="bibr" rid="ref11">10</xref>
          ), (
          <xref ref-type="bibr" rid="ref12">11</xref>
          ), (
          <xref ref-type="bibr" rid="ref13">12</xref>
          ), (13) and (14) prospective direct children of node n3 are created
which are, respectively, n36, n39, n34, n35 and n37. The count of generators is updated
in step (15) (gc = 24 +23 = 24). Only nodes n34, n35 and n39 are left as effective direct
children of n3. The latter are also sorted decreasingly. In step (16), all these effective
direct children form a monotone clique and exploration of this branch is stopped. After
that, nodes n69, n64, n65 and n67 are created and count of generators is also updated
in step (21) with the tree generators of n64, n65 and n67 (gc = gc + 23 + 22 + 21 =
24 + 14 = 38). Only the node n69 is kept in the list of effective direct children of n6.
        </p>
      </sec>
      <sec id="sec-3-16">
        <title>Indeed, the latter does not fulfil the condition of EXPLORETIDSET to be launched.</title>
        <p>4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental results</title>
      <sec id="sec-4-1">
        <title>In this section, we put the focus on the evaluation of the DFSP algorithm by stressing on</title>
        <p>two complementary aspects : (i)Execution time; (ii) efficiency of search space pruning.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Experiments were carried out on an Intel Xeon PC, CPU E5-2630 2,30 GHz with 16</title>
      </sec>
      <sec id="sec-4-3">
        <title>GB of RAM and Linux system. During the lead experiments, we used some benchmark</title>
        <p>datasets commonly of extensive use within Data mining. The first three datasets are
considered as dense ones, i.e., yielding high number of formal concepts even for a
small number of objects and attributes, while the other ones are considered as sparse.</p>
      </sec>
      <sec id="sec-4-4">
        <title>The characteristics of these datasets are summarized by Table 2. Thus, for each dataset</title>
        <p>we report its number of objects, the number of attributes, as well as the number of
all formal concepts that may drawn. In addition, we also reported the respective sizes
of the smallest and the largest formal concepts (in terms of extent’s size). For these
considered concepts, we kept track of the number of the actually explored nodes as
well as the execution time (the column denoted |explor.|).</p>
      </sec>
      <sec id="sec-4-5">
        <title>At a glance, statistics show that the DFSP algorithm is able to process dozens of</title>
        <p>thousands of objects in a reasonable time. Indeed, the 15596 (respec. 16040) objects
composing the extent of the largest formal concept extracted from the RETAIL
(respec. T10I4D100K) dataset are handled in only 27.27 (respec. 68.85) seconds. Even
though, the respective cardinalities are close (15596 vs 16040 objects), the difference
in execution time is not proportional to this low gap. A preliminary explanation could
be the difference in density of both datasets (RETAIL is dense while T10I4D100K is
a sparse one). A in-depth study of these performances in connection to the nature of
datasets is currently carried out. The most sighting fact is the low number of visited
nodes in the associated search space. For example for the MUSHROOM dataset, DFSP
algorithm actually handled only 83918 nodes from 21000 potential nodes of the search
space, i.e., in numerical terms it comes to only explore infinitely insignificant part equal
to 7.8 × 10−297 of the search space. The case of RETAIL and T10I4D100K datasets is
also worth of mention. For the respective smallest extracted concepts, DFSP algorithm
only explores, 1.14 × 10(−45) and 1.5 × 10(−90) parts of the respective search spaces.</p>
        <sec id="sec-4-5-1">
          <title>Datasets</title>
        </sec>
      </sec>
      <sec id="sec-4-6">
        <title>CHESS</title>
      </sec>
      <sec id="sec-4-7">
        <title>MUSHROOM</title>
      </sec>
      <sec id="sec-4-8">
        <title>RETAIL T10I4D100K T40I10D100K</title>
        <p>These highlights are also confirmed by Figures 2-11. Indeed, Figures 2, 4, 6, 8 and
10 stress on the variation of the Execution time, while Figures 3, 5, 7, 9 and 11 assess
what we call the workload which means the efficiency of search space exploration. At
a glance, the execution time is in a snugness connection with the reduction of search
space, i.e., the variation of the workload has the same tendency as the performance
since we consider the visited tidset in the search space as the processing unit. Worth
of mention, the performance is rather correlated to the extent’s size rather than the
exponential nature of the search space.</p>
        <p>Scaleup</p>
        <p>Trend (Scaleup)</p>
        <p>Workload
10.4</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
      <sec id="sec-5-1">
        <title>Through the DFSP algorithm, we gaped in the combinatorics of lattices by the show</title>
        <p>ing that most of this sear space could be smartly explored thanks to the saturation of
generators. The swift computation of stability encouraged us to integrate the stability
as a on-the-fly pruning strategy during mining closed itemsets. We are currently
working on a new algorithm for the stability computation given the Galois lattice. The new
algorithm only relies on the direct sub-concepts to compute the stability of a concept.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Outside the FCA field, the strategy of DFSP would be of benefit for very efficient extraction well known problem of combinatorics : minimal transversals.</title>
        <p>Scaleup
Trend (Scaleup)
18
1
3
Fig. 9. T10I4D100K workload</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Babin</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Approximating concept stability</article-title>
          .
          <source>In: Proceedings of the 11th International Conference on Formal Concept Analysis(ICFCA)</source>
          , Dresden, Germany. (
          <year>2012</year>
          )
          <fpage>7</fpage>
          -
          <lpage>15</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Buzmakov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Scalable estimates of concept stability</article-title>
          .
          <source>In: Proceedings of the 12th International Conference on Formal Concept Analysis(ICFCA)</source>
          , ClujNapoca, Romania. (
          <year>2014</year>
          )
          <fpage>157</fpage>
          -
          <lpage>172</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>0.5 Fig. 11. T40I10D100K workload</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Stability as an estimate of the degree of substantiation of hypotheses derived on the basis of operational similarity</article-title>
          .
          <source>Automatic Documentation and Mathematical Linguistics</source>
          <volume>24</volume>
          (
          <year>1990</year>
          )
          <fpage>62</fpage>
          -
          <lpage>75</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Reducing the representation complexity of lattice-based taxonomies</article-title>
          .
          <source>In: Proceedings of the 15th International Conference on Conceptual Structures (ICCS)</source>
          , Sheffield, UK. (
          <year>2007</year>
          )
          <fpage>241</fpage>
          -
          <lpage>254</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>On stability of a formal concept</article-title>
          .
          <source>Ann. Math. Artif. Intell</source>
          .
          <volume>49</volume>
          (
          <year>2007</year>
          )
          <fpage>101</fpage>
          -
          <lpage>115</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          6.
          <string-name>
            <surname>Buzmakov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Is concept stability a measure for pattern selection?</article-title>
          <source>Procedia Computer Science</source>
          <volume>31</volume>
          (
          <year>2014</year>
          )
          <fpage>918</fpage>
          -
          <lpage>927</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          7.
          <string-name>
            <surname>Klimushkin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Approaches to the selection of relevant concepts in the case of noisy data</article-title>
          .
          <source>In: Proceedings of the 8th International Conference(ICFCA)</source>
          , Agadir,
          <string-name>
            <surname>Morocco.</surname>
          </string-name>
          (
          <year>2010</year>
          )
          <fpage>255</fpage>
          -
          <lpage>266</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          8.
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kourie</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>Towards concise representation for taxonomies of epistemic communities</article-title>
          .
          <source>In: Proceedings of the 4th International Conference on Concept Lattices and Their Applications (CLA)</source>
          , Hammamet,
          <string-name>
            <surname>Tunisia.</surname>
          </string-name>
          (
          <year>2006</year>
          )
          <fpage>240</fpage>
          -
          <lpage>255</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jay</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohler</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Analysis of social communities with iceberg and stabilitybased concept lattices</article-title>
          .
          <source>In: Proceedings of the 6th International Conference(ICFCA)</source>
          , Montreal, Canada. (
          <year>2008</year>
          )
          <fpage>258</fpage>
          -
          <lpage>272</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          10.
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kourie</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>On succinct representation of knowledge community taxonomies with formal concept analysis</article-title>
          .
          <source>Int. J. Found. Comput. Sci</source>
          .
          <volume>19</volume>
          (
          <year>2008</year>
          )
          <fpage>383</fpage>
          -
          <lpage>404</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          11.
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>S.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wen</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>C.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.G.</given-names>
          </string-name>
          :
          <article-title>A fast algorithm for building concept lattice</article-title>
          . (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          12.
          <string-name>
            <surname>Demko</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertet</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Generation algorithm of a concept lattice with limited object access</article-title>
          .
          <source>In: Proceedings of the 8th International Conference Concept Lattices and Their Applications (CLA)</source>
          , Nancy, France. (
          <year>2011</year>
          )
          <fpage>239</fpage>
          -
          <lpage>250</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>