<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fast Computation of Proper Premises</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Uwe Ryssel</string-name>
          <email>uwe.ryssel@tu-dresden.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felix Distel</string-name>
          <email>felix@tcs.inf.tu-dresden.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Borchmann</string-name>
          <email>borch@tcs.inf.tu-dresden.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Algebra, Technische Universität Dresden</institution>
          ,
          <addr-line>Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Applied Computer Science, Technische Universität Dresden</institution>
          ,
          <addr-line>Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Theoretical Computer Science, Technische Universität Dresden</institution>
          ,
          <addr-line>Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This work is motivated by an application related to refactoring of model variants. In this application an implicational base needs to be computed, and runtime is more crucial than minimal cardinality. Since the usual stem base algorithms have proven to be too costly in terms of runtime, we have developed a new algorithm for the fast computation of proper premises. It is based on a known link between proper premises and minimal hypergraph transversals. Two further improvements are made, which reduce the number of proper premises that are obtained multiple times and redundancies within the set of proper premises. We provide heuristic evidence that an approach based on proper premises will also be beneficial for other applications.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Today, graph-like structures are used in many modeling languages to specify
algorithms or problems in a more readable way. Examples are data-flow-oriented
simulation models, such as MATLAB/Simulink, state diagrams, and diagrams
of electrical networks. Generally, such models consist of blocks or elements and
connections among them. Using techniques described in Section 5.2, a formal
context can be obtained from such models. By computing an implicational base
of this context, dependencies among model artifacts can be uncovered. These
can help to represent a large number of model variants in a structured way.</p>
      <p>
        For many years, computing the stem base has been the default method for
extracting a small but complete set of implications from a formal context. There
are essentially two algorithms to achieve this [
        <xref ref-type="bibr" rid="ref10 ref15">10,15</xref>
        ], and both of them compute
not only the implications from the stem base, but also concept intents. This is
problematic as a context may have exponentially many concept intents. Recent
theoretical results suggest that existing approaches for computing the stem base
may not lead to algorithms with better worst-case complexity [
        <xref ref-type="bibr" rid="ref1 ref6">6,1</xref>
        ].
      </p>
<p>Bearing this in mind, we focus on proper premises. Just like pseudo-intents,
which are used to obtain the stem base, proper premises yield a sound and
complete set of implications. Because this set of implications does not have minimal
cardinality, proper premises have been outside the focus of the FCA community
for many years. However, there are substantial arguments to reconsider using
them. Existing methods for computing proper premises avoid computing
concept intents. Thus, in contexts with many concept intents they may have a clear
advantage in runtime over the stem base algorithms. This is particularly true
for our application where the number of concept intents is often close to the
theoretical maximum. Here, attributes often occur together with their negated
counterparts, and the concept lattice can contain several millions of elements.
In Section 5.1 we provide arguments that we can expect the number of
concept intents to be larger than the number of proper premises in most contexts,
assuming a uniform random distribution.</p>
      <p>
        Often, in applications, runtime is the limiting factor, not the size of the basis.
But even where minimal cardinality is a requirement, computing proper premises
is worth considering, since there are methods to transform a base into the stem
base in polynomial time [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>In this paper we present an algorithm for the fast computation of proper
premises. It is based on three ideas. The first idea is to use a simple connection
between proper premises and minimal hypergraph transversals. The problem
of enumerating minimal hypergraph transversals is well-researched. Exploiting
the link to proper premises allows us to use existing algorithms that are known
to behave well in practice. A first, naïve algorithm iterates over all attributes
and uses a black-box hypergraph algorithm to compute proper premises of each
attribute.</p>
      <p>A drawback when iterating over all attributes is that the same proper premise
may be computed several times for different attributes. So we introduce a
candidate filter in the second step: For each attribute m, the attribute set is filtered
and proper premises are searched only among the candidate attributes. We show
that this filtering method significantly reduces the number of proper premises
that are computed multiple times, while maintaining completeness. In a third step we exploit the
fact that there are obvious redundancies within the proper premises. These can
be removed by searching for proper premises only among the meet-irreducible
attributes.</p>
      <p>We argue that our algorithms are trivial to parallelize, leading to further
speedups. Due to their incremental nature, parallelized versions of the stem base
algorithms are not known to date. We conclude by providing experimental
results. These show highly significant improvements for the contexts obtained from
the model refactoring application. For a sample context, where Next-Closure
required several hours to compute the stem base, runtime has dropped to fractions
of a second. For contexts from other applications the improvements are not as
impressive but still large.
</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>We provide a short summary of the most common definitions in formal concept
analysis. A formal context is a triple K = (G, M, I) where G is a set of objects,
M a set of attributes, and I ⊆ G × M is a relation that expresses whether an
object g ∈ G has an attribute m ∈ M. If A ⊆ G is a set of objects then A′
denotes the set of all attributes that are shared among all objects in A, i.e.,
A′ = { m ∈ M | ∀g ∈ A : gIm }. Likewise, for some set B ⊆ M we define
B′ = { g ∈ G | ∀m ∈ B : gIm }. Pairs of the form (A, B) where A′ = B and
B′ = A are called formal concepts. Formal concepts of the form ({ m }′, { m }″)
for some attribute m ∈ M are called attribute concepts and are denoted by μm.
We define the partial order ≤ on the set of all formal concepts of a context to
be the subset order on the first components. The first component of a formal
concept is called the concept extent, while the second component is called the
concept intent.</p>
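<p>The two derivation operators translate directly into code. The following sketch uses an invented toy context (objects and attributes are illustrative, not from the paper):</p>

```python
# The two derivation operators (·)′ on an invented toy context.
G = {"g1", "g2", "g3"}
M = {"small", "round", "red"}
I = {("g1", "small"), ("g1", "round"),
     ("g2", "small"),
     ("g3", "round"), ("g3", "red")}

def attrs_of(A):
    """A′: attributes shared by all objects in A."""
    return {m for m in M if all((g, m) in I for g in A)}

def objs_of(B):
    """B′: objects that have all attributes in B."""
    return {g for g in G if all((g, m) in I for m in B)}

print(attrs_of({"g1", "g2"}))    # {'small'}
print(objs_of({"round"}))        # the extent of "round": g1 and g3
```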
      <p>
        Formal concept analysis provides methods to mine implicational knowledge
from formal contexts. An implication is a pair (B₁, B₂) where B₁, B₂ ⊆ M,
usually denoted by B₁ → B₂. We say that the implication B₁ → B₂ holds in a
context K if B₁′ ⊆ B₂′. An implication B₁ → B₂ follows from a set of implications
L if for every context K in which all implications from L hold, B₁ → B₂ also
holds. We say that L is sound for K if all implications from L hold in K, and
we say that L is complete for K if all implications that hold in K follow from
L. For each context there exists a sound and complete set of implications which
has minimal cardinality [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This is called the stem base. The exact definition
of the stem base is outside the scope of this work.
      </p>
      <p>A sound and complete set of implications can also be obtained using proper
premises. For a given set of attributes B ⊆ M, define B• to be the set of those
attributes in M \ B that follow from B but not from a strict subset of B, i.e.,</p>
      <p>B• = B″ \ (B ∪ ⋃ { S″ | S ⊊ B }).</p>
      <p>
        B is called a proper premise if B• is not empty. It is called a proper premise for
m ∈ M if m ∈ B•. It can be shown that L = { B → B• | B proper premise }
is sound and complete [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Several alternative ways to define this sound and
complete set of implications can be found in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
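<p>The definition of B• can be sketched in a few lines; the context and attribute names below are invented for illustration:</p>

```python
from itertools import combinations

# Sketch of B• = B″ \ (B ∪ ⋃{ S″ | S ⊊ B }) on an invented context in which
# {a, b} is a proper premise for c.
G = {"g1", "g2", "g3"}
M = {"a", "b", "c"}
I = {("g1", "a"), ("g2", "b"), ("g3", "a"), ("g3", "b"), ("g3", "c")}

def prime_objs(B):            # B′: objects having all attributes in B
    return {g for g in G if all((g, m) in I for m in B)}

def prime_attrs(A):           # A′: attributes shared by all objects in A
    return {m for m in M if all((g, m) in I for g in A)}

def closure(B):               # B″ on the attribute side
    return prime_attrs(prime_objs(B))

def bullet(B):
    """B•: attributes outside B that follow from B but from no strict subset."""
    B = set(B)
    implied = set()
    for r in range(len(B)):                    # all strict subsets of B
        for S in combinations(B, r):
            implied |= closure(set(S))
    return closure(B) - (B | implied)

print(bullet({"a", "b"}))     # {a, b} is a proper premise for c
```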
      <p>We write g ↙ m if g′ is maximal with respect to the subset order among all
object intents which do not contain m.</p>
      <sec id="sec-2-1">
        <title>Proper Premises as Minimal Hypergraph Transversals</title>
        <p>
          We present a connection between proper premises and minimal hypergraph
transversals, which forms the foundation for our enumeration algorithms. It has
been exploited before in database theory for the purpose of mining functional
dependencies from a database relation [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Implicitly, it has also been known
for a long time within the FCA community, although the term hypergraph has
not been used in this context (cf. Prop. 23 from [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]).
        </p>
        <p>Let V be a finite set of vertices. Then a hypergraph on V is simply a pair
(V, H) where H is a subset of the power set 2^V. Intuitively, each set E ∈ H
represents an edge of the hypergraph, which, in contrast to classical graph theory,
may be incident to more or fewer than two vertices. A set S ⊆ V is called a
hypergraph transversal of H if it intersects every edge E ∈ H, i.e., if</p>
        <p>∀E ∈ H : S ∩ E ≠ ∅.</p>
        <p>
          S is called a minimal hypergraph transversal of H if it is minimal with respect to
the subset order among all hypergraph transversals of H. The transversal
hypergraph of H is the set of all minimal hypergraph transversals of H. It is denoted by
Tr(H). The problem of deciding for two hypergraphs G and H whether H is the
transversal hypergraph of G is called TransHyp. The problem of enumerating
all minimal hypergraph transversals of a hypergraph G is called TransEnum.
Both problems are relevant to a large number of fields and have therefore been
well-researched. TransHyp is known to be contained in coNP. Since it has
been shown that TransHyp can be decided in quasipolynomial time [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], it is
not believed to be coNP-complete. Furthermore, it has been shown that it can
be decided using only limited non-determinism [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. For the enumeration problem
it is not known to date whether an output-polynomial algorithm exists. However,
efficient algorithms have been developed for several classes of hypergraphs [
          <xref ref-type="bibr" rid="ref4 ref8">8,4</xref>
          ].
        </p>
        <p>
          The following proposition can be found in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], among others.
        </p>
        <p>Proposition 1. P ⊆ M is a premise of m ∈ M iff</p>
        <p>(M \ g′) ∩ P ≠ ∅</p>
        <p>holds for all g ∈ G with g ↙ m. P is a proper premise for m iff P is minimal
(with respect to ⊆) with this property.</p>
        <p>We immediately obtain the following corollary.</p>
        <p>Corollary 1. P is a premise of m iff P is a hypergraph transversal of (M, H)
where</p>
        <p>H := { M \ g′ | g ∈ G, g ↙ m }.</p>
        <p>The set of all proper premises of m is exactly the transversal hypergraph</p>
        <p>Tr({ M \ g′ | g ∈ G, g ↙ m }).</p>
        <p>
          In particular this proves that enumerating the proper premises of a given
attribute m is polynomially equivalent to TransEnum. This can be exploited
in a naïve algorithm for computing all proper premises of a formal context
(Algorithm 1). Being aware of the link to hypergraph transversals, we can benefit
from existing efficient algorithms for TransEnum in order to enumerate proper
premises, similar to what has been proposed in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Of course, it is also possible
to use other enumeration problems to which TransEnum can be reduced.
Examples are the enumeration of prime implicants of Horn functions [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and the
enumeration of set covers.
        </p>
      </sec>
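<p>The correspondence can be sketched as follows (our own illustration, not the authors' implementation): a brute-force enumerator over subsets stands in for a dedicated TransEnum solver, the vertex set is restricted to M \ { m }, and the context is invented:</p>

```python
from itertools import combinations

# Sketch of Algorithm 1: the proper premises of each attribute m are the
# minimal transversals of the hypergraph { M \ g' | g arrow-down m }.
G = ["g1", "g2", "g3"]
M = ["a", "b", "c"]
I = {("g1", "a"), ("g2", "b"), ("g3", "a"), ("g3", "b"), ("g3", "c")}

def row(g):                                   # the object intent g'
    return frozenset(m for m in M if (g, m) in I)

def arrow_rows(m):
    """Object intents g' that are subset-maximal among those missing m."""
    rows = [row(g) for g in G if m not in row(g)]
    return [r for r in rows if not any(r < s for s in rows)]

def minimal_transversals(edges, vertices):
    """All subset-minimal subsets of `vertices` hitting every edge."""
    found = []
    for k in range(len(vertices) + 1):        # by increasing size => minimal
        for S in map(frozenset, combinations(vertices, k)):
            if all(S & e for e in edges) and not any(t <= S for t in found):
                found.append(S)
    return found

proper_premises = {
    m: minimal_transversals([frozenset(M) - r for r in arrow_rows(m)],
                            [x for x in M if x != m])
    for m in M
}
print(proper_premises["c"])                   # the only proper premise of c is {a, b}
```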
    </sec>
    <sec id="sec-3">
      <title>Improvements to the Algorithm</title>
      <sec id="sec-3-1">
        <title>Avoiding Duplicates using Candidate Sets</title>
        <p>We can further optimize Algorithm 1 by reducing the search space. In the naïve
algorithm proper premises are typically computed multiple times since they can
be proper premises of more than one attribute. Our goal is to avoid this wherever
possible.</p>
        <p>The first idea is shown in Algorithm 2. There we introduce a candidate set
C of particular attributes, depending on the current attribute m. We claim
that we only have to search for minimal hypergraph transversals P of { M \ g′ |
g ↙ m } with P ⊆ C. We provide some intuition for this idea.</p>
        <p>Algorithm 1 Naïve Algorithm for Enumerating All Proper Premises
Input: K = (G, M, I)
P := ∅
for all m ∈ M do
    P := P ∪ Tr({ M \ g′ | g ∈ G, g ↙ m })
end for
return P</p>
        <p>Algorithm 2 A Better Algorithm for Enumerating All Proper Premises
Input: K = (G, M, I)
P := { { m } | m ∈ M, { m } is a proper premise of K }
for all m ∈ M do
    C := { u ∈ M \ { m } | ∄v ∈ M : μu ∧ μm ≤ μv &lt; μm }
    P := P ∪ { P ⊆ C | P minimal hypergraph transversal of { M \ g′ | g ↙ m } }
end for
return P</p>
        <p>Let us fix a formal context K = (G, M, I), choose m ∈ M and let P ⊆ M be
a proper premise for m. Then we know that m ∈ P″, which is equivalent to
⋀_{p∈P} μp ≤ μm. If we now find another attribute n ∈ M \ { m } with
⋀_{p∈P} μp ≤ μn &lt; μm,
it suffices to find the set P as a proper premise for n, because from μn &lt; μm we
can already infer m ∈ P″. Conversely, if we search for all proper premises for m,
we only have to search for those that are not proper premises for attributes n
with μn &lt; μm. Now suppose that there exists an element u ∈ P and an attribute
v ∈ M such that
μu ∧ μm ≤ μv &lt; μm.   (1)
Then we know
⋀_{p∈P} μp ≤ μu ∧ μm ≤ μv &lt; μm,
i.e., P is already a proper premise for v. In this case, we do not have to search
for P, since it will be found in another iteration. On the other hand, if P is a
proper premise for m but not for any other attribute n ∈ M with μn &lt; μm, the
argument given above shows that an element u ∈ P and an attribute v ∈ M
satisfying (1) cannot exist.</p>
        <p>Lemma 1. Algorithm 2 enumerates for a given formal context K = (G, M, I)
all proper premises of K.</p>
        <p>Proof. Let P be a proper premise of K for the attribute m. P is a proper premise
and therefore m ∈ P″ holds, which is equivalent to μm ≥ (P′, P″). Let c ∈ M
be such that μm ≥ μc ≥ (P′, P″) and μc is minimal with this property. We
claim that either P = { c } or P is found in the iteration for c of Algorithm 2.</p>
        <p>Suppose c ∈ P. Then m ∈ { c }″ follows from μm ≥ μc. As a proper premise,
P is minimal with the property m ∈ P″. It follows that P = { c } and P is found by
Algorithm 2 during the initialization.</p>
        <p>Now suppose c ∉ P. Consider</p>
        <p>C := { u ∈ M \ { c } | ∄v ∈ M : μu ∧ μc ≤ μv &lt; μc }.</p>
        <p>We shall show P ⊆ C. To see this, consider some p ∈ P. Then p ≠ c holds by
assumption. Suppose that p ∉ C, i.e., there is some v ∈ M such that μp ∧ μc ≤
μv &lt; μc. Because of p ∈ P, μp ≥ (P′, P″), and together with μc ≥ (P′, P″) we
have
(P′, P″) ≤ μp ∧ μc ≤ μv &lt; μc,
in contradiction to the minimality of μc. This shows p ∈ C and altogether
P ⊆ C.</p>
        <p>To complete the proof it remains to show that P is a minimal hypergraph
transversal of { M \ g′ | g ↙ c }, i.e., that P is also a proper premise for c, not
only for m. Consider n ∈ P. Assume c ∈ (P \ { n })″. Since { c } implies m,
P \ { n } would then be a premise for m, in contradiction to the minimality of P. Thus
c ∉ (P \ { n })″ holds for all n ∈ P and therefore P is a proper premise for c.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Irreducible Attributes</title>
        <p>We go one step further and also remove attributes m from our candidate set C
whose attribute concept μm is the meet of other attribute concepts μx₁, …, μxₙ
with x₁, …, xₙ ∈ C, i.e., μm = ⋀_{i=1}^{n} μxᵢ. This results in Algorithm 3, which no
longer computes all proper premises, but a subset that still yields a complete
implicational base. We show that we only have to search for proper premises P
with P ⊆ N, where N is the set of meet-irreducible attributes of K.</p>
        <p>To ease the presentation, let us assume for the rest of this paper that the
formal context K is attribute-clarified.</p>
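<p>Meet-irreducibility can likewise be tested on extents: μx is reducible iff { x }′ equals the intersection of all extents strictly containing it. A sketch on an invented context, where the artificial attribute "ab" is the (reducible) meet of a and b:</p>

```python
# Sketch: meet-irreducible attributes via extents.
G = ["g1", "g2", "g3"]
M = ["a", "b", "ab"]
I = {("g1", "a"), ("g2", "b"),
     ("g3", "a"), ("g3", "b"), ("g3", "ab")}

def ext(m):                      # {m}': extent of the attribute concept
    return frozenset(g for g in G if (g, m) in I)

def meet_irreducible(M, ext):
    """Attributes whose concept is not the meet of strictly greater ones."""
    N = set()
    for x in M:
        above = [ext(y) for y in M if ext(x) < ext(y)]
        if not above or frozenset.intersection(*above) != ext(x):
            N.add(x)
    return N

print(meet_irreducible(M, ext))  # "ab" is reducible, a and b are not
```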
        <sec id="sec-3-2-1">
          <title>Algorithm 3 Computing Enough Proper Premises</title>
          <p>Input: K = (G, M, I)
P := { { m } | m ∈ M, { m } is a proper premise of K }
N := M \ { x ∈ M | μx = ⋀_{i=1}^{n} μxᵢ for some n ∈ ℕ and xᵢ ∈ M with μx &lt; μxᵢ for 1 ≤ i ≤ n }
for all m ∈ M do
    C := { u ∈ N \ { m } | ∄v ∈ M : μu ∧ μm ≤ μv &lt; μm }
    P := P ∪ { P ⊆ C | P minimal hypergraph transversal of { M \ g′ | g ↙ m } }
end for
return P</p>
          <p>Proposition 2. Let m be an attribute and let P be a proper premise for m. Let
x ∈ P, n ∈ ℕ, and for 1 ≤ i ≤ n let xᵢ ∈ M be attributes satisfying
– m ∉ { x₁, …, xₙ },
– μx = ⋀_{i=1}^{n} μxᵢ,
– xᵢ ∉ ∅″ for all 1 ≤ i ≤ n,
– μx &lt; μxᵢ for all 1 ≤ i ≤ n.</p>
          <p>Then { x } is a proper premise for all xᵢ and there exists a nonempty set
Y ⊆ { x₁, …, xₙ } such that (P \ { x }) ∪ Y is a proper premise for m.</p>
          <p>Proof. It is clear that { x } is a proper premise for all xᵢ, since xᵢ ∈ { x }″ and
xᵢ ∉ ∅″. Define</p>
          <p>Q_Y := (P \ { x }) ∪ Y</p>
          <p>for Y ⊆ { x₁, …, xₙ }. We choose Y ⊆ { x₁, …, xₙ } such that Y is minimal with
respect to m ∈ Q_Y″. Such a set exists, since m ∈ ((P \ { x }) ∪ { x₁, …, xₙ })″
because of { x₁, …, xₙ } → { x }. Furthermore, Y ≠ ∅, since m ∉ (P \ { x })″.</p>
          <p>We now claim that Q_Y is a proper premise for m. Clearly m ∉ Q_Y, since
m ∉ Y. For all y ∈ Y it holds that m ∉ (Q_Y \ { y })″, or otherwise the minimality
of Y would be violated. It therefore remains to show that m ∉ (Q_Y \ { y })″ for all
y ∈ Q_Y \ Y = P \ { x }. We have</p>
          <p>(Q_Y \ { y })″ = ((P \ { x, y }) ∪ Y)″ ⊆ ((P \ { y }) ∪ Y)″ = (P \ { y })″,</p>
          <p>since { x } → Y and x ∈ P \ { y }. Since m ∉ (P \ { y })″, we get m ∉ (Q_Y \ { y })″
as required. In sum, Q_Y is a proper premise for m.</p>
          <p>Lemma 2. Let N be the set of all meet-irreducible attributes of a context K.
Define</p>
          <p>P = { X ⊆ M | |X| ≤ 1, X proper premise } ∪ { X ⊆ N | X proper premise }.</p>
          <p>Then the set L = { P → P• | P ∈ P } is sound and complete for K.</p>
          <p>Proof. Let m be an attribute and let P be a proper premise for m. If P ∉ P then
it follows that P ⊈ N. Thus we can find y₁ ∈ P \ N and elements x₁, …, xₙ ∈ M
with n ≥ 1 such that
– m ∉ { x₁, …, xₙ },
– μy₁ = ⋀_{i=1}^{n} μxᵢ,
– xᵢ ∉ ∅″ for all 1 ≤ i ≤ n,
– μy₁ &lt; μxᵢ for all 1 ≤ i ≤ n.</p>
          <p>By Proposition 2 we can find a proper premise P₁ such that P → { m }
follows from { y₁ } → { x₁, …, xₙ } and P₁ → { m }. Clearly { y₁ } ∈ P, since all
singleton proper premises are contained in P. If P₁ ∉ P then we can apply
Proposition 2 again and obtain a new proper premise P₂, etc. To see that this
process terminates, consider the strict partial order ≺ defined as</p>
          <p>P ≺ Q iff ∀q ∈ Q : ∃p ∈ P : μp &lt; μq.</p>
          <p>It is easy to see that with each application of Proposition 2 we obtain a new
proper premise that is strictly larger than the previous one with respect to ≺. Hence,
the process must terminate. This yields a set P₀ = { { y₁ }, …, { yₖ }, Pₖ } ⊆ P
such that P → { m } follows from { Q → Q• | Q ∈ P₀ }. Thus L is a sound and
complete set of implications.</p>
          <p>Together with Lemma 1 this yields correctness of Algorithm 3.</p>
          <p>Corollary 2. The set of proper premises computed by Algorithm 3 yields a
sound and complete set of implications for the given formal context.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <sec id="sec-4-1">
        <title>Computing Proper Premises Instead of Intents</title>
        <p>
          In both the stem base algorithms and our algorithms, runtime can be exponential
in the size of the input. In the classical case the reason is that the number
of intents can be exponential in the size of the stem base [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. In the case of
our algorithms there are two reasons: the computation of proper premises is
TransEnum-complete, and there can be exponentially many proper premises.
The first issue is less relevant in practice because algorithms for TransEnum,
while still exponential in the worst case, behave well for most instances.
        </p>
        <p>To see that there can be exponentially many proper premises in the size of the
stem base, let us look at the context Kₙ from Table 1 for some n ≥ 2, consisting
of two contranominal scales of dimension n × n and one attribute a with empty
extent. It can be verified that the proper premises of the attribute a are exactly
the sets of the form { mᵢ | i ∈ I } ∪ { m′ᵢ | i ∉ I } for some I ⊆ { 1, …, n }, while the
only pseudo-intents are the singleton sets and { m₁, …, mₙ, m′₁, …, m′ₙ }. Hence
there are 2ⁿ proper premises for a, while there are only 2n + 2 pseudo-intents.</p>
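<p>The 2ⁿ count is easy to confirm by brute force for small n. The sketch below encodes our reading of Kₙ (Table 1 is not reproduced here; we assume object gᵢ lacks exactly mᵢ, m′ᵢ, and a) and enumerates the minimal premises of a:</p>

```python
from itertools import combinations

# Brute-force count of the proper premises of "a" in K_n: two contranominal
# n x n scales plus an attribute "a" with empty extent (our reading of K_n).
def proper_premises_for_a(n):
    objects = range(n)
    attrs = [f"m{i}" for i in range(n)] + [f"m'{i}" for i in range(n)]
    M = attrs + ["a"]

    def has(g, m):               # object g_i lacks exactly m_i, m'_i and a
        return m != "a" and m not in (f"m{g}", f"m'{g}")

    def closure(B):              # B'' ; contains "a" iff B' is empty
        objs = [g for g in objects if all(has(g, m) for m in B)]
        return {m for m in M if all(has(g, m) for g in objs)}

    premises = [set(P) for k in range(len(attrs) + 1)
                for P in combinations(attrs, k) if "a" in closure(set(P))]
    # proper premises = premises that are subset-minimal
    return [P for P in premises if not any(Q < P for Q in premises)]

print(len(proper_premises_for_a(3)))    # 8 == 2^3
```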
        <p>
          Next-Closure behaves poorly on contexts with many intents while our
algorithms behave poorly on contexts with many proper premises. In order to provide
evidence that our algorithm should behave better in practice we use formulae
for the expectation of the number of intents and proper premises in a formal
context that is chosen uniformly at random among all n × m contexts for fixed
natural numbers n and m.4 Derivations of these formulae can be found in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>The expected value for the number of intents in an n × m context is</p>
        <p>E_intent = Σ_{q=0}^{m} Σ_{r=0}^{n} C(m,q) C(n,r) 2^{−rq} (1 − 2^{−r})^{m−q} (1 − 2^{−q})^{n−r},</p>
        <p>while the expected value for the number of proper premises for a fixed attribute
a in an n × m context is</p>
        <p>E_pp = 2^{−n} Σ_{r=0}^{n} C(n,r) Σ_{q=0}^{m−1} C(m−1,q) q! 2^{−q²} Σ_{(p₁,…,p_q) ∈ ℕ^q, 1 ≤ p₁ &lt; … &lt; p_q ≤ r} ∏_{i=0}^{q−1} (1 − 2^{−q}(1+i))^{p_{i+1}−p_i−1},</p>
        <p>where C(·,·) denotes the binomial coefficient.</p>
        <p>[Fig. 1: plots of the expected numbers of intents and proper premises for n = m.]</p>
        <p>In Section 5.3 we shall see that the proper-premise approach performs
extremely well on contexts obtained from model refactoring. We briefly describe
how these contexts are obtained. Working with data-flow-oriented simulation
models quickly results in many variants of similar models, which are difficult to
manage. One solution to this problem is to refactor these variants into
configurable models, containing all parts and information about the valid
combinations. Using a generator, the single variants, which were merged, can be created
later on demand.</p>
        <p>4 We ignore renaming of attributes and objects.</p>
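<p>The expectation formula for the number of intents can be checked numerically. The following sketch is our own verification, not taken from the cited derivation: it compares the formula with an exhaustive average over all 2 × 2 contexts.</p>

```python
from itertools import product
from math import comb

# Check the expected-intents formula against an exhaustive average over all
# 2^(n*m) contexts with n = m = 2 (each cross independent with probability 1/2).
def expected_intents(n, m):
    return sum(
        comb(m, q) * comb(n, r)
        * 2.0 ** (-r * q)
        * (1 - 2.0 ** -r) ** (m - q)
        * (1 - 2.0 ** -q) ** (n - r)
        for q in range(m + 1) for r in range(n + 1)
    )

def count_intents(rows, m):
    """Count the closed attribute sets of a context given by its object rows."""
    count = 0
    for bits in product([0, 1], repeat=m):
        B = {a for a in range(m) if bits[a]}
        objs = [row for row in rows if B <= row]
        closure = {a for a in range(m) if all(a in row for row in objs)}
        count += closure == B
    return count

n = m = 2
counts = []
for cells in product([0, 1], repeat=n * m):       # all 16 contexts
    rows = [{a for a in range(m) if cells[g * m + a]} for g in range(n)]
    counts.append(count_intents(rows, m))
print(sum(counts) / len(counts), expected_intents(n, m))  # both are 2.4375
```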
        <p>
          Using matching algorithms, corresponding artifacts (i.e., blocks, connections,
and parameter settings) among the variants can be found. This results in a
minimal set of artifacts from which each variant can be built. To restrict the
combinations, and with them the number of variants that can be generated, to the
given model variants, we have to specify the dependencies among the artifacts.
These can be defined by a set of implications, which can be calculated by
formal concept analysis. The whole process is described in [
          <xref ref-type="bibr" rid="ref17 ref18">17,18</xref>
          ].
        </p>
        <p>Using the matching result, a formal context can be built: the model variants
form the object set and the artifacts form the attribute set. The relation between
variants and artifacts is defined by incidence: if a block or connection exists
in a variant, they are related. The many-valued relation among variants and
parameters is resolved by using parameter settings (i.e., pairs of parameters and
their values) as attributes. The resulting implications should also contain negated
literals, so we also need the negated counterparts of the artifact attributes. The
semantics of a negated attribute is in this case a non-existing block, a
non-existing connection, or a parameter that is not set to a certain value. In our
application, only negated counterparts of irreducible attributes are needed.</p>
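<p>The construction can be sketched as follows. Variant and artifact names are invented for illustration, and for simplicity every artifact is negated rather than only the irreducible ones:</p>

```python
# Hypothetical sketch of the context construction of Section 5.2: model
# variants as objects, artifacts and their negated counterparts as attributes.
variants = {
    "variant1": {"Add1", "Gain1", "Gain1.k=2"},
    "variant2": {"Add1", "Gain1", "Gain1.k=3"},
    "variant3": {"Add1"},
}
artifacts = sorted(set().union(*variants.values()))

# Plain attribute: artifact present; "not:" attribute: artifact absent.
attributes = artifacts + [f"not:{a}" for a in artifacts]
incidence = {
    (v, m)
    for v, arts in variants.items()
    for m in attributes
    if (not m.startswith("not:") and m in arts)
    or (m.startswith("not:") and m[4:] not in arts)
}
print(sorted(m for m in attributes if ("variant3", m) in incidence))
```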
        <p>This results in dense contexts (usually with a fill ratio greater than 0.3) with
a typical size of 20 objects and 60 attributes, from which a complete set of
implications has to be calculated.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Experimental Comparison to Other Algorithms</title>
        <p>We experimentally compare our proposed algorithm to other well-known ones.
For this, we name the algorithms we want to compare, briefly discuss some
implementation details, and then present the achieved results.</p>
        <p>Algorithms We compare the following implementations: SB, which is an
implementation of Next-Closure; HT, which computes all proper premises as
hypergraph transversals as in Algorithm 1; and PP, an implementation of Algorithm 3.</p>
        <p>Implementation An easy optimization we have made in HT and PP concerns
parallelization. In all the listings we have presented so far, the iterations over the
set of attributes in our formal context are independent of each other. It is natural
to evaluate those iterations in parallel to improve the overall performance of our
algorithms.</p>
        <p>
          In our experiments we did not use special-purpose algorithms to compute
the hypergraph transversals in HT and PP. Instead we have used an ad-hoc
implementation that is based on backtracking and some simple heuristics [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
Compared to advanced algorithms for TransEnum, this leaves room for further
optimizations.
        </p>
        <p>
          Data Sets Our test data sets can be divided into two categories, random and
structured. For the tests on structured data we have chosen two data sets from
the UCI Machine Learning Repository. The first data set is the testing data set
of SPECT [
          <xref ref-type="bibr" rid="ref5">5</xref>
], which describes Single Photon Emission Computed Tomography
(SPECT) images. This data set is given as a dyadic formal context with 187
objects, 23 attributes, and an approximate density of 0.38. The second one is a
data set on Congressional Voting Records of the U.S. House of Representatives
from 1984 [
          <xref ref-type="bibr" rid="ref19">19</xref>
]. It contains 435 objects and 16 attributes and is given as a many-valued
context. It has been nominally scaled, resulting in a context with 50 attributes
and an approximate density of 0.34.5 The third structured data set originates
from the application described in Section 5.2 and [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. It has 26 objects, 79
attributes and an approximate density of 0.35.
        </p>
        <p>The other part of the testing data sets consists of randomly generated contexts.
For this we fix the number nG of objects, the number nM of attributes and the
density nI of crosses. Then we generate for those three numbers one context of
the given size nG × nM with an approximate density of nI.</p>
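<p>Such a random context can be generated in a few lines; this is a sketch, with parameter names n_g, n_m, and density mirroring nG, nM, and nI:</p>

```python
import random

# Generate an n_g x n_m context whose density of crosses approximates `density`.
def random_context(n_g, n_m, density, seed=0):
    rng = random.Random(seed)                 # fixed seed for reproducibility
    return {(g, m) for g in range(n_g) for m in range(n_m)
            if rng.random() < density}

K = random_context(20, 60, 0.3)
print(len(K) / (20 * 60))                     # approximate density, near 0.3
```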
        <p>Experimental Results We have implemented all three algorithms as part of
conexp-clj, a general-purpose FCA tool developed by one of the authors. The
implementations themselves are not highly optimized but rather prototypical, so the
absolute running times of the test cases should not be taken as the best possible.
However, for comparing the three algorithms SB, HT, and PP, these
implementations are sufficient and give a good impression of their performance. The
experimental results (runtime and size of implication base) are given in Table 2
and Table 3.</p>
        <p>
          As one can see from the results, HT most often runs faster than SB, but
both are outperformed by PP. This can be seen most drastically with the
Data-Flow-Model data sets, where PP runs for only a fraction of a second whereas both
SB and HT run for hours. The same occurs with the Voting data set. The same
observation, although not as drastic, can also be made with the randomly
generated data sets.
          5 Note that this is a different scaling from the one in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], so the times obtained there
cannot be compared directly to ours.
        </p>
        <p>The number of implications returned varies significantly not only between SB
and HT/PP, but also between different runs of PP. Most often, HT and PP
return the same result; this is in particular the case if the input context is
attribute-reduced. However, if it is not, the number of implications returned by
PP may be significantly smaller than the overall number of proper premises, as
one can see with the Data-Flow-Model data set, where the number of returned
implications is the smallest possible.</p>
        <p>However, most of the time the number of implications computed by HT and
PP is much larger than the size of the stem base. The observed factors mostly
range between 5 and 20. This might be a problem in practice, in particular if this
factor is much higher. Therefore, one has to consider a tradeoff between the time
one wants to spend on computing a sound and complete set of implications and
the size of this set of implications. The actual requirements of the particular
application decide the usefulness of a particular algorithm.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Mikhail</given-names>
            <surname>Babin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sergei</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          .
          <article-title>Recognizing pseudo-intents is coNP-complete</article-title>
          .
          <source>In Marzena Kryszkiewicz and Sergei Obiedkov</source>
          , editors,
          <source>Proc. of the 7th Int. Conf. on Concept Lattices and Their Applications</source>
          , volume
          <volume>672</volume>
          .
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>K.</given-names>
            <surname>Bertet</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Monjardet</surname>
          </string-name>
          .
          <article-title>The multiple facets of the canonical direct unit implicational basis</article-title>
          .
          <source>Theoretical Computer Science</source>
          ,
          <volume>411</volume>
          (
          <issue>22-24</issue>
          ):
          <fpage>2155</fpage>
          -
          <lpage>2166</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Borchmann</surname>
          </string-name>
          . conexp-clj. http://www.math.tu-dresden.de/~borch/ conexp-clj/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>E.</given-names>
            <surname>Boros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Elbassioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gurvich</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Khachiyan</surname>
          </string-name>
          .
          <article-title>An efficient incremental algorithm for generating all maximal independent sets in hypergraphs of bounded dimension</article-title>
          .
          <source>Parallel Processing Letters</source>
          ,
          <volume>10</volume>
          :
          <fpage>253</fpage>
          -
          <lpage>266</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Krzysztof J.</given-names>
            <surname>Cios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lukasz A.</given-names>
            <surname>Kurgan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lucy S.</given-names>
            <surname>Goodenday</surname>
          </string-name>
          .
          <source>UCI Machine Learning Repository: SPECT Heart Data Set</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Felix</given-names>
            <surname>Distel</surname>
          </string-name>
          .
          <article-title>Hardness of enumerating pseudo-intents in the lectic order</article-title>
          .
          <source>In Barış Sertkaya and Léonard Kwuida</source>
          , editors,
          <source>Proc. of the 8th Int. Conf. on Formal Concept Analysis, volume 5986 of Lecture Notes in Artificial Intelligence</source>
          , pages
          <fpage>124</fpage>
          -
          <lpage>137</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Felix</given-names>
            <surname>Distel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Borchmann</surname>
          </string-name>
          .
          <article-title>Expected numbers of proper premises and concept intents</article-title>
          .
          <source>Preprint, Institut für Algebra, TU Dresden</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Eiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Georg</given-names>
            <surname>Gottlob</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kazuhisa</given-names>
            <surname>Makino</surname>
          </string-name>
          .
          <article-title>New results on monotone dualization and generating hypergraph transversals</article-title>
          .
          <source>SIAM J. Comput.</source>
          ,
          <volume>32</volume>
          :
          <fpage>514</fpage>
          -
          <lpage>537</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Michael L.</given-names>
            <surname>Fredman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Leonid</given-names>
            <surname>Khachiyan</surname>
          </string-name>
          .
          <article-title>On the complexity of dualization of monotone disjunctive normal forms</article-title>
          .
          <source>Journal of Algorithms</source>
          ,
          <volume>21</volume>
          (
          <issue>3</issue>
          ):
          <fpage>618</fpage>
          -
          <lpage>628</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Ganter</surname>
          </string-name>
          .
          <article-title>Two basic algorithms in concept analysis</article-title>
          .
          <source>Preprint</source>
          <volume>831</volume>
          ,
          Fachbereich Mathematik, TU Darmstadt, Darmstadt, Germany,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Ganter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rudolf</given-names>
            <surname>Wille</surname>
          </string-name>
          .
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          . Springer, New York,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Guigues</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Duquenne</surname>
          </string-name>
          .
          <article-title>Familles minimales d'implications informatives résultant d'un tableau de données binaires</article-title>
          .
          <source>Math. Sci. Humaines</source>
          ,
          <volume>95</volume>
          :
          <fpage>5</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Sergei O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          .
          <article-title>On the intractability of computing the Duquenne-Guigues base</article-title>
          .
          <source>Journal of Universal Computer Science</source>
          ,
          <volume>10</volume>
          (
          <issue>8</issue>
          ):
          <fpage>927</fpage>
          -
          <lpage>933</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Heikki</given-names>
            <surname>Mannila</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kari-Jouko</given-names>
            <surname>Räihä</surname>
          </string-name>
          .
          <article-title>Algorithms for inferring functional dependencies from relations</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ):
          <fpage>83</fpage>
          -
          <lpage>99</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Sergei</given-names>
            <surname>Obiedkov</surname>
          </string-name>
          and
          <string-name>
            <given-names>Vincent</given-names>
            <surname>Duquenne</surname>
          </string-name>
          .
          <article-title>Attribute-incremental construction of the canonical implication basis</article-title>
          .
          <source>Annals of Mathematics and Artificial Intelligence</source>
          ,
          <volume>49</volume>
          (
          <issue>1-4</issue>
          ):
          <fpage>77</fpage>
          -
          <lpage>99</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Rudolph</surname>
          </string-name>
          .
          <article-title>Some notes on pseudo-closed sets</article-title>
          .
          <source>In Sergei O. Kuznetsov and Stefan Schmidt</source>
          , editors,
          <source>Proc. of the 5th Int. Conf. on Formal Concept Analysis</source>
          , volume
          <volume>4390</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>151</fpage>
          -
          <lpage>165</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Uwe</given-names>
            <surname>Ryssel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Joern</given-names>
            <surname>Ploennigs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Kabitzsch</surname>
          </string-name>
          .
          <article-title>Automatic variation-point identification in function-block-based models</article-title>
          .
          <source>In Proc. of the 9th Int. Conf. on Generative Programming and Component Engineering</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>32</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Uwe</given-names>
            <surname>Ryssel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Joern</given-names>
            <surname>Ploennigs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Kabitzsch</surname>
          </string-name>
          .
          <article-title>Extraction of feature models from formal contexts</article-title>
          .
          <source>In Proc. of the 15th Int. Software Product Line Conference</source>
          , pages
          <fpage>4:1</fpage>
          -
          <lpage>4:8</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Schlimmer</surname>
          </string-name>
          .
          <source>UCI Machine Learning Repository: 1984 United States congressional voting records</source>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>