<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Implementation for Fault Tolerance and Experimental Results</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Frithjof Dau</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>SAP AG</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Even for small or medium sized contexts, the corresponding concept lattice soon becomes too large to be nicely displayed and understood, which renders a naive application of FCA for visual analytics challenging. Concept lattices depict all information of the underlying formal context. In certain settings, one can consider the context to be noisy or incomplete, and moderate and meaningful changes of the context such that the corresponding lattices significantly shrink in size and complexity are in those settings permissible. In this paper, such an approach is taken. The formal definitions for the approach are given, and using an implementation, several examples are provided which show the usefulness of the approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>number of concepts shown to the user [8,9]. It is not only the number of concepts
which often render the visualization of lattices complicated: Even with good layout
algorithms, the Hasse-diagram of a lattice often contains a significant number of
crossing edges, which is the main problem for the visual comprehension of graphs.</p>
      <sec id="sec-1-1">
        <title>Cassio Melo et al. present in [4,5] different metrics which allow for each node in a</title>
        <p>concept lattice to choose a uniquely given upper neighbor. By doing so, one can
transform a lattice into a tree (with loss of information), which in turn can be better
displayed compared to lattices. Finally, Boulicaut et al consider in [2,7] “noisy” formal
contexts and directly change the incidence relation I in order to reduce the number of
formal concepts. For CUBIST, in line with the approach of Boulicaut et al, Simon</p>
      </sec>
      <sec id="sec-1-2">
        <title>Andrews has provided in [1] an example with real-data of one the CUBIST use cases</title>
        <p>which clearly shows the advantage of this approach.</p>
        <p>In this paper, we follow up the notion of considering contexts to be noisy and to
extend the incidence relation in order to simplify the derived concept lattices. In the next
section, we provide the mathematical definition on how we change the incidence
relation. As described in the following section, this approach has been implemented into a
small FCA-tool in order to show its effectiveness. A thorough, though artificial
example is given in the next section, before we turn to examples based on real data out
of the CUBIST project.</p>
      </sec>
      <sec id="sec-1-3">
        <title>The authors want to stress that apart from providing the mathematical definitions, this paper focuses on the implementation and the experimental results. A thorough mathematical investigation is out of scope of this paper and will we provided in a different paper which is currently in preparation.</title>
        <p>2</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Formal Definition</title>
      <p>
        Let a formal context (O,A,I) be given. Our basic idea is to understand the context to
be noisy or incomplete, and we are aiming at adding crosses to I such that the derived
lattice becomes smaller (hence, easier to read and understand). For an object o O
and an attribute a A, our starting point is the well-known equation
oIa
Ù
oII  aI Ù
aII  oI Ù
oI ĺ
a
(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
      </p>
      <sec id="sec-2-1">
        <title>That is, we do not have oIa if some objects and/or some attributes violate (1). We define the sets of those objects and attributes as follows:</title>
        <p>DiffObj(o,a) := { x  O | x  oII and x  aI } and</p>
        <p>DiffAtt(o,a) := { y  A | y  aII and y  oI }</p>
        <p>The straight-forward approach taken in this paper, is, roughly speaking, as follows:
the bigger DiffObj(o,a) or DiffAtt(o,a), the less we like to add an additional cross
between o and a. In order to do so, it is possible to take only DiffObj(o,a) into account
(this approach is reasonable in contexts having a small amount of attributes, but a
large amount of objects, which is standard in traditional BI settings), or only
DiffAtt(o,a), or both. Moreover, we can either relate DiffObj(o,a) and DiffAtt(o,a) to the
overall sets of objects and attributes, i.e. taking a global approach, or relate them only
to the concepts generated by o and a, i.e. taking a local approach. This gives rise to
six different ways to measure “an approximate incidence measure” between o and a
as follows:</p>
        <p>GObj(o,a) := 1-( |DiffObj(o,a)/|O| )
GAtt(o,a) := 1-( |DiffAtt(o,a)|/|A| )
GObj,Att(o,a) := 1-1/2 ( |DiffObj(o,a)|/|O| + |DiffAtt(o,a)|/|A| )
LObj(o,a) := 1-( |DiffObj(o,a)|/| oII | )
LAtt(o,a) := 1-( |DiffAtt(o,a)|/| aII | )
LObj,Att(o,a) := 1-1/2 ( |DiffObj(o,a)|/| oII | + |DiffAtt(o,a)|/ | aII| )
Fig 1: Six approaches to measure the incidence between objects and attributes
For any of these incidence measures S  { GObj , GAtt , GObj,Att , LObj , LAtt , LObj,Att }
we obviously have for all o,a:</p>
      </sec>
      <sec id="sec-2-2">
        <title>So, for a given threshold t with 0  t  1 we can naturally set</title>
        <p>0  S(o,a)  1 and S(o,a) = 1 Ù</p>
        <p>
          oIa
JS,t := { (o,a) | S(o,a)  t }
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          )
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          )
Due to (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ), we obviously have s  t Ö JS,t  JS,s and JS,1 = I
.Finally, the local measure for objects corresponds to the confidence of the
according association rule, which is a nice rationale for that measure:
݋݂݊ܿ
(݋ ூ ՜ ܽ ) ൌ
ȁ௢൫ ಺ ׫ {௔ }൯ ಺ |
|௢ ಺ |
= 1 െ
ห௢ ಺ ȁห௢൫ି
        </p>
        <p>಺ ׫ {௔ }൯ ಺ |
|௢ ಺ |
= 1 െ ௙௜஽ |್ೀೕ௢ ಺ |(௢ǡ௔ )
= ܮ ܱܾ݆ (ǡ݋ ܽ )</p>
        <p>
          We will now exemplify (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) for S := GObj (and LObj(o,a). which is for this example
the same relation) with the following simple context:
        </p>
        <p>Please note the following:
x For o10, we have o10I ĺ a3 and o10I ĺ a1 , but not o10I ĺ a2 . This implication is
violated by exactly one object, namely o10 itself. If we consider a threshold of
0.9 (that is, as we have 10 objects in total, we set a cross between o and a iff
the condition oIĺ a is violated by at most one object), then we should add a
cross between o10 and a2. Note that the high amount of objects which have a1
as attribute (i.e. o3 - o9) is not relevant.
x For o1, we have o1Iĺ a3 and o1Iĺ a2 , but not o1Iĺ a1 . This implication is
violated by exactly two objects, namely o1 itself, and o2. If we consider a
threshold of 0.8 (that is, now we set a cross between o and a iff the condition
oIĺ a is violated by two or less objects), then we should add a cross between
o1 and a1
x A similar consideration applies of course to o2 .</p>
        <p>The approximate incidence measure S:=GObj and the concept lattices for the
incidence relations I=JS,1 , JS,0.9 and JS,0.8 is depicted in the next figure.</p>
        <p>S := GObj</p>
      </sec>
      <sec id="sec-2-3">
        <title>Threshold 1</title>
      </sec>
      <sec id="sec-2-4">
        <title>Threshold 0.9</title>
      </sec>
      <sec id="sec-2-5">
        <title>Threshold 0.8</title>
        <p>Fig 2: A simple example for GOBJ</p>
      </sec>
      <sec id="sec-2-6">
        <title>Generally, adding crosses to an incidence relation can decrease or increase the</title>
        <p>number of formal concepts, and one can hardly make statements on how the concept
lattices change. For the herein defined extensions of the incidence relations, the
situation is different. We start with a simple result which does not generally apply to
extensions of incidence relations.</p>
        <p>Lemma: Let (O,A,I) be a formal context, let S  { GObj , GAtt , GObj,Att , LObj , LAtt ,</p>
      </sec>
      <sec id="sec-2-7">
        <title>LObj,Att } be an approximate incidence measure, let t with 0  t  1 be a threshold, and</title>
        <p>let J:= JS,t . Then, for all o1,o2  O; we have o1I  o2I Ö o1J  o2J . Similarly, for all
a1,a2  A we have a1I  a2I Ö a1J  a2J .</p>
      </sec>
      <sec id="sec-2-8">
        <title>Proof: We only show the lemma for objects (i.e. the first implication), the proof for attributes is done analogously.</title>
        <p>Let a A be an attribute . As we have o1I  o2I and hence o2II  o1II , we obtain
DiffObj(o2,a)  DiffObj(o1,a) and DiffAtt(o2,a)  DiffAtt(o1,a); thus S(o1,a)  S(o2,a) .
Particularly, we get o1 J a Ù S(o1,a)  t =&gt; S(o2,a)  t Ù o2Ja. Q.e.d.</p>
        <p>As stated in the introduction, a thorough investigation of the approximate incidence
relations will be the subject of a different paper. Here we only provide a remark that
even though we can prove statements about the relationship between I and JS,t which
do not hold for I and any extension J  I , the dependencies between I and JS,t have to
be investigated further. For example, the author conjectured that we have PJ = PIIJ for
all sets of objects P  O, but the context in Fig 3, together with the approximate
incidence measure GObj(o,a), shows that this equation does not generally hold. To see
this, consider a threshold t=0.8, P:={o0,o1,o2}: we then have PIIJ = {o1, o2, o3, o4 } and
PJ={o1, o2, o3, o4, o6, o7, o8 }</p>
        <p>Fig 3: Counterexample for PJ = PIIJ
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Implementation</title>
      <sec id="sec-3-1">
        <title>We have extended the tool presented in [3] in order to implement the approach presented in this paper. The tool, initially developed to transform SPARQL-queries into formal contexts, can now load formal contexts (independently from SPARQL) and transform them according to (3). A partial screenshot is provided below.</title>
        <p>Fig 4: screenshot of a tool which implements approximate incidence relations</p>
      </sec>
      <sec id="sec-3-2">
        <title>The tool allows to load a formal context (as a Burmeister file) and computes all six approximate incidence measures. Either the original context or one of the measures can be selected for display, as the next screenshot shows.</title>
        <p>Fig 5: The different metrics as the appear in the tool</p>
        <p>The slider allows to adjust the threshold t . A cell for an object o and an attribute a
is marked grey iff S(o,a)  1 (where S stands for the selected measure). One can now
either store only the single derived context with the incidence relation JS,t , or a set of
derived contexts, where all thresholds 0.9, 0.91, …, 0.99 (these thresholds are
manually chosen and so far hardcoded) and all six measures are used (thus, 60 Burmeister
files are stored).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>An thorough, artificial example</title>
      <sec id="sec-4-1">
        <title>In this section, we exemplify all measures and different thresholds with an artificial example. Let us consider the following formal context and its concept lattice:</title>
        <p>Fig 6: Example context and lattice for approximate incidence relations</p>
      </sec>
      <sec id="sec-4-2">
        <title>For this context, we obtain the following values for the six different measures:</title>
        <p>Global
local
O
b
j
e
c
t
s
A
t
t
r
i
b
u
t
e
s
o
b
j
s
a
n
d
a
t
t
s</p>
        <p>Fig 7: All six approximate incidence measures for the example</p>
      </sec>
      <sec id="sec-4-3">
        <title>For the thresholds 0.9 (upper half of table) and 0.8 (lower half of table), the next figure provides for each measure S the corresponding concept lattice.</title>
        <p>Fig 8: Concept lattices for thresholds 0.8 and 0.9 and all six measures</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Heriot-Watt University Example</title>
      <p>The example of this section is based on the Heriot-Watt University use case in the
research project CUBIST, in particular, it is built upon a real data set, namely
embryo gene expression data. Gene expression information describes whether or not a
gene is expressed (active) in a location. There is a fixed set of levels of
expressiveness: A gene can in some location be detected, or more one can more precisely state
that it is weakly, moderate or strongly detected. Moreover, it can be stated that a gene
is not detected, that it is possible for the gene to be expressed, or that it is not
examined. Based on public available data, we have generated a context where the objects
are (names of) mouse genes (we have a total of 6613 formal objects in the context),
the attributes are the seven different levels of expressiveness, and a cross is set
between a gene and a level if that gene is detected in some location with that level of
expressiveness. The resulting lattice, having 81 formal concepts, is depicted below.</p>
      <p>Fig 9: Genes and Levels of Expressiveness without approximation</p>
      <p>The lattice clearly shows a well-known challenge for FCA. Most of possible
combinations of attributes are indeed instantiated objects in the context, often by few only,
(the clarified context still contains 77 objects), which leads to an explosion of the
number of concepts. Amongst the concepts, we have some formal concepts with large
number of objects, but most concepts are rather small and render the lattice clutterd
and hard to read. As we have many objects and only few attributes, it is sensitive to
apply the global object-based measurement GObj to the context. With a (quite high)
threshold of 97 already, the concept lattice is reduced significantly, as shown in Fig
10.</p>
      <p>Fig 10: Genes and Levels of Expressiveness with approximation, threshold 0.97
6</p>
    </sec>
    <sec id="sec-6">
      <title>Summary and next steps</title>
      <sec id="sec-6-1">
        <title>In this paper, we have introduced a fault tolerance approach for FCA and shown</title>
        <p>promising experimental results.</p>
        <p>Of course, first of all, this approach has to be thoroughly investigated from a
formal point of view. It has to be proven that for each approximate incidence measure,
the number of formal concepts decreases when the threshold is decreased. More
specifically, it has to be investigated how for a given measure and two thresholds t1  t2
how the concept lattice for t1 can be mapped onto the concept lattice for t2 . It has to be
scrutinized how the different measures relate to each other. And it has to be
investigated how the approach taken in this paper relates to other research, most importantly
the work of Boulicaut et al.</p>
      </sec>
      <sec id="sec-6-2">
        <title>From a conceptual point of view, it has to be worked out on how derived concept lattices are to be understood, particularly how the relationship of derived lattices and association rules of the orgin lattice is.</title>
      </sec>
      <sec id="sec-6-3">
        <title>Next, from an implementational point of view, we have to elaborate on the algorithm which computes the measures, i.e. scrutinizing its complexity and possibly improving its performance (this is e.g. important for big data sets and real-time applications).</title>
      </sec>
      <sec id="sec-6-4">
        <title>Finally, from an user interface point of view, it would be desirable to have a lattice</title>
        <p>visualization with a slider to adjust the threshold for a given measure such that
adjusting the threshold with the slider leads to animated transitions between the derived
lattices. Here the mappings which have been mentioned in the section about the
formal investigations are likely to be helpful, and further mathematical investigations
might be needed.</p>
      </sec>
      <sec id="sec-6-5">
        <title>In the long run, assuming that the steps mentioned above have been undertaken,</title>
        <p>the approach taken in this paper is envisioned to bring FCA closer to business
intelligence applications and visual analytics, where one needs easy-to-understand and
interactive visualizations for large amounts of data, and where a certain and controlled
loss of information for the sake of simplified diagrams is permissible.
7</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Andrews</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>McLeod</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Gene Co-Expression in Mouse Embryo Tissues</article-title>
          . In: Dau,
          <string-name>
            <surname>F</surname>
          </string-name>
          . (ed.)
          <source>1st CUBIST Workshop</source>
          , at ICCS 2011,
          <article-title>Derby</article-title>
          , UK.
          <source>CEUR Workshop Proceedings</source>
          , Vol.
          <volume>753</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . ISSN:
          <fpage>1613</fpage>
          -
          <lpage>0073</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Besson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robardet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulicaut</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>Mining a New Fault-Tolerant Pattern Type as an Alternative to Formal Concept Discovery</article-title>
          . Scharfe et al.
          <source>ICCS. LNAI 4065</source>
          . Springer (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dau</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Towards Scalingless Generation of Formal Contexts from an Ontology in a Triple Store In: Dau, F and Andrews</article-title>
          ,
          <source>S: Proceedings of the second CUBIST workshop 2012</source>
          . KULeuven press,
          <year>2012</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Melo</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aufaure</surname>
            , M.-
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Grand</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bezerianos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Extracting and Visualizing Tree-like Structures from Concept Lattices In: 15th</article-title>
          <source>International Conference on Information Visualization (IV</source>
          <year>2011</year>
          ). London, UK,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Melo</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aufaure</surname>
            , M.-
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bezerianos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Le Grand</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Cubix: A Visual Analytics Tool for Formal Concept Analysis</article-title>
          .
          <source>In: 23ième Conférence Francophone Sur l'IHM (IHM</source>
          <year>2011</year>
          )
          <article-title>- Demo</article-title>
          . Sophia-Antipolis, France,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pensa</surname>
            ,
            <given-names>R. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulicaut</surname>
            ,
            <given-names>J-F.</given-names>
          </string-name>
          :
          <article-title>Towards Fault-Tolerant Formal Concept Analysis</article-title>
          . In: Banidini,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Manzoni</surname>
          </string-name>
          , S. (eds.)
          <source>AI*IA</source>
          <year>2005</year>
          , LNAI 3673, pp.
          <volume>212</volume>
          {
          <issue>223</issue>
          , Springer-Verlag, Berlin Heidelberg (
          <year>2005</year>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pensa</surname>
            ,
            <given-names>G.R:</given-names>
          </string-name>
          , Boulicaut,
          <string-name>
            <surname>J.F.</surname>
          </string-name>
          :
          <article-title>Towards fault-tolerant formal concept analysis</article-title>
          .
          <source>In: Congress of the Italian Association for Artificial Intelligence AI* IA</source>
          ,
          <year>2005</year>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Computing iceberg concept lattices with Titanic</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          , Volume
          <volume>42</volume>
          ,
          <string-name>
            <surname>Issue</surname>
            <given-names>2</given-names>
          </string-name>
          ,
          <string-name>
            <surname>August</surname>
            <given-names>2002</given-names>
          </string-name>
          , Pages
          <fpage>189</fpage>
          -
          <lpage>222</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Conceptual clustering with iceberg concept lattices</article-title>
          .
          <source>In: Proc. of GI-Fachgruppentreffen Maschinelles Lernen'01</source>
          ,
          <string-name>
            <surname>Universität</surname>
            <given-names>Dortmund</given-names>
          </string-name>
          ,
          <year>2001</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>