=Paper=
{{Paper
|id=Vol-1921/paper10
|storemode=property
|title=Accidental Formal Concepts in the Presence of Counterexamples
|pdfUrl=https://ceur-ws.org/Vol-1921/paper10.pdf
|volume=Vol-1921
|authors=Dmitry Vinogradov
}}
==Accidental Formal Concepts in the Presence of Counterexamples==
Dmitry V. Vinogradov¹,²

¹ Federal Research Center for Computer Science and Control, Russian Academy of Sciences, Moscow 119333, Russia
vinogradov.d.w@gmail.com
² Russian State University for the Humanities, Intelligent Robotics Laboratory, Moscow 125993, Russia
WWW home page: http://isdwiki.rsuh.ru/index.php/User:Vinogradov.Dmitry
Abstract. An accidental formal concept corresponds to a subset of attributes that accidentally appears in several objects (of the concept extent), although each such object belongs to the extent of a different "real" formal concept. There are two standard techniques to forbid such concepts: counterexamples, which exclude the concepts involved, and a lower bound on the size of the extent. Both are insufficient. Here we define a random formal context with attributes (different from the elements of "real" formal concepts) generated by independent Bernoulli variables. The main result has an asymptotic form: if the number n of random attributes tends to infinity, the probability of success equals √(a/n), and there are m = b·√n counterexamples, then the probability of an accidental formal concept with a two-element extent is 1 − e^{−a} − a·e^{−a}·[1 − e^{−b·√a}] in the limit.

Keywords: formal context, formal concept, Bernoulli variable, counterexample, overfitting
1 Introduction
Formal Concept Analysis (FCA) [2] is one of the most popular means of analysing small sample data by tools of lattice theory.
Applicability of FCA to Big Data faces several obstacles:
– In the worst case, the number of hypotheses is exponentially large with respect to the size of the initial formal context.
– In the practice of sociological data analysis, a 100×100 training sample has generated more than 11,500 formal concepts.
– Many problems of FCA belong to the famous NP- and #P-completeness complexity classes [3].
The author adds the overfitting phenomenon to this list: so-called "accidental" formal concepts appear. The following example demonstrates an accidental concept.
Let
O = {o1 = B737, o2 = SSJ100, o3 = IL76, o4 = A320}
be a set of damaged planes, described by the set
F = {f1 = empennage damage, f2 = motor damage, f3 = curse scratch}
of attributes via the formal context

      | f1  f2  f3
   o1 |  1   0   0
   o2 |  1   0   1
   o3 |  0   1   1
   o4 |  0   1   0
Formal concepts ⟨{o1, o2}, {f1}⟩ ("empennage damage is a cause") and ⟨{o3, o4}, {f2}⟩ ("motor damage is a cause") are plausible, but ⟨{o2, o3}, {f3}⟩ ("curse scratch is a cause of damage") is doubtful. The last one occurs because attribute f3 accidentally appears in the two training objects o2 and o3, which have different "real" causes.
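The accidental concept in this example can be found mechanically. The following sketch (the object and attribute names are taken from the example above; the enumeration-by-closure approach is a standard brute-force method, not the paper's algorithm) lists all formal concepts of the toy context by closing every subset of objects; the doubtful concept ⟨{o2, o3}, {f3}⟩ shows up alongside the plausible ones.

```python
from itertools import combinations

# Toy context from the example above: four damaged planes, three attributes.
objects = ["o1", "o2", "o3", "o4"]
attributes = ["f1", "f2", "f3"]
incidence = {
    "o1": {"f1"},
    "o2": {"f1", "f3"},
    "o3": {"f2", "f3"},
    "o4": {"f2"},
}

def common_attributes(objs):
    """A': attributes shared by every object in objs."""
    result = set(attributes)
    for o in objs:
        result &= incidence[o]
    return result

def common_objects(attrs):
    """B': objects possessing every attribute in attrs."""
    return {o for o in objects if attrs <= incidence[o]}

# (A, B) is a formal concept iff A' = B and B' = A; closing every
# subset of objects enumerates all concepts of the context.
concepts = set()
for r in range(len(objects) + 1):
    for objs in combinations(objects, r):
        intent = common_attributes(objs)
        extent = common_objects(intent)
        concepts.add((frozenset(extent), frozenset(intent)))

for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Besides the three concepts discussed in the text, the enumeration also yields the trivial top and bottom concepts and the single-object closures.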
Of course, the separation of formal concepts into "accidental" and "real" ones heavily depends on experts' intuition about the research domain. In the previous example we know enough to do this ourselves. However, in more complex domains there is no information for rejecting "accidental" concepts automatically.
Counterexamples provide a means to delete such "accidental" formal concepts by the so-called "counterexamples forbidden condition" procedure. A counterexample is an object (described by combinations of the same attributes as the training objects) without the target property. The target property (or goal attribute) is a binary attribute of the objects. We suppose that a formal concept contains training objects with the target property and that there exists an additional list of counterexamples without this property. This assumption converts FCA into a machine learning method (see, for example, [3]). In the previous example, "to be damaged" was the target property of the training examples.
We develop a probabilistic model for the (restriction of a) formal context (to the subset of unessential attributes). An attribute is essential if it occurs as part of the intent of some "real" formal concept. The goal is to estimate the probability that a subset of unessential attributes appears as a formal concept intent without inclusion into some counterexample. A counterexample intent (or description) corresponds to a random subset of unessential attributes. Inclusion of the intent of a formal concept into some counterexample intent is suspicious because the cause (this formal concept) is present while the target property is absent.
The main result of the paper states the insufficiency of the counterexamples forbidden condition. If the number n of random (unessential) attributes tends to infinity, the probability of success equals √(a/n), and there are m = b·√n counterexamples, then the probability of an accidental formal concept with a two-element extent and without any counterexample is 1 − e^{−a} − a·e^{−a}·[1 − e^{−b·√a}] in the limit.
It is clear that the last term is greater than 0. This means that there is a positive probability of the appearance of a formal concept with intent consisting only of unessential attributes such that there is no inclusion into any counterexample.
"Accidental" formal concepts correspond to the overfitting phenomenon: on the one hand, we try to use all available information; on the other hand, "accidental" formal concepts do not correspond to any "real" cause and hence worsen the prediction of the target property on test examples.
2 Basic definitions and facts of FCA
Here we recall some basic definitions and facts of Formal Concept Analysis
(FCA) [2].
A (finite) context is a triple (G, M, I) where G and M are finite sets and
I ⊆ G × M . The elements of G and M are called objects and attributes,
respectively. As usual, we write gIm instead of hg, mi ∈ I to denote that object
g has attribute m.
For A ⊆ G and B ⊆ M, define

A′ = {m ∈ M | ∀g ∈ A (gIm)},   (1)
B′ = {g ∈ G | ∀m ∈ B (gIm)};   (2)

so A′ is the set of attributes common to all the objects in A, and B′ is the set of objects possessing all the attributes in B. The maps (·)′ : A ↦ A′ and (·)′ : B ↦ B′ are called derivation operators (polars) of the context (G, M, I).
A concept of the context (G, M, I) is defined to be a pair (A, B), where A ⊆ G, B ⊆ M, A′ = B, and B′ = A. The first component A of the concept (A, B) is called the extent of the concept, and the second component B is called its intent. The set of all concepts of the context (G, M, I) is denoted by B(G, M, I).
Let (G, M, I) be a context. For concepts (A1, B1) and (A2, B2) in B(G, M, I) we write (A1, B1) ≤ (A2, B2) if A1 ⊆ A2. The relation ≤ is a partial order on B(G, M, I).
Fix a context (G, M, I). In the following, let J be an index set. We assume
that Aj ⊆ G and Bj ⊆ M , for all j ∈ J.
Lemma 1. [2] Assume that (G, M, I) is a context and let A ⊆ G, B ⊆ M and Aj ⊆ G and Bj ⊆ M, for all j ∈ J. Then

A ⊆ A′′ and B ⊆ B′′,   (3)
A1 ⊆ A2 ⇒ A1′ ⊇ A2′ and B1 ⊆ B2 ⇒ B1′ ⊇ B2′,   (4)
A′ = A′′′ and B′ = B′′′,   (5)
(⋃_{j∈J} Aj)′ = ⋂_{j∈J} Aj′ and (⋃_{j∈J} Bj)′ = ⋂_{j∈J} Bj′,   (6)
A ⊆ B′ ⇔ A′ ⊇ B.   (7)
⊓⊔
Lemma 1 implies that for (A1, B1) and (A2, B2) in B(G, M, I)

(A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 ⇔ B2 ⊆ B1.   (8)

A subset A ⊆ G is the extent of some concept if and only if A′′ = A, in which case the unique concept of which A is the extent is (A, A′). Similarly, a subset B of M is the intent of some concept if and only if B′′ = B, and then the unique concept with intent B is (B′, B). Again, Lemma 1 is the main tool in the proofs of these propositions.
Proposition 1. [2] Let (G, M, I) be a context. Then (B(G, M, I), ≤) is a lattice with join and meet given by

⋁_{j∈J} (Aj, Bj) = ((⋃_{j∈J} Aj)′′, ⋂_{j∈J} Bj),   (9)
⋀_{j∈J} (Aj, Bj) = (⋂_{j∈J} Aj, (⋃_{j∈J} Bj)′′).   (10)
⊓⊔
A counterexample is an object without the target property. If the intent B of a formal concept ⟨A, B⟩ is a subset of the intent {c}′ of a counterexample c, then the cause ⟨A, B⟩ is present while the property is absent for c, and the formal concept must be ignored. This procedure is called the counterexample forbidden condition check.
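A minimal sketch of this check (the intents below are illustrative examples of mine, not from the paper): a concept is rejected as soon as its intent is contained in the intent of some counterexample.

```python
def forbidden(intent, counterexample_intents):
    """Counterexample forbidden condition: the concept with this intent
    must be ignored if the intent is a subset of some counterexample's
    intent (the cause is present, yet the target property is absent)."""
    return any(intent <= c for c in counterexample_intents)

# Hypothetical intents, for illustration only.
concept_intent = {"f1", "f3"}
counterexamples = [{"f2"}, {"f1", "f2", "f3"}]

print(forbidden(concept_intent, counterexamples))         # rejected: True
print(forbidden(concept_intent, [{"f2"}, {"f1", "f2"}]))  # kept: False
```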
3 Probabilistic model for accidental formal concepts
An attribute is called essential if it occurs in some "real" cause. Assume that no two "real" causes have common attributes. The other attributes are called accompanying.
Essential attributes are absent from counterexamples, and the essential attributes of training objects correspond to "real" causes in such a way that each "real" cause appears in some training object.
Denote the number of training objects by k, the number of counterexamples by m, and the number of accompanying attributes by n. Clearly, the accompanying attributes of training objects and counterexamples form a (k + m) × n binary matrix. It contains N = (k + m) · n bits. The accompanying attributes are generated by a Bernoulli series of N tests.
Definition 1. A Bernoulli series of N tests is the probability distribution on {0, 1}^N with

P(x1 = δ1, . . . , xN = δN) = ∏_{j=1}^{N} p^{δj} · (1 − p)^{1−δj},

where 0 < p < 1. The number p > 0 is called the probability of success (xj = 1) in test j.
Lemma 2. An accidental formal concept with extent of cardinality k can be replaced by a Bernoulli series of n tests ⟨a1, . . . , an⟩ with success probability p^k of aj = 1, appended to the string of 0s corresponding to the essential attributes. ⊓⊔

Lemma 2 easily implies the first negative result (insufficiency of a lower bound on the size of the extent for rejecting all accidental concepts):

Proposition 2. The probability of an accidental formal concept with extent of size k exceeds ε > 0 if the success probability p ≥ (−ln(1 − ε)/n)^{1/k}. ⊓⊔
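Proposition 2 can be probed numerically. Under the model of Lemma 2, each of the n accompanying attributes lands in all k extent objects with probability p^k, so an accidental concept with non-empty intent appears with probability 1 − (1 − p^k)^n. The sketch below (function names and parameter values are mine) verifies that the stated threshold on p indeed pushes this probability above ε.

```python
import math

def accidental_prob(p, k, n):
    # Lemma 2: each of the n accompanying attributes appears in all k
    # extent objects independently with probability p**k, so a non-empty
    # accidental intent appears with probability:
    return 1.0 - (1.0 - p**k) ** n

def threshold(eps, k, n):
    # Proposition 2: p >= (-ln(1 - eps)/n)**(1/k) makes the probability
    # of an accidental concept exceed eps.
    return (-math.log(1.0 - eps) / n) ** (1.0 / k)

eps, k, n = 0.5, 2, 1000
p = threshold(eps, k, n)
print(f"p = {p:.4f}, probability = {accidental_prob(p, k, n):.4f}")
```

The inequality behind the check is 1 − (1 − x)^n ≥ 1 − e^{−nx}, so setting nx = −ln(1 − ε) with x = p^k suffices.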
In the analysis of the counterexample forbidden rejection procedure we restrict ourselves to k = 2.

Lemma 3. The probability of the appearance of an accidental formal concept that is not included in any of m random counterexamples is

∑_{j=0}^{m} C(m, j) · (−1)^j · (1 − p² + p^{2+j})^n.
Proof. Lemma 2 implies that an accidental formal concept with a two-element extent corresponds to a Bernoulli series of n tests with success probability p².

Fix the cardinality l of the intent of the accidental formal concept. Then the probability equals

∑_{l=0}^{n} C(n, l) · (p²)^l · (1 − p²)^{n−l} · (1 − p^l)^m,

since p^l is the probability that some fixed counterexample deletes this formal concept, hence (1 − p^l)^m is the probability that none of the m counterexamples deletes it, and C(n, l) · (p²)^l · (1 − p²)^{n−l} is the probability of the appearance of an accidental formal concept with intent of cardinality l. But

∑_{l=0}^{n} C(n, l) · (p²)^l · (1 − p²)^{n−l} · (1 − p^l)^m =
= ∑_{l=0}^{n} C(n, l) · (p²)^l · (1 − p²)^{n−l} · ∑_{j=0}^{m} C(m, j) · (−1)^j · p^{lj} =
= ∑_{j=0}^{m} C(m, j) · (−1)^j · ∑_{l=0}^{n} C(n, l) · (p^{2+j})^l · (1 − p²)^{n−l} =
= ∑_{j=0}^{m} C(m, j) · (−1)^j · (1 − p² + p^{2+j})^n. ⊓⊔

The proof of Lemma 3 can also be obtained by direct application of the inclusion-exclusion principle.
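Lemma 3 can be sanity-checked by simulation. The sketch below (parameter values are mine, chosen arbitrarily) draws the accompanying attributes shared by the two extent objects with probability p² each, lets each counterexample cover the whole intent with probability p raised to the intent size, and compares the survival frequency with the alternating sum of the lemma.

```python
import math
import random

def lemma3_exact(p, n, m):
    # Alternating sum from Lemma 3:
    # sum_{j=0}^{m} C(m, j) * (-1)^j * (1 - p^2 + p^(2+j))^n
    return sum(math.comb(m, j) * (-1) ** j * (1 - p * p + p ** (2 + j)) ** n
               for j in range(m + 1))

def lemma3_monte_carlo(p, n, m, trials=100_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Accompanying attributes shared by both extent objects appear
        # independently with probability p^2 (Lemma 2 with k = 2).
        intent_size = sum(rng.random() < p * p for _ in range(n))
        if intent_size == 0:
            continue  # empty intent: no accidental concept
        # A counterexample contains the whole intent with probability
        # p^intent_size; the concept survives if none of the m does.
        if all(rng.random() >= p ** intent_size for _ in range(m)):
            hits += 1
    return hits / trials

p, n, m = 0.1, 50, 5
print(f"exact = {lemma3_exact(p, n, m):.4f}, "
      f"simulated = {lemma3_monte_carlo(p, n, m):.4f}")
```

With 100,000 trials the two values agree to about three decimal places, as expected from the binomial standard error.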
4 Main result
Lemma 4. For any constant c the following holds:

∑_{j=0}^{m} C(m, j) · (−1)^j · c = 0.

This is proved by Newton's binomial formula. ⊓⊔
Lemma 5. For j > 0,

(1 − a/n + (a/n)^{1+j/2})^n − (1 − a/n)^n = a · e^{−a} · (a/n)^{j/2} + o(n^{−j/2})

holds as n → ∞.

Proof. Write the left-hand side as

(1 − (a/n) · (1 − (a/n)^{j/2}))^n − (1 − a/n)^n

and expand both powers by the binomial formula. Since (1 − (a/n)^{j/2})^k = 1 − k · (a/n)^{j/2} + o(n^{−j/2}), the k-th terms of the two expansions differ by

C(n, k) · (−a/n)^k · [(1 − (a/n)^{j/2})^k − 1] = −C(n, k) · (−a/n)^k · k · (a/n)^{j/2} + o(n^{−j/2}).

Summing over k = 1, . . . , n and using C(n, k) · k / n^k → 1/(k − 1)!, we obtain

a · (a/n)^{j/2} · [1 + . . . + (−a)^{k−1}/(k − 1)! + . . . + (−a)^{n−1}/(n − 1)!] + o(n^{−j/2}) =
= a · e^{−a} · (a/n)^{j/2} + o(n^{−j/2}). ⊓⊔
Lemma 6. The formula

1 − (1 − a/n)^n = 1 − e^{−a} + o(1)

holds as n → ∞. ⊓⊔
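The asymptotics of Lemma 5 is easy to probe numerically: the ratio of its left-hand side to the predicted leading term a · e^{−a} · (a/n)^{j/2} should approach 1 as n grows. A small sketch (the values of a and j are mine, picked arbitrarily):

```python
import math

a, j = 1.0, 2
for n in (10**3, 10**4, 10**5):
    # Left-hand side of Lemma 5 and its predicted leading term.
    lhs = (1 - a / n + (a / n) ** (1 + j / 2)) ** n - (1 - a / n) ** n
    leading = a * math.exp(-a) * (a / n) ** (j / 2)
    print(n, lhs / leading)
```

The printed ratios tend to 1, consistent with the o(n^{−j/2}) error term.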
Theorem 1. If the number n of random attributes tends to infinity, the probability of success equals √(a/n), and there are m = b · √n counterexamples, then the probability of an accidental formal concept with a two-element extent and without any counterexample is 1 − e^{−a} − a · e^{−a} · [1 − e^{−b·√a}] in the limit.
Proof. Lemma 3 reduces the theorem to the estimation of

∑_{j=0}^{m} C(m, j) · (−1)^j · (1 − p² + p^{2+j})^n.

Substituting p = √(a/n), Lemma 4 implies

∑_{j=0}^{m} C(m, j) · (−1)^j · (1 − a/n + (a/n)^{1+j/2})^n =
= ∑_{j=0}^{m} C(m, j) · (−1)^j · [(1 − a/n + (a/n)^{1+j/2})^n − (1 − a/n)^n].

Lemmas 5 and 6 imply

∑_{j=0}^{m} C(m, j) · (−1)^j · [(1 − a/n + (a/n)^{1+j/2})^n − (1 − a/n)^n] =
= [1 − (1 − a/n)^n] + ∑_{j=1}^{m} C(m, j) · (−1)^j · [(1 − a/n + (a/n)^{1+j/2})^n − (1 − a/n)^n] =
= 1 − e^{−a} + o(1) + ∑_{j=1}^{m} C(m, j) · (−1)^j · [a · e^{−a} · (a/n)^{j/2} + o(n^{−j/2})].

Then, since (a/n)^{1/2} = b·√a/m for m = b · √n,

∑_{j=1}^{m} C(m, j) · (−1)^j · [a · e^{−a} · (a/n)^{j/2} + o(n^{−j/2})] =
= ∑_{j=1}^{m} C(m, j) · (−1)^j · [a · e^{−a} · (b·√a/m)^j + o(m^{−j})] =
= a · e^{−a} · [(1 − b·√a/m)^m − 1] + o(1) =
= a · e^{−a} · [e^{−b·√a} − 1] + o(1),

which finishes the proof, since the last equation holds by Lemma 6. ⊓⊔
To prove that

1 − e^{−a} − a · e^{−a} · [1 − e^{−b·√a}] > 0

for any a > 0 and b > 0, use the following reasoning:

1 − e^{−a} − a · e^{−a} · [1 − e^{−b·√a}] ≥ 1 − e^{−a} − a · e^{−a} = e^{−a} · (e^a − 1 − a) ≥ e^{−a} · a²/2! > 0.
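The limit of Theorem 1 can also be compared with the exact finite-n probability of Lemma 3. The sketch below uses my own parameter choices (a = b = 1, n = 10 000); the rearrangement through Lemma 4, subtracting the constant (1 − p²)^n from each term, is mine and serves only to keep the alternating sum numerically stable for large m.

```python
import math

def survival_prob(a, b, n):
    """Lemma 3 probability with p = sqrt(a/n) and m = round(b * sqrt(n)).
    By Lemma 4 the constant (1 - p^2)^n cancels from the alternating
    sum, so each term is replaced by the cancellation-free difference
    d_j = (1 - p^2 + p^(2+j))^n - (1 - p^2)^n."""
    p = math.sqrt(a / n)
    m = round(b * math.sqrt(n))
    log_base = n * math.log1p(-p * p)  # log of (1 - p^2)^n
    total = 0.0
    for j in range(m + 1):
        d_j = math.exp(log_base) * math.expm1(
            n * math.log1p(p ** (2 + j) / (1 - p * p)))
        total += math.comb(m, j) * (-1) ** j * d_j
    return total

def limit_formula(a, b):
    # Theorem 1: 1 - e^{-a} - a * e^{-a} * (1 - e^{-b * sqrt(a)})
    return (1 - math.exp(-a)
            - a * math.exp(-a) * (1 - math.exp(-b * math.sqrt(a))))

a, b = 1.0, 1.0
print(survival_prob(a, b, 10_000), limit_formula(a, b))
```

At n = 10 000 the finite-n value already agrees with the limit to within a few thousandths, and the limit itself stays positive, in line with the inequality above.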
Our model of random counterexamples is incorrect if the number m of counterexamples is comparable with 2^n, since in that case many pairs of counterexamples coincide. But this situation is also impractical from the data analysts' point of view: when real data are coded by binary attributes, the majority of attributes are absent, hence the formal context has a small fraction of 1s.
Conclusions
In this paper we demonstrate that Formal Concept Analysis has the overfitting shortcoming familiar from other machine learning methods. The JSM method [1] has the same drawback. The author [5] developed a probabilistic approach to JSM, FCA, and related formalisms. The inspiration for the study was the overfitting phenomenon of classical methods.
Recently Tatiana Makhalova and Prof. Sergey O. Kuznetsov [4] developed an interesting approach to comparing the overfitting of different machine learning algorithms by means of FCA. They suppose that a formal context contains the prediction errors of the different algorithms (as attributes) on the target property of objects. Hence Formal Concept Analysis is a means to estimate the overfitting of algorithms when the set of all objects is randomly separated into two parts (one to learn and one to test). There are some theoretical results based on the "weak probability axiomatization" proposed by Prof. Konstantin V. Vorontsov [6].
In the present paper we discuss the overfitting of FCA itself, considered as a machine learning method. We treat the appearance of "accidental" formal concepts as overfitting because it occurs during the computation of all formal concepts (in an attempt to use the full available information, ignoring the possibility of "accidental" coincidence between unessential attributes).
Acknowledgements. The author would like to thank Prof. Victor K. Finn and his colleagues at the Federal Research Center for Computer Science and Control for support and helpful discussions.
References
1. Finn, V.K.: The Synthesis of Cognitive Procedures and the Problem of Induction. Automatic Documentation and Mathematical Linguistics 43, 149–195 (2009)
2. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer-Verlag (1999)
3. Kuznetsov, S.O.: Complexity of Learning in Concept Lattices from Positive and Negative Examples. Discrete Applied Mathematics 142(1–3), 111–125 (2004)
4. Makhalova, T., Kuznetsov, S.O.: On Overfitting of Classifiers Making a Lattice. In: Proc. 14th International Conference on Formal Concept Analysis (ICFCA 2017), Lecture Notes in Artificial Intelligence, vol. 10308, pp. 184–197. Springer (2017)
5. Vinogradov, D.V.: VKF-Method of Hypotheses Generation. Communications in Computer and Information Science 436, 237–248 (2014)
6. Vorontsov, K.V.: Combinatorial Probability and Generalization Bounds Tightness. Pattern Recognition and Image Analysis 18(2), 243–259 (2008)