<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Formalization of “Natural” Classification and “Natural” Concepts by Probabilistic Generalization of Formal Concepts Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Evgenii E. Vityaev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladislav Degtiarev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bayar Pak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuri Meister</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Novosibirsk State University</institution>
          ,
          <addr-line>Novosibirsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sobolev Institute of Mathematic</institution>
          ,
          <addr-line>Novosibirsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>In the previous works, a probabilistic generalization of the formal concepts analysis was developed. This generalization is induced by the problem of formal concepts determining under noise conditions, when the lattice of formal concepts exponentially grows. In this paper, probabilistic formal concepts with negation are determined, as well as a statistical method for detecting these probabilistic formal concepts. The purpose of this paper is to show that probabilistic formal concepts have a deeper meaning. It is argued that probabilistic formal concepts formalize the “natural” concepts described in cognitive sciences by “causal models”, which are characterized by a highly correlated structure of attributes. The same structure is specific for the “natural" classification of objects of the external world. The definition of “natural" classification given by J. Stuart Mill is fairly accurately formalized by probabilistic formal concepts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In previous works [Vityaev et al. 2012], [Vityaev et al. 2015], [Martynovich et al. 2016], a
probabilistic generalization of the Formal Concepts Analysis (FCA) was developed. This
generalization solves the problem of the formal concepts discovery in noise conditions associated
with the exponential growth of the lattice of formal concepts in the presence of noise. This
issue is discussed in a more detail in section 2.</p>
      <p>Probabilistic generalization of formal concepts was obtained by generalizing the definition
of formal concepts as fix-points of implication. For this purpose, the probabilistic measure on
the objects of context and the most specific probabilistic implication were defined. Consistency
of fix-points based on maximally specific probabilistic implications is proved and then
probabilistic formal concepts are defined as fix-points of predictions based on maximally specific
probabilistic implications. Figuratively speaking, the obtained probabilistic formal concepts
not display data in all details as in photographs, but give an “idealized" representation of the
data.</p>
      <p>In this work, probabilistic formal concepts with negation are defined, as well as a statistical
method of detecting these probabilistic formal concepts based on the works [Vityaev et al.
2014], [Vityaev et al. 2015], [Martynovich et al. 2016].</p>
      <p>The purpose of this work is to show that probabilistic formal concepts have deeper meaning
than their restoration in noise conditions. In [Vityaev et al. 2014], it was shown that the
fixpoints of anticipation underlying probabilistic formal concepts model the process of perception
in exact accordance with the psychology of perception described in [Smirnov 1985], [Naiser
1981].</p>
      <p>In addition, this paper shows that probabilistic formal concepts simulate “natural" concepts
explored in cognitive sciences. “Natural" concepts are concepts about directly perceived
objects, not abstract objects like production, plant and object. The following section, based on
research in cognitive sciences, provides principles of categorization and description of
“natural" concepts. It follows that “natural" concepts reflect a highly correlated structure of features
of objects of the external world, which can be represented by “causal models". It is argued that
causal models are rather precisely modeled by probabilistic formal concepts.</p>
      <p>The highly correlated structure of features of objects of the external world is also manifested
in the “natural" classification (see section 3). J. Stuart Mill has observed that “natural" classes of
animals or plants are described by a potentially infinite number of properties [Mill 2011]. The
naturalists who built the “natural" classifications also noted that the more similar the compared
objects are in the greater number of significant features, the more likely their similarity in other
features [Rutkovsky 1884], and that the construction of the “natural" classification consists in
“indication" - from the infinite number of features we need to move to a limited number of
features that would replace all other features [Smirnov 1938]. This means that in the “natural"
classes the features are strongly correlated, for example, if classes 128 and features are binary,
the independent “indicator" features among them will be only 7 features, because 27 =128, and
the remaining features, which can be tens and hundreds (denoted by N) can be predicted by
these 7 features, which means the presence of N-7 regularities. Since the “indicator" features
can be diferent 7 features from N, the choice of which is equal to  7 , the number of regularities
that predict non-indicator features based on the indicator features is not much less than ( −
7) ∙  7 . Highly correlated structure of features is well detected by the probabilistic formal
concepts; moreover, the following statement by J.S. Mill almost precisely defines the meaning
of probabilistic formal concepts: “Natural groups ... are defined by features ... However, at the
same time not only features which are certainly common for all the objects included into the
group but the whole set of those features is taken into account, of which all are found in the
majority of these objects and the majority - in all" [Mill 2011].</p>
    </sec>
    <sec id="sec-2">
      <title>2. “Natural" concepts</title>
      <p>In the works of Eleanor Rosch [Rosch 1973, 1978], [Rosch et al. 1975, 1976, 1978] the principles
of categorization of “natural" categories [Rosch 1973, 1978] are formulated on the basis of the
conducted experiments. Directly perceived objects (basic objects) – information rich bundles
of observed and functional properties that form a natural partition that creates a
categorization. These bundles form “prototypes" of objects classes: Categories can be viewed in terms
of their clear cases if the perceiver places emphasis on the correlational structure of perceived
attributes . . . By prototypes of categories we have generally meant the clearest cases of
category membership [Rosch et al. 1978], [Rosch 1978]. Rosch and Mervis (1975) have shown that
the more prototypical of a category a member is rated, the more attributes it has in common
with other members of the category and the fewer attributes in common with members of the
contrasting categories [Rosch et al. 1975, 1976].</p>
      <p>Later on, the theory of “natural" concepts of Eleanor Rosch was called a “prototype” theory.
Its main features are described in the following way: The prototype view (or probabilistic
view) keeps the attractive assumption that there is some underlying common set of features
for category members but relaxes the requirement that every member have all the features
[Ross et al. 2008].</p>
      <p>Further research has found that models based on features, similarities and prototypes are not
suficient to describe classes. Theoretical, causal and ontological knowledge relating to class
objects must be taken into account. For example, people not only know that birds have wings,
can fly and nest in trees, but also that birds nest in trees because they can fly and fly because
they have wings.</p>
      <p>Based on this research, Bob Rehder, in his work “Categorization as causal reasoning",
formulated a causal-model theory: people’s intuitive theories about categories of objects consist of a
model of the category in which both a category’s features and the causal mechanisms among
those features are explicitly represented [Rehder 2003a].</p>
      <p>In the theory of causal models, the relationship of an object to a category is no longer based
on a set of features and proximity based on features, but on the similarity of the generating
causal mechanism: Specifically, a to-be-classified object is considered a category member to the
extent that its features were likely to have been generated by the category’s causal laws, such
that combinations of features that are likely to be produced by a category’s causal mechanisms
are viewed as good category members and those unlikely to be produced by those mechanisms
are viewed as poor category members [Rehder 2003a].</p>
      <p>Some researchers have used Bayesian networks or causal models to represent causal
knowledge [Cheng 1997], [Gopnik et al. 2004], [Grifiths et al. 2009]. However, these models cannot
model cyclical causal relationships because Bayesian networks do not support cycles.</p>
      <p>In [Rehder et al. 2011], Bob Rehder proposed a model of causal cycles based on the
“disclosure" of causal graphical models (CGMs). The “disclosure" is done by creating a Bayesian
network that deploys over time. Such a network is already quite well suited for modeling
“natural" concepts reflecting the highly correlated structure of the outside world. However, it does
not explicitly formalize causal link cycles.</p>
      <p>The model we propose further is directly based on cyclic causal links represented by
probabilistic formal concepts.</p>
    </sec>
    <sec id="sec-3">
      <title>3. “Natural" classification</title>
      <p>The first suficiently detailed philosophical analysis of the “natural" classification belongs to
J.St. Mill [Mill 2011]. This analysis revealed all the main properties of the “natural"
classifications, which were later confirmed by naturalists who built the “natural" classifications.</p>
      <p>According to J.St. Mill [Mill 2011], “artificial" classifications difer from “natural"
classifications in that they can be based on any one or more attributes, so that diferent classes difer only
in that they include objects with diferent values of these attributes. But if we consider classes
of “animals" or “plants", they difer in such a large (potentially infinite) number of properties
that they cannot be enumerated" [Mill 2011].</p>
      <p>J.St. Mill gives the following definition of the “natural" classification: it is such a classification
which unites objects into groups concerning which it is possible to express the greatest number
of general propositions and it is based on such properties which serve as reasons for many
others or at least make up their true attributes. It also defines the notion of the “image" of the
class, which is the forerunner of “natural" concepts: “our concept of class – the image by which
this class is represented in our mind – is the concept of a certain specimen that possesses all
the features of this class ... to the highest degree".</p>
      <p>The properties of “natural" classes according to J. St. Mill were confirmed by naturalists.
For example, Wavell W.: “The more general statements about objects give an opportunity to
make classification, the more natural it is" [Mayen et al. 1976]. L. Rutkovsky writes about
the inexhaustible number of general properties of “natural" classes. [Rutkovsky 1884]: “The
greater the number of essential features similar to comparable objects, the more likely their
similarity in other respects". A similar statement is made by E.S. Smirnov: “The taxonomic
problem lies in ’indication’: from an infinite number of features we need to move to a limited
number of them, which would replace all other features" [Smirnov 1985].</p>
      <p>As a result, the problem of defining “natural" classifications has been formulated and is still
discussed in the literature [Zabrodin 1981], [The Nature of Classification 2013]. However, from
our point of view, the formalization of the “natural" classification is not yet suficiently
adequate.</p>
    </sec>
    <sec id="sec-4">
      <title>4. The problem of formal concepts discovery in noise conditions</title>
      <p>Let us consider the known problem of formal concepts analysis [Ganter 2003]: standard
methods of concept lattice construction on noisy data turn out to be unproductive, because they
are designed for accurate data, and in the presence of noise (data distortions) the concept
lattice turns out to be filled with noisy clones of initial concept classes [Klimushkin et al. 2010],
[Frequent Itemset Mining 2004]. Attempts to solve this problem “on the forehead" do not look
promising [Klimushkin et al. 2010], [Kuznetsov 2007]. The main idea of these works is to take
a full lattice of concepts, and get rid of noise, “side” concepts by filtering. A widespread
evaluation of the “side” concept is the Stability Index [Kuznetsov 2007], [Buzmakov 2014]. The
Stability Index and other approaches based on the filtration of lattices have significant
disadvantage – the filtered lattices of concepts often have exponential sizes [Emilion et al. 2009],
which is unacceptable for large contexts.
from [Ganter et al. 2004].
attribute.
and attributes, and  ⊆</p>
      <p>×</p>
    </sec>
    <sec id="sec-5">
      <title>5. Basics of formal concepts analysis</title>
      <p>Let’s start with the main definitions of formal concepts analysis. The majority of definitions
here start with classical works on FCA [Ganter 2003], [Ganter et al. 1999], others are taken</p>
      <sec id="sec-5-1">
        <title>Definition 1 . Formal context is a triple (,  ,</title>
        <p>), where  and</p>
        <p>– arbitrary sets of objects
– binary relation, expressing the belonging of an object to an
In formal context, the operators of the derivative play a key role.</p>
        <p>Definition 2 . ⊆ ,  ⊆</p>
        <p>. Then:
1.  ↑ = { ∈  | ∀ ∈ , (, 
2.  ↓ = { ∈  | ∀ ∈ , (, 
) ∈  }
) ∈  }</p>
        <p>Operators of a derivative link a subset of objects and attributes of a context. In the future
we will also refer to both operators as a derivative with a single symbol ′.</p>
        <sec id="sec-5-1-1">
          <title>Definition 3.</title>
        </sec>
        <sec id="sec-5-1-2">
          <title>Definition 4.</title>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>An implication is a pair of attribute sets (,</title>
        <p>A pair (, 
) – is a formal concept, if  ↑ =  and  ↓ =  .</p>
        <p>), ,  ⊆ 

→  . Implication</p>
        <p>→  is truth on the context 
A set of all true implications on context K well be denoted as
= (,  , 
all possible conclusions of implications to the original set of attributes X:</p>
        <p>Definition 5. For any set L of implications we define a direct inference operator   that add
that is recorded as
), if ∀ ∈  (⊈
′ ∨  ⊆ 
′ .
)
  ( ) =  ∪ { |  ⊆  , 
→  ∈  }</p>
      </sec>
      <sec id="sec-5-3">
        <title>Theorem 1. [Ganter et al. 1999] For any subset  ⊆</title>
        <p>Let us illustrate Theorem 1 in the following simple context.</p>
        <p>,
( )( ) = 
⇔  ′′ =  .
[Ganter et al. 1999] Very simple formal context K0
m</p>
        <p>1
1
1
0
0
m
1
1
1
0
2
m</p>
        <p>3
0
1
1
0</p>
        <p>We can reformulate the task of constructing a concept lattice in terms of implication and
output operator. Looking at Table 1, it is not dificult to notice that m
and m
attributes
completely define the object class. In fact, m
1 infer m2, and the same is true for m , from these
3
it follows that  1
→  2
and  3 →  2. The sets { 1,  2}, {
2,  3} and { 1, 
2,  3} will be
formal concepts. According to the theorem, they will also be fix-points of the direct inference
1
3
operator. For example,
 
( )</p>
        <p>( )
{ 1} −−−−−→ { 1,  2} −−−−−→ { 1,  2}.</p>
        <p>It is therefore { 1,  2} both a fix-point and a formal concept.</p>
        <p>Now let’s add some noise to the original context K0 from Table 1. In order to keep the noise
level at a fairly low level, let’s also expand the context by adding duplicates of objects:</p>
        <p>Every small change made to the context will have a dramatic impact on the conceptual
lattice. The first context in Table 2 is equivalent to the original context K 0 and generates the
same conceptual lattice. However, the lattice with noise of the second context in Table 3 will
contain some fictitious notions {m 2} and {m1, m3} generated by noise, which will also be formal
notions. The number of fictitious notions will increase with the noise level, and the qualitative
estimation of speed is exponential [Emilion et al. 2009].</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Probabilistic logic in a formal context</title>
      <p>To avoid the exponential growth of the lattice, a stability index is introduced [Klimushkin et al.
2010], [Kuznetsov 2007],[Buzmakov 2014] to assess the degree of stability of the concept and
to select only sustainable concepts.</p>
      <p>We will present a fundamentally diferent way of the lattice construction, based on
probabilistic implications. For this purpose, we redefine the context within the framework of logic.
Further we will consider only finite contexts.</p>
      <p>Definition 6. For the finite context  = (,  ,  ) we define a signature Ω that contains
predicate symbols m(x) for each  ∈  . For the signature Ω and the K context, considered
as a model, we define the interpretation of predicate symbols as follows  ⊧ ( ) ⇔ (,  ) ∈  .</p>
      <p>Definition 7. For the signature Ω let’s define the following variant of the first order logic:
1.   – set of variables;
2.   – set of atomic formulas (atoms)  ( ) where  ∈ Ω and  ∈   ;
3.   – set of literals, including all atoms  ( ) and their negations ¬ ( );
4. Φ – set of formulas defined inductively: every atom is a formula, for any Φ, Ψ ∈ Φ
following syntactic constructions Φ ∧ Ψ, Φ ∨ Ψ, Φ → Ψ, ¬Φ are formulas as well.</p>
      <sec id="sec-6-1">
        <title>For a set of literals  ⊆   we define their conjunctions as ∧ = ∧  . In the same way</title>
        <p>∈
¬ = {¬ |  ∈  }.</p>
        <p>Definition 8. A single element set { }, together with signature Ω predicates, forms a
model   of this object. The fact of truth of a formula  on an object model   is written as
⊧ ⇔   ⊧ .</p>
        <p>Definition 9. Let us define a probabilistic measure  on the set G, in the Kolmogorov sense.
Then we can define a probabilistic measure on the set of formulas as:</p>
        <p>∶ Φ → [0, 1],  ( ) =  ({ | ⊧ }).</p>
        <p>Let us assume that the context has no statistically insignificant objects such as  ({ }) =
0,  ∈  .</p>
        <p>Definition 10. The set of literals M will be called  -joint if  (∧ ) &gt; 0.</p>
        <p>In the future, the compatibility of the set of literals will be considered within a
probabilistic measure of a context, and in the absence of ambiguity the symbol of the measure will be
omitted.</p>
        <p>Let us consider some subset of atoms  ⊆  ( ). The formula  1 ∧  2 … ∧   →  =
∧{  } →  defines implication ({  } , { }) on the context. The presence of a probabilistic
measure allows to determine the probabilistic implication.</p>
        <p>Let’s define the rules on the context.</p>
        <p>Definition 11. Let { 1,  2, … ,   ,  } ∈   ,  ∉ { 1,  2, …   },  ≥ 0.
1. The rule  = ( 1,  2.,   →  ) is called implication ( 1 ∧  2 … ∧   →  );
2. The premise  ← of rule R is a set of literals { 1,  2,...,   };
3. The conclusion of the rule is  → =  ;
4. The length of the rule is called the premise power | ← ;
|
5. If  1← =  2← and  1→ =  2→, then  1 =  2.</p>
        <p>Definition 12. The probability  of the rule R is the value
 ( ) =  ( →| ←) =
 ( ← ∧  →)
 ( ←) .</p>
        <p>If the rule denominator  (∧ ←) is equal to 0, then the probability is not defined.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Rule classes and consistency of predictions</title>
      <p>The following sections are a combination of works [Ganter et al. 1999], [Vityayev 1992],
[Vityaev 2006], [Vityayev et al. 2012], [Vityaev et al. 2015], [Martynovich et al. 2016].</p>
      <p>Definition 13. Let us define a prediction operator by a set of rules  as
Π( ) =  ∪ { | ∃ ∈  ∶  ← ⊆ , 
→ =  }.</p>
      <p>Definition 14. A closure  ̄ of the set of literals L is the smallest fix-point of a prediction
operator containing L
 ̄ = Π( ̄ ) = Π( ) = ⋃ Π( ).</p>
      <p>∞
 ∈ℕ</p>
      <p>Definition 15. Rule R1 is a subrule of rule R2 if  1→ =  2→,  1← ⊂  2←. We denote this fact
as  1⊏ 2.</p>
      <p>Definition 16. Rule R1 specifies rule R 2, or R1 &gt; R2 if  2⊏ 1 and  ( 1) &gt;  ( 2).</p>
      <p>Example 2. A rule  1 →  2 from the context   (Table 3) has only one, unconditional
sub-rule (∅ →  2)⊏( 1 →  2) but it is wrong that the first rule specifies the second
 (∅ →
 2) = 180 = 54 =  ( 1 →  2). However, ( 3 →  2) &gt; (∅ →  2) since  (∅ →  2) = 180 &lt; 65 =
 ( 3 →  2).</p>
      <p>To achieve consistency and resistance to noise, some restrictions on the rule classes used
will be required [Vityayev 1992], [Speransky 2013].</p>
      <p>Class M1 defined below contains only those rules whose conditional probability is strictly
greater than the probability of an unconditional rule.</p>
      <p>Definition 17.  ∈  1( ) ⇔  ( ) &gt;  ( →),  → =  ,  ← = ∅.</p>
      <p>The next class M2 requires the property of maximum specificity – the impossibility to
increase the conditional probability by refining the rule [Vityaev 2006].</p>
      <p>Definition 18.  ∈  2( ) ⇔ ( ∈  1( ))&amp;(⊏ ̃ ⇒  ( ̃ ) ≤  ( )).</p>
      <p>We have proved that maximally specific rules of M2 solve the problem of statistical ambiguity
[Vityaev 2006], [Vityaev et al. 2015].</p>
      <p>Example 3. A rule ( 3 →  2) can be specified to the rule (¬ 1 ∧  3 →  2), because of
 ( 3 →  2) = 65 &lt; 1 = 33 =  (¬ 1 ∧  3 →  2). The last rule meets all the requirements of
class M2, and therefore (¬ 1 ∧  3 →  2) ∈  2( 2).</p>
      <p>The next class Imp contains all precise implication and corresponds to the set   ( ) of
definition 4.</p>
      <p>Definition 19.  ∈   ( ) ⇔ ( → =  ).</p>
      <p>Definition 20. Define classes M 1, M2, Imp by joining over all literals</p>
    </sec>
    <sec id="sec-8">
      <title>8. Probabilistic formal concepts</title>
      <p>Fix-points of the prediction operator are obvious candidates for the role of the content of
probabilistic concepts. To determine the volume of notions let’s refer to [Kuznetsov 2007],[Buzmakov
2014]. The idea is to use all possible prototypes of the closing operator, i.e. all sets of M such
that Π( ) =  to compose their derivatives into a single volume of notion A. This will allow
to restore the initial correspondence of objects-concept using the prediction operator and also
to include all objects of the same class into a single volume of the notion.</p>
      <p>Definition 23. A probabilistic formal concept on the context K is a pair (A,B) that meets the
conditions:
Π( ) = , 
=</p>
      <p>∪
Π( )=
 ′,  ⊆  2 ( .</p>
      <p>22).
.</p>
      <p>Example 4. Let’s consider a K context consisting of two non-intersecting squares that
give two independent formal concepts. We will change the values of some cells to make the
task a bit more complicated.</p>
      <p>Note that the most specific rules of M 2 will be (  =1…4 → ¬  =5…8), but not the rule  5 →
¬  =1…4 for which you can find the following clarification  ( 5 ∧  6 → ¬ 1) = 1 &gt; 54 =
the process of rule refinement.
 ( 5∧ →</p>
      <p>¬ 1). This example shows how the noise is processed and invested in probability in
To calculate a concept containing the feature m1, we first calculate its closing
Π({ 1}) =
{ 1, 
2, 
actly the same are gathered in a single volume of the concept {g2, g3, g4}.</p>
      <p>8}. Then all the objects whose closing turns out to be
ex</p>
      <p>To distinguish probabilistic formal concepts from classical formal concepts on the context
K, we will call the latter simply formal concepts. The definition of set A is also based on the
following theorem linking probabilistic and simple formal concepts on context K.</p>
      <p>Theorem 3. Let K be a formal context.
1. If (A,B) is a formal concept in K, then there is a probabilistic formal concept (N,M) in the
2. If (N,M) is a probabilistic formal concept on K, then there is a family  of formal concepts</p>
    </sec>
    <sec id="sec-9">
      <title>9. Statistical method for detection of probabilistic formal concepts</title>
      <p>same context, such as  ⊆  ,
in the same context, such that
 ̃ – probabilistic law and  ̃ &gt;  .
law  ̃ such as ( ̃ &gt;  ).</p>
      <p>Definition 27.</p>
      <p>.
2
∀(,  ) ∈  (Π( ) =  ),
 =
(, ∪)∈

will be defined.</p>
      <p>The procedure for obtaining all M rules by exhaustive search is exponential, because the
 (2| |) candidate rules (each atom or its negation can be included in the rule) are evaluated
at |G|. Of course, this estimation is rather rough and the procedure can be improved. Instead
a more radical simplification of the problem by finding a statistical approximation of M</p>
      <sec id="sec-9-1">
        <title>2-rules</title>
        <p>Definition 24. R is a probabilistic law, if ( ̃ ⊏ ) ⇒ ( ̃ &lt;  ).</p>
        <sec id="sec-9-1-1">
          <title>Definition 25.</title>
          <p>A rule  ̃ is semantically derived from the rule R, we denote as  ⊳  ̃ , if R and
Definition 26. The probabilistic law R will be called the strongest if there is no probabilistic</p>
          <p>We define the Semantic Probabilistic Inference (SPI) of some strongest
probis the strongest probabilistic law.
abilistic law</p>
          <p>as such a sequence of probabilistic laws  0 ⊳  1 ⊳  2. ⊳   that  0← = ∅ and  
Let us denote by  the set of all the strongest probabilistic laws obtained by various semantic
probabilistic inferences. Specification check R1 &gt; R2 requires an inequality  ( 1) &gt;  ( 2) check.</p>
          <p>In general, from a practical point of view, we cannot assume that a probability measure is
defined as in definition 9, so we will consider it unknown. The probabilistic inequalities used
in the SPI definition we will determine using some statistical criterion, e.g. the exact Fisher
independence criterion with a confidence level  . The resulting set of rules 

make contradicting predictions. Therefore, to approximate the prediction operator Π( ), it
is necessary to introduce an additional criterion of mutual consistency of predictions by rules
will already
 on the set L.</p>
          <p>In this case, we write that  ∈ 
this case, we write that 
∈  
( ) ⊆  .</p>
          <p>Definition 29. The rule  ∈  is confirmed on the set of letters L, if  ← ⊂  and  → ∈  .</p>
        </sec>
        <sec id="sec-9-1-2">
          <title>Definition 30.</title>
          <p>The rule  ∈ </p>
          <p>is refuted on the set of letters L, if  ← ⊂  and  → ∈ ¬ . In</p>
          <p>The diference between the quantity and quality of rules confirmed and refuted on the
concept-hypothesis L is the determinant for adding or deleting literals to the content of the
concept.</p>
          <p>Definition 31. The consistency criterion for the set L is the value
  ( ) =</p>
          <p>∑
 ∈ ( )
 ( ) −</p>
          <p>∑
 ∈ ( )
 ( ).
below, we are guided by Shannon’s considerations as</p>
          <p>The choice of rule evaluation  may depend on additional task specifics. In the experiments
 ( ) = − (1 +  −  ( )).</p>
        </sec>
        <sec id="sec-9-1-3">
          <title>Definition 32.</title>
          <p>Define a consistent prediction operator
by one element so as to strictly increase the consistency criterion:
Υ( ) that changes a set of literals L</p>
          <p>G;
1. For all  ∈   ⧵ calculate the efect of adding G: △+ =   ( ∪ { }) −  
( );
2. For all 
∈  calculate the efect of removing G: △−=  
( ⧵{ }) −  
( );
3. The operator Υ( ) adds a letter G to the L set if △+&gt; 0 and △+&gt;△−; the operator delete a
letter G from L if △−&gt; 0 and △−&gt;△+. If △−= △+ and △−&gt; 0, the operator will delete literal
4. If △+≤ 0 and △−≤ 0, the operator returns L and we have a fix-point.</p>
          <p>Definition 33. A probabilistic formal concept discovered by a statistical method is a
probaby various semantic probabilistic inferences with a confidence level  in the context K.
bilistic formal concept in which </p>
          <p>is a set of all strongest probabilistic laws (def. 26) obtained
10. Experiments in the detection of probabilistic formal
concepts by statistical method
Let us illustrate the detection of probabilistic formal concepts by statistical method on the
example of classes (concepts) detection of digits. Let’s encode the digits with 24 features, located
as shown in fig. 1. Each feature has 7 values (see fig. 2). Some lines of numbers are encoded
twice, for example, the vertical line at number 4 and 1 is encoded both by the vertical right line
(feature 3) in cells 3,7,11,15,19,23 and by the vertical left line (feature 5) in cells 4,8,12,16,20,24.</p>
          <p>Two experiments were conducted for two diferent contexts. The first contained 30 copies of
each digit (360 in total) randomly mixed together. The program does not know where the digit
is and it treats them as diferent images. The second context contained 10 copies of each of the
10 digits (without repeating sixes and nines, a second copy was taken) and 300 random images
in which each of the 24 features took a random value from the seven available. All numbers
and images were also randomly mixed. In each context a corresponding set 
probabilistic laws with  = 0.05 for the Fisher criterion were found. In the first experiment
66160 regularities with conditional probability greater than 0.9 and  = 0.05 were found. In the
second experiment 629494 regularities with conditional probability greater than 0.9 and  =</p>
          <p>In the first context, the method found as classes all 12 digits – they were fix-points, as well as
two fix-points for 6 and 9, which did not contain, respectively, signs 13 and 12, with which two
copies of digits difered. In the second context, the method identified 10 digits as fix-points.</p>
          <p>In both experiments, the stability of the probabilistic formal concepts found was tested with
regard to the lack of information on any feature of a digit. It was checked that if any digit
described by the set of literals L removes one of the literals  
or ¬  , i.e., removes one of
the values of the feature, the digit will still be assigned to the same class and the unknown
feature will be correctly identified and added to L, i.e.
removed value of the feature will be correctly predicted by the regularities from 
without
contradictions on the part of the logical relationship of this feature with other features. In the

ifrst experiment, for all 360 digits, a feature was accidentally removed and then restored by the
prediction operator. In the second experiment, the removed features were restored in 96% of
Π( ⧵  ) = 
. This means that the
cases.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgments</title>
      <p>The publication of this article was supported by the project «Investigation, analysis and
complex independent expertise of projects of the National technological initiatives, including the
accompanying of projects of «road map» NeuroNet», which is executed within the framework
of the state assignment №. 28.12487.2018/12.1 of the Ministry of Science and Higher
Education of the Russian Federation. The study was also carried out within the framework of the
state contract of the Sobolev Institute of Mathematics (project no. 0314-2019-0002) regarding
theoretical results in Sections 6-9.
[Grifiths et al. 2009] Grifiths, T. L., Tenenbaum, J. B. Theory-based causal induction.
Psychological Review, 116, 56.
[Rehder et al. 2001] Rehder, B., Hastie, R. Causal knowledge and categories: The efects of
causal beliefs on categorization, induction, and similarity. Journal of Experimental
Psychology: General, 130, 323–360.
[Rehder 2003a] Bob Rehder. Categorization as causal reasoning // Cognitive Science 27 (2003)
709–748.
[Rehder 2003b] Rehder, B. A causal-model theory of conceptual representation and
categorization. J. of Exper. Psych.: Learning, Memory, and Cognition, 29, 1141-1159.
[Rehder et al. 2011] Bob Rehder, Jay B. Martin. Towards A Generative Model of Causal Cycles
// 33rd Annual Meeting of the Cognitive Science Society 2011, (CogSci 2011), Boston,
Massachusetts, USA, 20-23 July 2011, V.1 pp. 2944-2949.
[Rosch 1973] Rosch, E.H. Natural categories // Cognitive Psychology 4. 1973. P. 328-350.
[Rosch et al. 1975] Rosch, E., Mervis, C.B. Family resemblances. Studies in the internal
structure of categories // Cognitive Psychology, 7. P. 573–605.
[Rosch et al. 1976] Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., Boyes-Braem, P. Basic
objects in natural categories // Cognitive Psychology, 8. P. 382–439.
[Rosch et al. 1978] Rosch, Eleanor and Lloyd, Barbara B. (eds) Cognition and categorization.</p>
      <p>Hillsdale, NJ: Lawrence Erlbaum, 1978, P. 27-48.
[Rosch 1978] Rosch, E. Principles of Categorization // Rosch, E. &amp; Lloyd, B.B. (eds), Cognition
and Categorization, Lawrence Erlbaum Associates, Publishers, (Hillsdale), 1978. P.
27–48
[Ross et al. 2008] B. H. Ross, E. G. Taylor, E. L. Middleton, and T. J. Nokes. Concept and
Category Learning in Humans // H. L. Roediger, III (Ed.), Cognitive Psychology of
Memory. Vol. [2] of Learning and Memory: A Comprehensive Reference, 4 vols. (J.Byrne
Editor), Oxford: Elsevier, 2008, P. 535-556.
[The Nature of Classification 2013] The Nature of Classification. Relationships and Kinds in
the Natural Sciences. Palgrave Macmillan. 2013. 208p.
[Ganter 2003] Ganter, B. Formal Concept Analysis: Methods, and Applications in Computer</p>
      <p>Science. TU Dresden (2003).
[Ganter et al. 1999] Ganter, B., Wille, R. Formal concept analysis – Mathematical Foundations.</p>
      <p>Berlin-Heidelberg-New York, Springer (1999)
[Ganter et al. 2004] Ganter, B., Obiedkov, S. Implications in Triadic Formal Contexts. TU
Dresden, Springer (2004)
[Klimushkin et al. 2010] Klimushkin, M., Obiedkov, S., Roth, C. Approaches to the Selection
of Relevant Concepts in the Case of Noisy Data. In: ICFCA 2010, LNAI 5986, 255-266.
(2010)
[Frequent Itemset Mining 2004] Bayardo Jr., R., Goethals, B., Zaki, M. (eds.) Proc. of the IEEE
ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004).
CEURWS.org (2004)
[Kuznetsov 2007] Kuznetsov, S.O. On Stability of a Formal Concept. Annals of Mathematics
and Artificial Intelligence. 49, 101-115 (2007)
[Buzmakov 2014] Buzmakov, A., Kuznetsov, S., Napoli, A. Concept Stability as a Tool for
Pattern Selection. CEUR Workshop proceedings, vol. 1257, ECAI 2014, pp. 51-58 (2014)
[Emilion et al. 2009] Emilion R., Levy G. Size of random Galois lattices. Discrete Applied Math.</p>
      <p>J. 157, 2945-2957 (2009)
[Speransky 2013] Speransky, S.O. Logic of probability and probability logic. Novosibirsk State</p>
      <p>University, Novosibirsk, PhD thesis. (2013)
[Vityaev 2006] Vityaev, E.E. The logic of prediction. In: Proceedings of the 9th Asian Logic
Conference Mathematical Logic in Asia, Novosibirsk, Russia, August 16–19, 2005, pp.
263–276. World Scientific, Singapore (2006)
[Vityaev et al. 2015] Vityaev, E.E., Martynovich, V.V. Probabilistic Formal Concepts with
Negation. Perspectives of System Informatics. A. Voronkov, I. Virbitskaite (Eds.),
LNCS vol. 8974, pp. 385-399 (2015)
[Martynovich et al. 2016] Vitaliy V. Martynovich, Euvgeniy E. Vityaev. Recovering Noisy
Contexts with Probabilistic Formal Concepts // Proceedings of the 2nd International
Workshop on Soft Computing Applications and Knowledge Discovery (SCAKD
2016), Moscow, Russia, July 18, 2016. CEUR Workshop Proceedings, v. Vol-1687, pp.
24-35.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>[Vityaev 1992] Vityaev</surname>
            <given-names>E.E.</given-names>
          </string-name>
          <article-title>Semantic approach to creation of knowledge bases. Se-mantic Probabilistic Conclusion of the Best Prediction PROLOGRAMS by Probabilistic Data Model. // Logic and Semantic Programming (Computing Systems</article-title>
          , vol.
          <volume>146</volume>
          ), Novosibirsk,
          <year>1992</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Vityaev et al. 2014
          <string-name>
            <surname>] Vityaev</surname>
            <given-names>E.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neupokoev</surname>
            <given-names>N.V.</given-names>
          </string-name>
          <article-title>Formal model of perception and image as ifx-point of anticipation</article-title>
          . Approaches to Thinking Modeling.
          <article-title>(Collected under the editorship of V.G</article-title>
          . Redko,
          <string-name>
            <surname>Ph</surname>
          </string-name>
          .D.-m.D.). URSS Editorial-Al, Moscow,
          <year>2014</year>
          , pp.
          <fpage>155</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Vityaev et al. 2015
          <string-name>
            <surname>] Vityaev</surname>
            <given-names>E.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martynovich</surname>
            <given-names>V.V.</given-names>
          </string-name>
          <article-title>Formalization of “natural" classification and systematics through fixed prediction points //</article-title>
          <source>Siberian Electronic Mathematical Reports</source>
          , Vol.
          <volume>12</volume>
          , Institute of Mathematics, Moscow, Russia.
          <source>Siberian Electronic Mathematical Reports</source>
          , Volume
          <volume>12</volume>
          ,
          <string-name>
            <surname>Sobolev</surname>
            <given-names>Institute</given-names>
          </string-name>
          <source>of Mathematics, Siberian Branch of the Russian Academy of Sciences</source>
          ,
          <year>2015</year>
          .
          <fpage>1006</fpage>
          -
          <lpage>1031</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Vityaev et al. 2012
          <string-name>
            <surname>] Vityaev</surname>
            <given-names>E.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demin</surname>
            <given-names>A.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponomariov D.K</surname>
          </string-name>
          .
          <article-title>Probabilistic generalization of formal concepts // Programming</article-title>
          , T.
          <volume>38</volume>
          , #
          <volume>5</volume>
          ,
          <year>2012</year>
          , p.
          <fpage>18</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Smirnov 1985]
          <article-title>Smirnov S.D. Patterns psychology</article-title>
          . Moscow: Moscow State University,
          <year>1985</year>
          ,
          <year>231s</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Naiser 1981]
          <article-title>Naiser U. Knowledge and reality</article-title>
          . M.: “
          <source>Progress"</source>
          ,
          <year>1981</year>
          ,
          <volume>229</volume>
          c.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Mill 2011]
          <article-title>Mill J.S. Logic system of syllogistic and inductive, 2011, Prel</article-title>
          . and
          <string-name>
            <given-names>adj. V.C.</given-names>
            <surname>Finn</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          : Lenand, 832 s.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Rutkovsky 1884]
          <article-title>Rutkovsky L. Elementary textbook of logic</article-title>
          .
          <source>St. Petersburg</source>
          ,
          <year>1884</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Mayen et al. 1976]
          <string-name>
            <surname>Meyen</surname>
            <given-names>S.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schreider S</surname>
          </string-name>
          .A. Methodological aspects of classification theory // Problems of philosophy,
          <year>1976</year>
          , #
          <volume>12</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>[Smirnov 1938] Smirnov</surname>
            <given-names>E.S.</given-names>
          </string-name>
          <article-title>Design of a kind of taxonomic point of view // Zool</article-title>
          . Zhurn. Vol.
          <volume>17</volume>
          , No.
          <volume>3</volume>
          ,
          <issue>1938</issue>
          , Pp.
          <fpage>387</fpage>
          -
          <lpage>418</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Zabrodin 1981]
          <string-name>
            <surname>Zabrodin</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Yu</surname>
          </string-name>
          .
          <source>On criteria of natural classification // NTI, ser.2</source>
          ,
          <year>1981</year>
          , #
          <volume>8</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Cheng 1997] Cheng,
          <string-name>
            <surname>P.</surname>
          </string-name>
          <article-title>From covariation to causation: A causal power theory</article-title>
          .
          <source>Psychological Review</source>
          ,
          <volume>104</volume>
          ,
          <fpage>367</fpage>
          -
          <lpage>405</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Gopnik et al. 2004]
          <string-name>
            <surname>Gopnik</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glymour</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sobel</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schulz</surname>
            ,
            <given-names>L. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kushnir</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>A theory of causal learning in children: Causal maps and Bayes nets</article-title>
          .
          <source>Psychological Review</source>
          ,
          <volume>111</volume>
          ,
          <fpage>3</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>