=Paper=
{{Paper
|id=Vol-2648/paper5
|storemode=property
|title=Formalization of "Natural" Classification and "Natural" Concepts by Probabilistic Generalization of the Formal Concepts Analysis
|pdfUrl=https://ceur-ws.org/Vol-2648/paper5.pdf
|volume=Vol-2648
|authors=Evgenii E. Vityaev,Vladislav Degtiarev,Bayar Pak,Yuri Meister
}}
==Formalization of "Natural" Classification and "Natural" Concepts by Probabilistic Generalization of the Formal Concepts Analysis==
Evgenii E. Vityaev^a, Vladislav Degtiarev^b, Bayar Pak^b and Yuri Meister^b

^a Sobolev Institute of Mathematics, Novosibirsk, Russia
^b Novosibirsk State University, Novosibirsk, Russia
Abstract
In previous works, a probabilistic generalization of Formal Concept Analysis was developed. This generalization is motivated by the problem of determining formal concepts under noise, when the lattice of formal concepts grows exponentially. In this paper, probabilistic formal concepts with negation are defined, together with a statistical method for detecting them. The purpose of the paper is to show that probabilistic formal concepts have a deeper meaning: it is argued that they formalize the "natural" concepts described in the cognitive sciences by "causal models", which are characterized by a highly correlated structure of attributes. The same structure is characteristic of the "natural" classification of objects of the external world. The definition of "natural" classification given by J. Stuart Mill is fairly accurately formalized by probabilistic formal concepts.
1. Introduction
In previous works [Vityaev et al. 2012], [Vityaev et al. 2015], [Martynovich et al. 2016], a probabilistic generalization of Formal Concept Analysis (FCA) was developed. This generalization solves the problem of discovering formal concepts under noise, which is associated with the exponential growth of the lattice of formal concepts in the presence of noise. This issue is discussed in more detail in section 4.
The probabilistic generalization of formal concepts was obtained by generalizing the definition of formal concepts as fix-points of implications. For this purpose, a probability measure on the objects of the context and maximally specific probabilistic implications were defined. The consistency of fix-points based on maximally specific probabilistic implications was proved, and probabilistic formal concepts were then defined as fix-points of predictions based on maximally specific probabilistic implications. Figuratively speaking, the resulting probabilistic formal concepts do not display the data in full detail, as a photograph would, but give an "idealized" representation of the data.
Russian Advances in Artificial Intelligence: selected contributions to the Russian Conference on Artificial Intelligence (RCAI 2020), October 10-16, 2020, Moscow, Russia
vityaev@math.nac.ru (E.E. Vityaev)
© 2020 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
In this work, probabilistic formal concepts with negation are defined, together with a statistical method for detecting them, based on the works [Vityaev et al. 2014], [Vityaev et al. 2015], [Martynovich et al. 2016].
The purpose of this work is to show that probabilistic formal concepts have a deeper meaning than the mere recovery of formal concepts under noise. In [Vityaev et al. 2014], it was shown that the fix-points of anticipation underlying probabilistic formal concepts model the process of perception in exact accordance with the psychology of perception described in [Smirnov 1985], [Naiser 1981].
In addition, this paper shows that probabilistic formal concepts simulate the "natural" concepts explored in the cognitive sciences. "Natural" concepts are concepts about directly perceived objects, not abstract notions such as production, plant or object. The following section, based on research in the cognitive sciences, presents the principles of categorization and the description of "natural" concepts. It follows that "natural" concepts reflect a highly correlated structure of the features of objects of the external world, which can be represented by "causal models". It is argued that causal models are rather precisely modeled by probabilistic formal concepts.
The highly correlated structure of the features of objects of the external world also manifests itself in "natural" classification (see section 3). J. Stuart Mill observed that "natural" classes of animals or plants are described by a potentially infinite number of properties [Mill 2011]. The naturalists who built "natural" classifications likewise noted that the more similar the compared objects are in a greater number of significant features, the more likely is their similarity in other features [Rutkovsky 1884], and that the construction of a "natural" classification consists in "indication": from the infinite number of features we need to move to a limited number of features that would replace all the others [Smirnov 1938]. This means that in "natural" classes the features are strongly correlated. For example, if there are 128 classes and the features are binary, then only 7 independent "indicator" features are needed, because 2^7 = 128, and the remaining features, of which there may be tens or hundreds (denote their total number by N), can be predicted from these 7 features, which means the presence of N−7 regularities. Since the "indicator" features can be any 7 of the N features, and the number of such choices is C_N^7, the number of regularities that predict non-indicator features from indicator features is not much less than (N−7)·C_N^7. A highly correlated structure of features is well detected by probabilistic formal concepts; moreover, the following statement by J.S. Mill almost precisely defines the meaning of probabilistic formal concepts: "Natural groups ... are defined by features ... However, at the same time not only features which are certainly common for all the objects included into the group but the whole set of those features is taken into account, of which all are found in the majority of these objects and the majority – in all" [Mill 2011].
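As a rough illustration of the scale of this count, here is a small sketch; the value N = 100 is our assumption, chosen only for the example:

```python
# Rough count of the regularities implied by "indication": with 128 classes
# and binary features, 7 indicator features suffice (2**7 = 128), and each
# choice of 7 indicators predicts the remaining N - 7 features.
from math import comb

N = 100  # total number of features; hypothetical, for illustration only
k = 7    # number of indicator features

regularities = (N - k) * comb(N, k)
print(f"C({N},{k}) = {comb(N, k)}, regularities ~ {regularities:.2e}")
# C(100,7) = 16007560800, regularities ~ 1.49e+12
```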
2. “Natural" concepts
In the works of Eleanor Rosch [Rosch 1973, 1978], [Rosch et al. 1975, 1976, 1978], the principles of categorization of "natural" categories are formulated on the basis of conducted experiments. Directly perceived objects (basic objects) are information-rich bundles of observed and functional properties that form a natural partition underlying categorization. These bundles form "prototypes" of object classes: "Categories can be viewed in terms of their clear cases if the perceiver places emphasis on the correlational structure of perceived attributes ... By prototypes of categories we have generally meant the clearest cases of category membership" [Rosch et al. 1978], [Rosch 1978]. Rosch and Mervis have shown that the more prototypical of a category a member is rated, the more attributes it has in common with other members of the category and the fewer attributes in common with members of the contrasting categories [Rosch et al. 1975, 1976].
Later on, Eleanor Rosch's theory of "natural" concepts came to be called the "prototype" theory. Its main features are described as follows: "The prototype view (or probabilistic view) keeps the attractive assumption that there is some underlying common set of features for category members but relaxes the requirement that every member have all the features" [Ross et al. 2008].
Further research has found that models based on features, similarities and prototypes are not sufficient to describe classes: theoretical, causal and ontological knowledge relating to the objects of a class must also be taken into account. For example, people not only know that birds have wings, can fly and nest in trees, but also that birds nest in trees because they can fly, and fly because they have wings.
Based on this research, Bob Rehder, in his work "Categorization as causal reasoning", formulated the causal-model theory: "people's intuitive theories about categories of objects consist of a model of the category in which both a category's features and the causal mechanisms among those features are explicitly represented" [Rehder 2003a].
In the theory of causal models, the relation of an object to a category is no longer based on a set of features and feature-based proximity, but on the similarity of the generating causal mechanism: "Specifically, a to-be-classified object is considered a category member to the extent that its features were likely to have been generated by the category's causal laws, such that combinations of features that are likely to be produced by a category's causal mechanisms are viewed as good category members and those unlikely to be produced by those mechanisms are viewed as poor category members" [Rehder 2003a].
Some researchers have used Bayesian networks as causal models to represent causal knowledge [Cheng 1997], [Gopnik et al. 2004], [Griffiths et al. 2009]. However, these models cannot capture cyclic causal relationships, because Bayesian networks do not support cycles.
In [Rehder et al. 2011], Bob Rehder proposed a model of causal cycles based on the "unfolding" of causal graphical models (CGMs): the cycle is unfolded into a Bayesian network that unrolls over time. Such a network is already quite well suited for modeling "natural" concepts reflecting the highly correlated structure of the outside world. However, it does not explicitly formalize cycles of causal links.
The model we propose below is directly based on cyclic causal links, represented by probabilistic formal concepts.
3. “Natural" classification
The first sufficiently detailed philosophical analysis of "natural" classification belongs to J.St. Mill [Mill 2011]. This analysis revealed all the main properties of "natural" classifications, which were later confirmed by the naturalists who built them.
According to J.St. Mill [Mill 2011], "artificial" classifications differ from "natural" ones in that they can be based on any one or several attributes, so that different classes differ only in that they include objects with different values of these attributes. But if we consider the classes of "animals" or "plants", they differ in such a large (potentially infinite) number of properties that these cannot be enumerated [Mill 2011].
J.St. Mill gives the following definition of "natural" classification: it is a classification which unites objects into groups about which the greatest number of general propositions can be made, and it is based on such properties as serve as causes of many others, or at least are their sure marks. He also defines the notion of the "image" of a class, which is the forerunner of "natural" concepts: "our concept of class – the image by which this class is represented in our mind – is the concept of a certain specimen that possesses all the features of this class ... to the highest degree".
The properties of "natural" classes noted by J.St. Mill were confirmed by naturalists. For example, W. Whewell: "The more general statements about objects a classification makes it possible to formulate, the more natural it is" [Mayen et al. 1976]. L. Rutkovsky writes about the inexhaustible number of general properties of "natural" classes [Rutkovsky 1884]: "The greater the number of essential features in which the compared objects are similar, the more likely their similarity in other respects". A similar statement is made by E.S. Smirnov: "The taxonomic problem lies in 'indication': from an infinite number of features we need to move to a limited number of them, which would replace all other features" [Smirnov 1938].
As a result, the problem of defining "natural" classifications was formulated and is still discussed in the literature [Zabrodin 1981], [The Nature of Classification 2013]. However, from our point of view, the existing formalizations of "natural" classification are not yet sufficiently adequate.
4. The problem of formal concepts discovery in noise
conditions
Let us consider a known problem of formal concept analysis [Ganter 2003]: standard methods of concept-lattice construction on noisy data turn out to be unproductive, because they are designed for accurate data, and in the presence of noise (data distortions) the concept lattice becomes filled with noisy clones of the initial concept classes [Klimushkin et al. 2010], [Frequent Itemset Mining 2004]. Attempts to solve this problem head-on do not look promising [Klimushkin et al. 2010], [Kuznetsov 2007]. The main idea of these works is to take the full lattice of concepts and to get rid of the noisy, "side" concepts by filtering. A widespread evaluation of a "side" concept is the stability index [Kuznetsov 2007], [Buzmakov 2014]. The stability index and other approaches based on the filtering of lattices have a significant disadvantage: the filtered concept lattices often have exponential size [Emilion et al. 2009], which is unacceptable for large contexts.
5. Basics of formal concepts analysis
Let us start with the main definitions of formal concept analysis. Most of the definitions here come from the classical works on FCA [Ganter 2003], [Ganter et al. 1999]; the others are taken from [Ganter et al. 2004].
Definition 1. A formal context is a triple (G, M, I), where G and M are arbitrary sets of objects and attributes, respectively, and I ⊆ G × M is a binary relation expressing that an object possesses an attribute.
In a formal context, the derivation operators play a key role.
Definition 2. Let A ⊆ G, B ⊆ M. Then:
1. A↑ = {m ∈ M | ∀g ∈ A, (g, m) ∈ I}
2. B↓ = {g ∈ G | ∀m ∈ B, (g, m) ∈ I}
The derivation operators link subsets of objects and attributes of a context. In what follows we also denote both operators by a single symbol ′.
Definition 3. A pair (A, B) is a formal concept if A↑ = B and B↓ = A.
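As an illustration, the following minimal sketch implements the derivation operators of Definition 2 and the formal-concept test of Definition 3; the tiny three-object context is our own, made up for the example:

```python
# Derivation operators (Definition 2) and the formal-concept test
# (Definition 3) on a small made-up context.
G = {"g1": {"a", "b"}, "g2": {"a", "b"}, "g3": {"b", "c"}}  # object -> attributes
M = {"a", "b", "c"}

def up(A):    # A↑: attributes common to all objects of A
    return set.intersection(*(G[g] for g in A)) if A else set(M)

def down(B):  # B↓: objects possessing every attribute of B
    return {g for g in G if B <= G[g]}

def is_concept(A, B):  # Definition 3: A↑ = B and B↓ = A
    return up(A) == B and down(B) == A

print(is_concept({"g1", "g2"}, {"a", "b"}))  # True
print(is_concept({"g1"}, {"a"}))             # False: {a}↓ = {g1, g2}
```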
Definition 4. An implication is a pair of attribute sets (B, C), B, C ⊆ M, written as B → C. An implication B → C is true on the context K = (G, M, I) if ∀g ∈ G (B ⊈ g′ ∨ C ⊆ g′). The set of all implications true on the context K will be denoted by Imp(K).
Definition 5. For any set L of implications, we define the direct inference operator f_L that adds to the original set of attributes X all possible conclusions of implications:
f_L(X) = X ∪ {C | B ⊆ X, B → C ∈ L}
Theorem 1 [Ganter et al. 1999]. For any subset B ⊆ M: f_Imp(K)(B) = B ⇔ B″ = B.
Let us illustrate Theorem 1 in the following simple context.
Table 1
[Ganter et al. 1999] A very simple formal context K0

m1 m2 m3
 1  1  0
 1  1  1
 0  1  1
 0  0  0
We can reformulate the task of constructing a concept lattice in terms of implications and the inference operator. Looking at Table 1, it is not difficult to notice that the attributes m1 and m3 completely determine the object class. Indeed, m1 implies m2, and the same is true for m3, so that m1 → m2 and m3 → m2. The sets {m1, m2}, {m2, m3} and {m1, m2, m3} are formal concepts. According to the theorem, they are also fix-points of the direct inference operator. For example, f_Imp(K)({m1}) = {m1, m2} and f_Imp(K)({m1, m2}) = {m1, m2}; therefore {m1, m2} is both a fix-point and a formal concept.
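This computation can be checked mechanically. The sketch below (ours; K0 is encoded as a list of attribute sets) extracts all implications true on K0, iterates the direct inference operator, and confirms that {m1} closes to {m1, m2}:

```python
# Theorem 1 on K0: the fix-points of f_Imp(K) are exactly the closed sets.
from itertools import chain, combinations

rows = [{"m1", "m2"}, {"m1", "m2", "m3"}, {"m2", "m3"}, set()]  # Table 1
M = {"m1", "m2", "m3"}

def subsets(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# all implications B -> C true on the context (Definition 4)
imps = [(set(B), set(C)) for B in subsets(M) for C in subsets(M)
        if all(not set(B) <= g or set(C) <= g for g in rows)]

def f(X):  # direct inference operator f_Imp(K) (Definition 5)
    out = set(X)
    for B, C in imps:
        if B <= out:
            out |= C
    return out

def closure(X):  # iterate f to its fix-point
    while (Y := f(X)) != X:
        X = Y
    return X

print(closure({"m1"}))  # {'m1', 'm2'} -- a fix-point and a formal concept
```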
Now let us add some noise to the original context K0 from Table 1. In order to keep the noise level fairly low, let us also expand the context by adding duplicates of objects:
Table 2
K0 with duplicates

m1 m2 m3
 1  1  0
 1  1  1
 0  1  1
 1  1  0
 1  1  1
 0  1  1
 1  1  0
 1  1  1
 0  1  1
 0  0  0
Table 3
K_noise: K0 with a little noise

m1 m2 m3
 0  1  0
 1  1  1
 0  1  1
 1  1  0
 1  0  1
 0  1  1
 1  1  0
 1  1  1
 0  1  1
 0  0  0
Every small change made to the context has a dramatic impact on the concept lattice. The first context, in Table 2, is equivalent to the original context K0 and generates the same concept lattice. However, the lattice of the noisy second context, in Table 3, will contain fictitious concepts such as {m2} and {m1, m3} generated by the noise, which are also formal concepts. The number of fictitious concepts grows with the noise level, and the qualitative estimate of this growth is exponential [Emilion et al. 2009].
6. Probabilistic logic in a formal context
To avoid the exponential growth of the lattice, a stability index was introduced [Klimushkin et al. 2010], [Kuznetsov 2007], [Buzmakov 2014] to assess the degree of stability of a concept and to select only stable concepts.
We present a fundamentally different way of constructing the lattice, based on probabilistic implications. For this purpose, we restate the context in logical terms. In what follows we consider only finite contexts.
Definition 6. For a finite context K = (G, M, I), we define a signature Ω_K that contains a predicate symbol m(x) for each m ∈ M. For the signature Ω_K and the context K, considered as a model, we define the interpretation of the predicate symbols as follows: K ⊨ m(x) ⇔ (x, m) ∈ I.
Definition 7. For the signature Ω_K, let us define the following variant of first-order logic:
1. X_K – the set of variables;
2. At_K – the set of atomic formulas (atoms) m(x), where m ∈ Ω_K and x ∈ X_K;
3. L_K – the set of literals, including all atoms m(x) and their negations ¬m(x);
4. Φ_K – the set of formulas, defined inductively: every atom is a formula, and for any Φ, Ψ ∈ Φ_K the syntactic constructions Φ ∧ Ψ, Φ ∨ Ψ, Φ → Ψ, ¬Φ are formulas as well.
For a set of literals L ⊆ L_K we define its conjunction as ∧L = ⋀_{P∈L} P, and likewise ¬L = {¬P | P ∈ L}.
Definition 8. A single-element set {g}, together with the predicates of the signature Ω_K, forms a model K_g of this object. The truth of a formula φ on the object model K_g is written as g ⊨ φ ⇔ K_g ⊨ φ.
Definition 9. Let us define a probability measure μ on the set G, in the Kolmogorov sense. Then we can define a probability measure on the set of formulas as
ν : Φ_K → [0, 1], ν(φ) = μ({g | g ⊨ φ}).
We assume that the context has no statistically insignificant objects, i.e. no g ∈ G with μ({g}) = 0.
Definition 10. A set of literals M will be called ν-joint if ν(∧M) > 0.
In what follows, the jointness of a set of literals is considered with respect to the probability measure of the context, and where no ambiguity arises the symbol of the measure is omitted.
Consider a subset of atoms L ⊆ At(K). The formula m1 ∧ m2 ∧ … ∧ mk → m = ∧{mi} → m defines the implication ({mi}, {m}) on the context. The presence of a probability measure makes it possible to define probabilistic implications. Let us now define rules on the context.
Definition 11. Let {H1, H2, …, Hk, C} ⊆ L_K, C ∉ {H1, H2, …, Hk}, k ≥ 0.
1. The rule R = (H1, H2, …, Hk → C) is the implication (H1 ∧ H2 ∧ … ∧ Hk → C);
2. The premise R← of the rule R is the set of literals {H1, H2, …, Hk};
3. The conclusion of the rule is R→ = C;
4. The length of the rule is the cardinality |R←| of its premise;
5. If R1← = R2← and R1→ = R2→, then R1 = R2.
Definition 12. The probability η of a rule R is the value
η(R) = ν(R→ | ∧R←) = ν(∧R← ∧ R→) / ν(∧R←).
If the denominator ν(∧R←) equals 0, the probability is not defined.
7. Rule classes and consistency of predictions
The following sections combine material from [Ganter et al. 1999], [Vityaev 1992], [Vityaev 2006], [Vityaev et al. 2012], [Vityaev et al. 2015], [Martynovich et al. 2016].
Definition 13. For a system of rules ℛ, let us define the prediction operator
Π_ℛ(L) = L ∪ {C | ∃R ∈ ℛ : R← ⊆ L, R→ = C}.
Definition 14. The closure L̄ of a set of literals L is the smallest fix-point of the prediction operator containing L:
L̄ = Π_ℛ(L̄) = Π_ℛ^∞(L) = ⋃_{k∈ℕ} Π_ℛ^k(L).
Definition 15. A rule R1 is a subrule of a rule R2 if R1→ = R2→ and R1← ⊂ R2←. We denote this by R1 ⊏ R2.
Definition 16. A rule R1 specifies a rule R2, written R1 > R2, if R2 ⊏ R1 and η(R1) > η(R2).
Example 2. The rule m1 → m2 in the context K_noise (Table 3) has only one, unconditional subrule, (∅ → m2) ⊏ (m1 → m2), but it is not true that the first rule specifies the second: η(∅ → m2) = 8/10 = 4/5 = η(m1 → m2). However, (m3 → m2) > (∅ → m2), since η(∅ → m2) = 8/10 < 5/6 = η(m3 → m2).
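These fractions are easy to verify directly. A small sketch (ours), reading the rows of Table 3 and assuming the uniform measure μ({g}) = 1/|G|:

```python
# Probabilities of Example 2 on K_noise (Table 3), uniform measure.
rows = [(0,1,0),(1,1,1),(0,1,1),(1,1,0),(1,0,1),
        (0,1,1),(1,1,0),(1,1,1),(0,1,1),(0,0,0)]  # columns m1, m2, m3

def nu(lits):  # nu of a conjunction; lits maps column index -> required value
    return sum(all(g[i] == v for i, v in lits.items()) for g in rows) / len(rows)

def eta(premise, concl):  # conditional probability of a rule (Definition 12)
    return nu({**premise, **concl}) / nu(premise)

print(nu({1: 1}))            # eta(empty -> m2) = 8/10
print(eta({0: 1}, {1: 1}))   # eta(m1 -> m2)    = 4/5 = 8/10: no specification
print(eta({2: 1}, {1: 1}))   # eta(m3 -> m2)    = 5/6 > 8/10: specification
```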
To achieve consistency and resistance to noise, some restrictions on the classes of rules used will be required [Vityaev 1992], [Speransky 2013].
The class M1 defined below contains only those rules whose conditional probability is strictly greater than the probability of the unconditional rule.
Definition 17. R ∈ M1(C) ⇔ η(R) > ν(R→), R→ = C, R← ≠ ∅.
The next class M2 requires maximal specificity: the impossibility of increasing the conditional probability by refining the rule [Vityaev 2006].
Definition 18. R ∈ M2(C) ⇔ (R ∈ M1(C)) & (R ⊏ R̃ ⇒ η(R̃) ≤ η(R)).
It has been proved that the maximally specific rules of M2 solve the problem of statistical ambiguity [Vityaev 2006], [Vityaev et al. 2015].
Example 3. The rule (m3 → m2) can be specified to the rule (¬m1 ∧ m3 → m2), because η(m3 → m2) = 5/6 < 1 = 3/3 = η(¬m1 ∧ m3 → m2). The latter rule meets all the requirements of class M2, and therefore (¬m1 ∧ m3 → m2) ∈ M2(m2).
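The refinement step can again be checked by brute force; the sketch below (ours) extends the premise of (m3 → m2) with the literal m1 or ¬m1 and compares the conditional probabilities:

```python
# The M2 refinement of Example 3 on K_noise (Table 3).
rows = [(0,1,0),(1,1,1),(0,1,1),(1,1,0),(1,0,1),
        (0,1,1),(1,1,0),(1,1,1),(0,1,1),(0,0,0)]  # columns m1, m2, m3

def eta(premise, concl):  # premise, concl: dicts column -> required 0/1 value
    sat = [g for g in rows if all(g[i] == v for i, v in premise.items())]
    hit = [g for g in sat if all(g[i] == v for i, v in concl.items())]
    return len(hit) / len(sat) if sat else None

print(eta({2: 1}, {1: 1}))        # m3 -> m2:        5/6
print(eta({2: 1, 0: 1}, {1: 1}))  # m1 & m3 -> m2:   2/3, worse
print(eta({2: 1, 0: 0}, {1: 1}))  # ~m1 & m3 -> m2:  3/3 = 1, the refinement
```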
The next class Imp contains all exact implications and corresponds to the set Imp(K) of Definition 4.
Definition 19. R ∈ Imp(C) ⇔ (η(R) = 1) & (R→ = C).
Definition 20. Define the classes M1, M2, Imp by taking unions over all literals:
M1 = ⋃_{C∈Lit(K)} M1(C); M2 = ⋃_{C∈Lit(K)} M2(C); Imp = ⋃_{C∈Lit(K)} Imp(C).
Definition 21. A set of rules ℛ is called exact if Imp ⊂ ℛ.
Definition 22. A system of rules is any subset ℛ ⊆ M2.
A set of literals L is consistent if it does not simultaneously contain some atom C and its negation ¬C.
Theorem 2 (Joint predictions [Vityaev et al. 2015]). If L is joint, then Π_ℛ(L) is also joint and consistent for any system of rules ℛ ⊆ M2.
8. Probabilistic formal concepts
Fix-points of the prediction operator are obvious candidates for the role of intents of probabilistic concepts. To determine the extents of concepts, we follow [Kuznetsov 2007], [Buzmakov 2014]. The idea is to take all possible preimages of the closure operator, i.e. all sets C such that Π_ℛ(C) = B, and to combine their derivatives into a single extent A. This allows the initial object–concept correspondence to be restored by the prediction operator, and all objects of one class to be included in a single extent.
Definition 23. A probabilistic formal concept on the context K is a pair (A, B) that satisfies the conditions
Π_ℛ(B) = B, A = ⋃_{Π_ℛ(C)=B} C′, ℛ ⊆ M2 (Def. 22).
Example 4. Consider a context K_squares consisting of two non-intersecting squares that give two independent formal concepts. We change the values of some cells to make the task a bit more complicated.
Table 4
The context K_squares with 2 concepts and superimposed noise

G   m1 m2 m3 m4 m5 m6 m7 m8
g1   1  1  1  1  1  0  0  0
g2   1  1  1  1  0  0  0  0
g3   1  1  1  1  0  0  0  0
g4   1  1  1  1  0  0  0  0
g5   0  0  0  0  1  1  1  1
g6   0  0  0  0  1  1  0  1
g7   0  0  0  0  1  1  1  1
g8   0  0  0  0  1  1  1  1
Note that the most specific rules of M2 will be (m_{i=1…4} → ¬m_{j=5…8}), but not the rule m5 → ¬m_{i=1…4}, for which the following refinement can be found: η(m5 ∧ m6 → ¬m1) = 1 > 4/5 = η(m5 → ¬m1). This example shows how noise is processed and absorbed into the probabilities in the course of rule refinement.
To compute the concept containing the attribute m1, we first compute its closure Π_ℛ({m1}) = {m1, m2, m3, m4, ¬m5, ¬m6, ¬m7, ¬m8}. Then all the objects whose closure turns out to be exactly the same are gathered into the single extent {g2, g3, g4}.
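A sketch of this closure computation follows; the rule set is hand-coded from the rules discussed above (the exact implications m_i → m_j inside the first square plus the most specific rules m_i → ¬m_j) and stands in, as an assumption, for the full M2 computation:

```python
# Closing {m1} under an assumed fragment of M2 for K_squares (Table 4).
# m_i -> m_j holds exactly for i, j in 1..4; m_i -> ~m_j (j in 5..8) holds
# with high probability (the noise in g1 lowers it slightly).
rules = [({f"m{i}"}, f"m{j}") for i in range(1, 5) for j in range(1, 5) if i != j]
rules += [({f"m{i}"}, f"~m{j}") for i in range(1, 5) for j in range(5, 9)]

def close(L):  # iterate the prediction operator to its fix-point
    L, changed = set(L), True
    while changed:
        changed = False
        for prem, concl in rules:
            if prem <= L and concl not in L:
                L.add(concl)
                changed = True
    return L

print(sorted(close({"m1"})))
# ['m1', 'm2', 'm3', 'm4', '~m5', '~m6', '~m7', '~m8']
```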
To distinguish probabilistic formal concepts from classical formal concepts on the context K, we will call the latter simply formal concepts. The definition of the set A is also based on the following theorem linking probabilistic and simple formal concepts on the context K.
Theorem 3. Let K be a formal context.
1. If (A, B) is a formal concept in K, then there is a probabilistic formal concept (N, M) in the same context such that A ⊆ N and Π_ℛ(B) = M.
2. If (N, M) is a probabilistic formal concept on K, then there is a family 𝔉 of formal concepts in the same context such that
∀(A, B) ∈ 𝔉 (Π_ℛ(B) = M), N = ⋃_{(A,B)∈𝔉} A.
9. Statistical method for detection of probabilistic formal
concepts
The procedure for obtaining all M2 rules by exhaustive search is exponential: there are O(2^|M|) candidate rules (each atom or its negation can be included in a premise), and each is evaluated in time O(|G|). Of course, this estimate is rather rough and the procedure can be improved. Instead, we define a more radical simplification of the problem: a statistical approximation of the M2 rules.
Definition 24. R is a probabilistic law if (R̃ ⊏ R) ⇒ (R̃ < R).
Definition 25. A rule R̃ is semantically derived from a rule R, written R ⊳ R̃, if R and R̃ are probabilistic laws and R̃ > R.
Definition 26. A probabilistic law R is called strongest if there is no probabilistic law R̃ such that R̃ > R.
Definition 27. A Semantic Probabilistic Inference (SPI) of a strongest probabilistic law R_m is a sequence of probabilistic laws R0 ⊳ R1 ⊳ R2 ⊳ … ⊳ R_m such that R0← = ∅ and R_m is a strongest probabilistic law.
Let us denote by ℛ the set of all strongest probabilistic laws obtained by all possible semantic probabilistic inferences. Checking the specification R1 > R2 requires checking the inequality η(R1) > η(R2).
In general, from a practical point of view, we cannot assume that a probability measure as in Definition 9 is given, so we will treat it as unknown. The probabilistic inequalities used in the SPI definition are then decided by some statistical criterion, e.g. Fisher's exact independence test with a confidence level α. The resulting set of rules ℛ_α may already make contradictory predictions. Therefore, to approximate the prediction operator Π_ℛ(L), it is necessary to introduce an additional criterion of mutual consistency of the predictions made by the rules of ℛ_α on the set L.
Definition 29. A rule R ∈ ℛ_α is confirmed on the set of literals L if R← ⊂ L and R→ ∈ L. In this case we write R ∈ Sat(L) ⊆ ℛ_α.
Definition 30. A rule R ∈ ℛ_α is refuted on the set of literals L if R← ⊂ L and R→ ∈ ¬L. In this case we write R ∈ Fal(L) ⊆ ℛ_α.
The balance between the quantity and quality of the rules confirmed and refuted on a concept-hypothesis L determines whether literals are added to or deleted from the intent of the concept.
Definition 31. The consistency criterion for the set L is the value
Int(L) = Σ_{R ∈ Sat(L)} γ(R) − Σ_{R ∈ Fal(L)} γ(R).
The choice of the rule evaluation γ may depend on the specifics of the task. In the experiments below, we are guided by Shannon-style considerations and take
γ(R) = −log(1 + ε − η(R)).
Definition 32. Define the consistent prediction operator Υ(L), which changes a set of literals L by one element so as to strictly increase the consistency criterion (a code sketch follows the definition):
1. For every G ∈ L_K∖L, compute the effect of adding G: Δ+ = Int(L ∪ {G}) − Int(L);
2. For every G ∈ L, compute the effect of removing G: Δ− = Int(L∖{G}) − Int(L);
3. The operator Υ(L) adds the literal G to L if Δ+ > 0 and Δ+ > Δ−; it deletes the literal G from L if Δ− > 0 and Δ− > Δ+. If Δ− = Δ+ > 0, the operator deletes the literal G;
4. If Δ+ ≤ 0 and Δ− ≤ 0, the operator returns L and we have a fix-point.
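A sketch of one step of Υ (ours): rules are represented as triples of a premise set, a conclusion literal, and an estimated probability η, with negation written as a "~" prefix:

```python
# One greedy step of the consistent prediction operator (Definition 32),
# using the consistency criterion Int of Definition 31.
from math import log

EPS = 1e-6

def gamma(eta):  # Shannon-style rule weight: gamma(R) = -log(1 + eps - eta(R))
    return -log(1 + EPS - eta)

def interest(L, rules):  # Int(L): confirmed minus refuted (Definitions 29-31)
    total = 0.0
    for prem, concl, eta in rules:
        neg = concl[1:] if concl.startswith("~") else "~" + concl
        if prem <= L and concl in L:    # rule confirmed on L
            total += gamma(eta)
        elif prem <= L and neg in L:    # rule refuted on L
            total -= gamma(eta)
    return total

def upsilon(L, literals, rules):  # literals: all literals of the signature
    base = interest(L, rules)
    add = max(((interest(L | {x}, rules) - base, x) for x in literals - L),
              default=(0.0, None))
    rem = max(((interest(L - {x}, rules) - base, x) for x in L),
              default=(0.0, None))
    if rem[0] > 0 and rem[0] >= add[0]:   # deletion wins ties, per item 3
        return L - {rem[1]}
    if add[0] > 0:
        return L | {add[1]}
    return L                              # fix-point (item 4)
```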
Definition 33. A probabilistic formal concept discovered by the statistical method is a probabilistic formal concept in which the rule system is ℛ_α, the set of all strongest probabilistic laws (Def. 26) obtained by all possible semantic probabilistic inferences with confidence level α on the context K.
10. Experiments in the detection of probabilistic formal
concepts by statistical method
Let us illustrate the detection of probabilistic formal concepts by the statistical method on the example of detecting classes (concepts) of digits. We encode the digits with 24 features located as shown in Fig. 1. Each feature has 7 values (see Fig. 2). Some lines of the digits are encoded twice; for example, the vertical line in the digits 4 and 1 is encoded both by the vertical right line (value 3) in cells 3, 7, 11, 15, 19, 23 and by the vertical left line (value 5) in cells 4, 8, 12, 16, 20, 24.
Two experiments were conducted on two different contexts. The first contained 30 copies of each digit image (360 objects in total), randomly mixed together; the program does not know which image shows which digit and treats them all as different images. The second context contained 10 copies of each of the 10 digits (without the repeated variants of six and nine; one copy of each was taken) and 300 random images in which each of the 24 features took a random value out of the seven available. All digits and images were likewise randomly mixed.

Figure 1: Digits encoding (the grid of 24 feature cells).
Figure 2: Digits encoding (the seven feature values).

In each context, a corresponding set ℛ_α of strongest probabilistic laws with α = 0.05 for the Fisher criterion was found: 66160 regularities with conditional probability greater than 0.9 in the first experiment, and 629494 such regularities in the second.
In the first context, the method found all 12 digit images as classes – they were fix-points – together with two further fix-points for 6 and 9 which did not contain features 13 and 12, respectively, the features in which the two copies of these digits differed. In the second context, the method identified the 10 digits as fix-points.
In both experiments, the stability of the discovered probabilistic formal concepts with respect to missing information about a feature of a digit was tested. It was checked that if, from the set of literals L describing a digit, one of the literals m_i or ¬m_i is removed, i.e., one of the feature values, then the digit is still assigned to the same class and the unknown feature is correctly identified and added back to L, i.e. Π_ℛα(L∖{m_i}) = L. This means that the removed feature value is correctly predicted by the regularities from ℛ_α without contradictions arising from the logical relationships of this feature with the other features. In the first experiment, for all 360 digits, a randomly removed feature was restored by the prediction operator. In the second experiment, the removed features were restored in 96% of cases.
Acknowledgments
The publication of this article was supported by the project "Investigation, analysis and complex independent expertise of projects of the National Technological Initiative, including the accompanying of projects of the NeuroNet road map", which is executed within the framework of the state assignment no. 28.12487.2018/12.1 of the Ministry of Science and Higher Education of the Russian Federation. The study was also carried out within the framework of the state contract of the Sobolev Institute of Mathematics (project no. 0314-2019-0002) with regard to the theoretical results of Sections 6-9.
References
[Vityaev 1992] Vityaev E.E. Semantic approach to the creation of knowledge bases: semantic probabilistic inference of the best for prediction PROLOG programs by a probabilistic data model // Logic and Semantic Programming (Computing Systems, vol. 146), Novosibirsk, 1992, pp. 19-49.
[Vityaev et al. 2014] Vityaev E.E., Neupokoev N.V. Formal model of perception and image as a fix-point of anticipation // Approaches to Thinking Modeling (V.G. Redko, ed.). URSS Editorial, Moscow, 2014, pp. 155-172.
[Vityaev et al. 2015] Vityaev E.E., Martynovich V.V. Formalization of "natural" classification and systematics through fixed points of predictions // Siberian Electronic Mathematical Reports, Vol. 12, Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences, 2015, pp. 1006-1031.
[Vityaev et al. 2012] Vityaev E.E., Demin A.V., Ponomariov D.K. Probabilistic generalization of formal concepts // Programming and Computer Software, Vol. 38, No. 5, 2012, pp. 18-34.
[Smirnov 1985] Smirnov S.D. Psychology of the image. Moscow: Moscow State University, 1985, 231 p.
[Naiser 1981] Naiser U. Cognition and reality. Moscow: Progress, 1981, 229 p.
[Mill 2011] Mill J.S. A System of Logic, Ratiocinative and Inductive. Preface and appendix by V.K. Finn. Moscow: Lenand, 2011, 832 p.
[Rutkovsky 1884] Rutkovsky L. Elementary textbook of logic. St. Petersburg, 1884.
[Mayen et al. 1976] Meyen S.V., Schreider S.A. Methodological aspects of classification theory // Problems of Philosophy, 1976, No. 12.
[Smirnov 1938] Smirnov E.S. Construction of the species from the taxonomic point of view // Zool. Zhurn., Vol. 17, No. 3, 1938, pp. 387-418.
[Zabrodin 1981] Zabrodin V.Yu. On criteria of natural classification // NTI, Ser. 2, 1981, No. 8.
[Cheng 1997] Cheng, P. From covariation to causation: A causal power theory. Psychological
Review, 104, 367-405.
[Gopnik et al. 2004] Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T. A theory
of causal learning in children: Causal maps and Bayes nets. Psychological Review,
111, 3-23.
[Griffiths et al. 2009] Griffiths, T. L., Tenenbaum, J. B. Theory-based causal induction. Psychological Review, 116, 56.
[Rehder et al. 2001] Rehder, B., Hastie, R. Causal knowledge and categories: The effects of
causal beliefs on categorization, induction, and similarity. Journal of Experimental
Psychology: General, 130, 323–360.
[Rehder 2003a] Bob Rehder. Categorization as causal reasoning // Cognitive Science 27 (2003)
709–748.
[Rehder 2003b] Rehder, B. A causal-model theory of conceptual representation and categorization. J. of Exper. Psych.: Learning, Memory, and Cognition, 29, 1141-1159.
[Rehder et al. 2011] Bob Rehder, Jay B. Martin. Towards A Generative Model of Causal Cycles
// 33rd Annual Meeting of the Cognitive Science Society 2011, (CogSci 2011), Boston,
Massachusetts, USA, 20-23 July 2011, V.1 pp. 2944-2949.
[Rosch 1973] Rosch, E.H. Natural categories // Cognitive Psychology 4. 1973. P. 328-350.
[Rosch et al. 1975] Rosch, E., Mervis, C.B. Family resemblances: Studies in the internal structure of categories // Cognitive Psychology, 7. P. 573-605.
[Rosch et al. 1976] Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., Boyes-Braem, P. Basic
objects in natural categories // Cognitive Psychology, 8. P. 382–439.
[Rosch et al. 1978] Rosch, Eleanor and Lloyd, Barbara B. (eds) Cognition and categorization.
Hillsdale, NJ: Lawrence Erlbaum, 1978, P. 27-48.
[Rosch 1978] Rosch, E. Principles of Categorization // Rosch, E. & Lloyd, B.B. (eds), Cognition
and Categorization, Lawrence Erlbaum Associates, Publishers, (Hillsdale), 1978. P.
27–48
[Ross et al. 2008] B. H. Ross, E. G. Taylor, E. L. Middleton, and T. J. Nokes. Concept and Category Learning in Humans // H. L. Roediger, III (Ed.), Cognitive Psychology of Memory. Vol. 2 of Learning and Memory: A Comprehensive Reference, 4 vols. (J. Byrne, Editor), Oxford: Elsevier, 2008, P. 535-556.
[The Nature of Classification 2013] The Nature of Classification. Relationships and Kinds in
the Natural Sciences. Palgrave Macmillan. 2013. 208p.
[Ganter 2003] Ganter, B. Formal Concept Analysis: Methods, and Applications in Computer
Science. TU Dresden (2003).
[Ganter et al. 1999] Ganter, B., Wille, R. Formal concept analysis – Mathematical Foundations.
Berlin-Heidelberg-New York, Springer (1999)
[Ganter et al. 2004] Ganter, B., Obiedkov, S. Implications in Triadic Formal Contexts. TU Dresden, Springer (2004)
[Klimushkin et al. 2010] Klimushkin, M., Obiedkov, S., Roth, C. Approaches to the Selection
of Relevant Concepts in the Case of Noisy Data. In: ICFCA 2010, LNAI 5986, 255-266.
(2010)
[Frequent Itemset Mining 2004] Bayardo Jr., R., Goethals, B., Zaki, M. (eds.) Proc. of the IEEE
ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004). CEUR-
WS.org (2004)
[Kuznetsov 2007] Kuznetsov, S.O. On Stability of a Formal Concept. Annals of Mathematics
and Artificial Intelligence. 49, 101-115 (2007)
[Buzmakov 2014] Buzmakov, A., Kuznetsov, S., Napoli, A. Concept Stability as a Tool for Pattern Selection. CEUR Workshop Proceedings, vol. 1257, ECAI 2014, pp. 51-58 (2014)
[Emilion et al. 2009] Emilion R., Levy G. Size of random Galois lattices. Discrete Applied Math.
J. 157, 2945-2957 (2009)
[Speransky 2013] Speransky, S.O. Logic of probability and probability logic. Novosibirsk State
University, Novosibirsk, PhD thesis. (2013)
[Vityaev 2006] Vityaev, E.E. The logic of prediction. In: Proceedings of the 9th Asian Logic
Conference Mathematical Logic in Asia, Novosibirsk, Russia, August 16–19, 2005, pp.
263–276. World Scientific, Singapore (2006)
[Vityaev et al. 2015] Vityaev, E.E., Martynovich, V.V. Probabilistic Formal Concepts with
Negation. Perspectives of System Informatics. A. Voronkov, I. Virbitskaite (Eds.),
LNCS vol. 8974, pp. 385-399 (2015)
[Martynovich et al. 2016] Vitaliy V. Martynovich, Euvgeniy E. Vityaev. Recovering Noisy Contexts with Probabilistic Formal Concepts // Proceedings of the 2nd International Workshop on Soft Computing Applications and Knowledge Discovery (SCAKD 2016), Moscow, Russia, July 18, 2016. CEUR Workshop Proceedings, Vol-1687, pp. 24-35.