1. Introduction

Formalization of “Natural” Classification and “Natural” Concepts by Probabilistic Generalization of Formal Concepts Analysis

Evgenii E. Vityaev

Vladislav Degtiarev

Bayar Pak

Yuri Meister

0 0 Novosibirsk State University , Novosibirsk , Russia 1 Sobolev Institute of Mathematic , Novosibirsk , Russia

2020

In the previous works, a probabilistic generalization of the formal concepts analysis was developed. This generalization is induced by the problem of formal concepts determining under noise conditions, when the lattice of formal concepts exponentially grows. In this paper, probabilistic formal concepts with negation are determined, as well as a statistical method for detecting these probabilistic formal concepts. The purpose of this paper is to show that probabilistic formal concepts have a deeper meaning. It is argued that probabilistic formal concepts formalize the “natural” concepts described in cognitive sciences by “causal models”, which are characterized by a highly correlated structure of attributes. The same structure is specific for the “natural" classification of objects of the external world. The definition of “natural" classification given by J. Stuart Mill is fairly accurately formalized by probabilistic formal concepts.

1. Introduction

In previous works [Vityaev et al. 2012], [Vityaev et al. 2015], [Martynovich et al. 2016], a probabilistic generalization of the Formal Concepts Analysis (FCA) was developed. This generalization solves the problem of the formal concepts discovery in noise conditions associated with the exponential growth of the lattice of formal concepts in the presence of noise. This issue is discussed in a more detail in section 2.

Probabilistic generalization of formal concepts was obtained by generalizing the definition of formal concepts as fix-points of implication. For this purpose, the probabilistic measure on the objects of context and the most specific probabilistic implication were defined. Consistency of fix-points based on maximally specific probabilistic implications is proved and then probabilistic formal concepts are defined as fix-points of predictions based on maximally specific probabilistic implications. Figuratively speaking, the obtained probabilistic formal concepts not display data in all details as in photographs, but give an “idealized" representation of the data.

In this work, probabilistic formal concepts with negation are defined, as well as a statistical method of detecting these probabilistic formal concepts based on the works [Vityaev et al. 2014], [Vityaev et al. 2015], [Martynovich et al. 2016].

The purpose of this work is to show that probabilistic formal concepts have deeper meaning than their restoration in noise conditions. In [Vityaev et al. 2014], it was shown that the fixpoints of anticipation underlying probabilistic formal concepts model the process of perception in exact accordance with the psychology of perception described in [Smirnov 1985], [Naiser 1981].

In addition, this paper shows that probabilistic formal concepts simulate “natural" concepts explored in cognitive sciences. “Natural" concepts are concepts about directly perceived objects, not abstract objects like production, plant and object. The following section, based on research in cognitive sciences, provides principles of categorization and description of “natural" concepts. It follows that “natural" concepts reflect a highly correlated structure of features of objects of the external world, which can be represented by “causal models". It is argued that causal models are rather precisely modeled by probabilistic formal concepts.

The highly correlated structure of features of objects of the external world is also manifested in the “natural" classification (see section 3). J. Stuart Mill has observed that “natural" classes of animals or plants are described by a potentially infinite number of properties [Mill 2011]. The naturalists who built the “natural" classifications also noted that the more similar the compared objects are in the greater number of significant features, the more likely their similarity in other features [Rutkovsky 1884], and that the construction of the “natural" classification consists in “indication" - from the infinite number of features we need to move to a limited number of features that would replace all other features [Smirnov 1938]. This means that in the “natural" classes the features are strongly correlated, for example, if classes 128 and features are binary, the independent “indicator" features among them will be only 7 features, because 27 =128, and the remaining features, which can be tens and hundreds (denoted by N) can be predicted by these 7 features, which means the presence of N-7 regularities. Since the “indicator" features can be diferent 7 features from N, the choice of which is equal to 7 , the number of regularities that predict non-indicator features based on the indicator features is not much less than ( − 7) ∙ 7 . Highly correlated structure of features is well detected by the probabilistic formal concepts; moreover, the following statement by J.S. Mill almost precisely defines the meaning of probabilistic formal concepts: “Natural groups ... are defined by features ... However, at the same time not only features which are certainly common for all the objects included into the group but the whole set of those features is taken into account, of which all are found in the majority of these objects and the majority - in all" [Mill 2011].

2. “Natural" concepts

In the works of Eleanor Rosch [Rosch 1973, 1978], [Rosch et al. 1975, 1976, 1978] the principles of categorization of “natural" categories [Rosch 1973, 1978] are formulated on the basis of the conducted experiments. Directly perceived objects (basic objects) – information rich bundles of observed and functional properties that form a natural partition that creates a categorization. These bundles form “prototypes" of objects classes: Categories can be viewed in terms of their clear cases if the perceiver places emphasis on the correlational structure of perceived attributes . . . By prototypes of categories we have generally meant the clearest cases of category membership [Rosch et al. 1978], [Rosch 1978]. Rosch and Mervis (1975) have shown that the more prototypical of a category a member is rated, the more attributes it has in common with other members of the category and the fewer attributes in common with members of the contrasting categories [Rosch et al. 1975, 1976].

Later on, the theory of “natural" concepts of Eleanor Rosch was called a “prototype” theory. Its main features are described in the following way: The prototype view (or probabilistic view) keeps the attractive assumption that there is some underlying common set of features for category members but relaxes the requirement that every member have all the features [Ross et al. 2008].

Further research has found that models based on features, similarities and prototypes are not suficient to describe classes. Theoretical, causal and ontological knowledge relating to class objects must be taken into account. For example, people not only know that birds have wings, can fly and nest in trees, but also that birds nest in trees because they can fly and fly because they have wings.

Based on this research, Bob Rehder, in his work “Categorization as causal reasoning", formulated a causal-model theory: people’s intuitive theories about categories of objects consist of a model of the category in which both a category’s features and the causal mechanisms among those features are explicitly represented [Rehder 2003a].

In the theory of causal models, the relationship of an object to a category is no longer based on a set of features and proximity based on features, but on the similarity of the generating causal mechanism: Specifically, a to-be-classified object is considered a category member to the extent that its features were likely to have been generated by the category’s causal laws, such that combinations of features that are likely to be produced by a category’s causal mechanisms are viewed as good category members and those unlikely to be produced by those mechanisms are viewed as poor category members [Rehder 2003a].

Some researchers have used Bayesian networks or causal models to represent causal knowledge [Cheng 1997], [Gopnik et al. 2004], [Grifiths et al. 2009]. However, these models cannot model cyclical causal relationships because Bayesian networks do not support cycles.

In [Rehder et al. 2011], Bob Rehder proposed a model of causal cycles based on the “disclosure" of causal graphical models (CGMs). The “disclosure" is done by creating a Bayesian network that deploys over time. Such a network is already quite well suited for modeling “natural" concepts reflecting the highly correlated structure of the outside world. However, it does not explicitly formalize causal link cycles.

The model we propose further is directly based on cyclic causal links represented by probabilistic formal concepts.

3. “Natural" classification

The first suficiently detailed philosophical analysis of the “natural" classification belongs to J.St. Mill [Mill 2011]. This analysis revealed all the main properties of the “natural" classifications, which were later confirmed by naturalists who built the “natural" classifications.

According to J.St. Mill [Mill 2011], “artificial" classifications difer from “natural" classifications in that they can be based on any one or more attributes, so that diferent classes difer only in that they include objects with diferent values of these attributes. But if we consider classes of “animals" or “plants", they difer in such a large (potentially infinite) number of properties that they cannot be enumerated" [Mill 2011].

J.St. Mill gives the following definition of the “natural" classification: it is such a classification which unites objects into groups concerning which it is possible to express the greatest number of general propositions and it is based on such properties which serve as reasons for many others or at least make up their true attributes. It also defines the notion of the “image" of the class, which is the forerunner of “natural" concepts: “our concept of class – the image by which this class is represented in our mind – is the concept of a certain specimen that possesses all the features of this class ... to the highest degree".

The properties of “natural" classes according to J. St. Mill were confirmed by naturalists. For example, Wavell W.: “The more general statements about objects give an opportunity to make classification, the more natural it is" [Mayen et al. 1976]. L. Rutkovsky writes about the inexhaustible number of general properties of “natural" classes. [Rutkovsky 1884]: “The greater the number of essential features similar to comparable objects, the more likely their similarity in other respects". A similar statement is made by E.S. Smirnov: “The taxonomic problem lies in ’indication’: from an infinite number of features we need to move to a limited number of them, which would replace all other features" [Smirnov 1985].

As a result, the problem of defining “natural" classifications has been formulated and is still discussed in the literature [Zabrodin 1981], [The Nature of Classification 2013]. However, from our point of view, the formalization of the “natural" classification is not yet suficiently adequate.

4. The problem of formal concepts discovery in noise conditions

Let us consider the known problem of formal concepts analysis [Ganter 2003]: standard methods of concept lattice construction on noisy data turn out to be unproductive, because they are designed for accurate data, and in the presence of noise (data distortions) the concept lattice turns out to be filled with noisy clones of initial concept classes [Klimushkin et al. 2010], [Frequent Itemset Mining 2004]. Attempts to solve this problem “on the forehead" do not look promising [Klimushkin et al. 2010], [Kuznetsov 2007]. The main idea of these works is to take a full lattice of concepts, and get rid of noise, “side” concepts by filtering. A widespread evaluation of the “side” concept is the Stability Index [Kuznetsov 2007], [Buzmakov 2014]. The Stability Index and other approaches based on the filtration of lattices have significant disadvantage – the filtered lattices of concepts often have exponential sizes [Emilion et al. 2009], which is unacceptable for large contexts. from [Ganter et al. 2004]. attribute. and attributes, and ⊆

5. Basics of formal concepts analysis

Let’s start with the main definitions of formal concepts analysis. The majority of definitions here start with classical works on FCA [Ganter 2003], [Ganter et al. 1999], others are taken

Definition 1 . Formal context is a triple (, ,

), where and

– arbitrary sets of objects – binary relation, expressing the belonging of an object to an In formal context, the operators of the derivative play a key role.

Definition 2 . ⊆ , ⊆

. Then: 1. ↑ = { ∈ | ∀ ∈ , (, 2. ↓ = { ∈ | ∀ ∈ , (, ) ∈ } ) ∈ }

Operators of a derivative link a subset of objects and attributes of a context. In the future we will also refer to both operators as a derivative with a single symbol ′.

Definition 3. Definition 4. An implication is a pair of attribute sets (,

A pair (, ) – is a formal concept, if ↑ = and ↓ = .

), , ⊆ → . Implication

→ is truth on the context A set of all true implications on context K well be denoted as = (, , all possible conclusions of implications to the original set of attributes X:

Definition 5. For any set L of implications we define a direct inference operator that add that is recorded as ), if ∀ ∈ (⊈ ′ ∨ ⊆ ′ . ) ( ) = ∪ { | ⊆ , → ∈ }

Theorem 1. [Ganter et al. 1999] For any subset ⊆

Let us illustrate Theorem 1 in the following simple context.

, ( )( ) = ⇔ ′′ = . [Ganter et al. 1999] Very simple formal context K0 m

1 1 1 0 0 m 1 1 1 0 2 m

3 0 1 1 0

We can reformulate the task of constructing a concept lattice in terms of implication and output operator. Looking at Table 1, it is not dificult to notice that m and m attributes completely define the object class. In fact, m 1 infer m2, and the same is true for m , from these 3 it follows that 1 → 2 and 3 → 2. The sets { 1, 2}, { 2, 3} and { 1, 2, 3} will be formal concepts. According to the theorem, they will also be fix-points of the direct inference 1 3 operator. For example, ( )

( ) { 1} −−−−−→ { 1, 2} −−−−−→ { 1, 2}.

It is therefore { 1, 2} both a fix-point and a formal concept.

Now let’s add some noise to the original context K0 from Table 1. In order to keep the noise level at a fairly low level, let’s also expand the context by adding duplicates of objects:

Every small change made to the context will have a dramatic impact on the conceptual lattice. The first context in Table 2 is equivalent to the original context K 0 and generates the same conceptual lattice. However, the lattice with noise of the second context in Table 3 will contain some fictitious notions {m 2} and {m1, m3} generated by noise, which will also be formal notions. The number of fictitious notions will increase with the noise level, and the qualitative estimation of speed is exponential [Emilion et al. 2009].

6. Probabilistic logic in a formal context

To avoid the exponential growth of the lattice, a stability index is introduced [Klimushkin et al. 2010], [Kuznetsov 2007],[Buzmakov 2014] to assess the degree of stability of the concept and to select only sustainable concepts.

We will present a fundamentally diferent way of the lattice construction, based on probabilistic implications. For this purpose, we redefine the context within the framework of logic. Further we will consider only finite contexts.

Definition 6. For the finite context = (, , ) we define a signature Ω that contains predicate symbols m(x) for each ∈ . For the signature Ω and the K context, considered as a model, we define the interpretation of predicate symbols as follows ⊧ ( ) ⇔ (, ) ∈ .

Definition 7. For the signature Ω let’s define the following variant of the first order logic: 1. – set of variables; 2. – set of atomic formulas (atoms) ( ) where ∈ Ω and ∈ ; 3. – set of literals, including all atoms ( ) and their negations ¬ ( ); 4. Φ – set of formulas defined inductively: every atom is a formula, for any Φ, Ψ ∈ Φ following syntactic constructions Φ ∧ Ψ, Φ ∨ Ψ, Φ → Ψ, ¬Φ are formulas as well.

For a set of literals ⊆ we define their conjunctions as ∧ = ∧ . In the same way

∈ ¬ = {¬ | ∈ }.

Definition 8. A single element set { }, together with signature Ω predicates, forms a model of this object. The fact of truth of a formula on an object model is written as ⊧ ⇔ ⊧ .

Definition 9. Let us define a probabilistic measure on the set G, in the Kolmogorov sense. Then we can define a probabilistic measure on the set of formulas as:

∶ Φ → [0, 1], ( ) = ({ | ⊧ }).

Let us assume that the context has no statistically insignificant objects such as ({ }) = 0, ∈ .

Definition 10. The set of literals M will be called -joint if (∧ ) > 0.

In the future, the compatibility of the set of literals will be considered within a probabilistic measure of a context, and in the absence of ambiguity the symbol of the measure will be omitted.

Let us consider some subset of atoms ⊆ ( ). The formula 1 ∧ 2 … ∧ → = ∧{ } → defines implication ({ } , { }) on the context. The presence of a probabilistic measure allows to determine the probabilistic implication.

Let’s define the rules on the context.

Definition 11. Let { 1, 2, … , , } ∈ , ∉ { 1, 2, … }, ≥ 0. 1. The rule = ( 1, 2., → ) is called implication ( 1 ∧ 2 … ∧ → ); 2. The premise ← of rule R is a set of literals { 1, 2,..., }; 3. The conclusion of the rule is → = ; 4. The length of the rule is called the premise power | ← ; | 5. If 1← = 2← and 1→ = 2→, then 1 = 2.

Definition 12. The probability of the rule R is the value ( ) = ( →| ←) = ( ← ∧ →) ( ←) .

If the rule denominator (∧ ←) is equal to 0, then the probability is not defined.

7. Rule classes and consistency of predictions

The following sections are a combination of works [Ganter et al. 1999], [Vityayev 1992], [Vityaev 2006], [Vityayev et al. 2012], [Vityaev et al. 2015], [Martynovich et al. 2016].

Definition 13. Let us define a prediction operator by a set of rules  as Π( ) = ∪ { | ∃ ∈  ∶ ← ⊆ , → = }.

Definition 14. A closure ̄ of the set of literals L is the smallest fix-point of a prediction operator containing L ̄ = Π( ̄ ) = Π( ) = ⋃ Π( ).

∞ ∈ℕ

Definition 15. Rule R1 is a subrule of rule R2 if 1→ = 2→, 1← ⊂ 2←. We denote this fact as 1⊏ 2.

Definition 16. Rule R1 specifies rule R 2, or R1 > R2 if 2⊏ 1 and ( 1) > ( 2).

Example 2. A rule 1 → 2 from the context (Table 3) has only one, unconditional sub-rule (∅ → 2)⊏( 1 → 2) but it is wrong that the first rule specifies the second (∅ → 2) = 180 = 54 = ( 1 → 2). However, ( 3 → 2) > (∅ → 2) since (∅ → 2) = 180 < 65 = ( 3 → 2).

To achieve consistency and resistance to noise, some restrictions on the rule classes used will be required [Vityayev 1992], [Speransky 2013].

Class M1 defined below contains only those rules whose conditional probability is strictly greater than the probability of an unconditional rule.

Definition 17. ∈ 1( ) ⇔ ( ) > ( →), → = , ← = ∅.

The next class M2 requires the property of maximum specificity – the impossibility to increase the conditional probability by refining the rule [Vityaev 2006].

Definition 18. ∈ 2( ) ⇔ ( ∈ 1( ))&(⊏ ̃ ⇒ ( ̃ ) ≤ ( )).

We have proved that maximally specific rules of M2 solve the problem of statistical ambiguity [Vityaev 2006], [Vityaev et al. 2015].

Example 3. A rule ( 3 → 2) can be specified to the rule (¬ 1 ∧ 3 → 2), because of ( 3 → 2) = 65 < 1 = 33 = (¬ 1 ∧ 3 → 2). The last rule meets all the requirements of class M2, and therefore (¬ 1 ∧ 3 → 2) ∈ 2( 2).

The next class Imp contains all precise implication and corresponds to the set ( ) of definition 4.

Definition 19. ∈ ( ) ⇔ ( → = ).

Definition 20. Define classes M 1, M2, Imp by joining over all literals

8. Probabilistic formal concepts

Fix-points of the prediction operator are obvious candidates for the role of the content of probabilistic concepts. To determine the volume of notions let’s refer to [Kuznetsov 2007],[Buzmakov 2014]. The idea is to use all possible prototypes of the closing operator, i.e. all sets of M such that Π( ) = to compose their derivatives into a single volume of notion A. This will allow to restore the initial correspondence of objects-concept using the prediction operator and also to include all objects of the same class into a single volume of the notion.

Definition 23. A probabilistic formal concept on the context K is a pair (A,B) that meets the conditions: Π( ) = , =

∪ Π( )= ′,  ⊆ 2 ( .

22). .

Example 4. Let’s consider a K context consisting of two non-intersecting squares that give two independent formal concepts. We will change the values of some cells to make the task a bit more complicated.

Note that the most specific rules of M 2 will be ( =1…4 → ¬ =5…8), but not the rule 5 → ¬ =1…4 for which you can find the following clarification ( 5 ∧ 6 → ¬ 1) = 1 > 54 = the process of rule refinement. ( 5∧ →

¬ 1). This example shows how the noise is processed and invested in probability in To calculate a concept containing the feature m1, we first calculate its closing Π({ 1}) = { 1, 2, actly the same are gathered in a single volume of the concept {g2, g3, g4}.

8}. Then all the objects whose closing turns out to be ex

To distinguish probabilistic formal concepts from classical formal concepts on the context K, we will call the latter simply formal concepts. The definition of set A is also based on the following theorem linking probabilistic and simple formal concepts on context K.

Theorem 3. Let K be a formal context. 1. If (A,B) is a formal concept in K, then there is a probabilistic formal concept (N,M) in the 2. If (N,M) is a probabilistic formal concept on K, then there is a family  of formal concepts

9. Statistical method for detection of probabilistic formal concepts

same context, such as ⊆ , in the same context, such that ̃ – probabilistic law and ̃ > . law ̃ such as ( ̃ > ).

Definition 27.

. 2 ∀(, ) ∈  (Π( ) = ), = (, ∪)∈ will be defined.

The procedure for obtaining all M rules by exhaustive search is exponential, because the (2| |) candidate rules (each atom or its negation can be included in the rule) are evaluated at |G|. Of course, this estimation is rather rough and the procedure can be improved. Instead a more radical simplification of the problem by finding a statistical approximation of M

2-rules

Definition 24. R is a probabilistic law, if ( ̃ ⊏ ) ⇒ ( ̃ < ).

Definition 25.

A rule ̃ is semantically derived from the rule R, we denote as ⊳ ̃ , if R and Definition 26. The probabilistic law R will be called the strongest if there is no probabilistic

We define the Semantic Probabilistic Inference (SPI) of some strongest probis the strongest probabilistic law. abilistic law

as such a sequence of probabilistic laws 0 ⊳ 1 ⊳ 2. ⊳ that 0← = ∅ and Let us denote by  the set of all the strongest probabilistic laws obtained by various semantic probabilistic inferences. Specification check R1 > R2 requires an inequality ( 1) > ( 2) check.

In general, from a practical point of view, we cannot assume that a probability measure is defined as in definition 9, so we will consider it unknown. The probabilistic inequalities used in the SPI definition we will determine using some statistical criterion, e.g. the exact Fisher independence criterion with a confidence level . The resulting set of rules  make contradicting predictions. Therefore, to approximate the prediction operator Π( ), it is necessary to introduce an additional criterion of mutual consistency of predictions by rules will already  on the set L.

In this case, we write that ∈ this case, we write that ∈ ( ) ⊆  .

Definition 29. The rule ∈  is confirmed on the set of letters L, if ← ⊂ and → ∈ .

Definition 30.

The rule ∈ 

is refuted on the set of letters L, if ← ⊂ and → ∈ ¬ . In

The diference between the quantity and quality of rules confirmed and refuted on the concept-hypothesis L is the determinant for adding or deleting literals to the content of the concept.

Definition 31. The consistency criterion for the set L is the value ( ) =

∑ ∈ ( ) ( ) −

∑ ∈ ( ) ( ). below, we are guided by Shannon’s considerations as

The choice of rule evaluation may depend on additional task specifics. In the experiments ( ) = − (1 + − ( )).

Definition 32.

Define a consistent prediction operator by one element so as to strictly increase the consistency criterion: Υ( ) that changes a set of literals L

G; 1. For all ∈ ⧵ calculate the efect of adding G: △+ = ( ∪ { }) − ( ); 2. For all ∈ calculate the efect of removing G: △−= ( ⧵{ }) − ( ); 3. The operator Υ( ) adds a letter G to the L set if △+> 0 and △+>△−; the operator delete a letter G from L if △−> 0 and △−>△+. If △−= △+ and △−> 0, the operator will delete literal 4. If △+≤ 0 and △−≤ 0, the operator returns L and we have a fix-point.

Definition 33. A probabilistic formal concept discovered by a statistical method is a probaby various semantic probabilistic inferences with a confidence level in the context K. bilistic formal concept in which 

is a set of all strongest probabilistic laws (def. 26) obtained 10. Experiments in the detection of probabilistic formal concepts by statistical method Let us illustrate the detection of probabilistic formal concepts by statistical method on the example of classes (concepts) detection of digits. Let’s encode the digits with 24 features, located as shown in fig. 1. Each feature has 7 values (see fig. 2). Some lines of numbers are encoded twice, for example, the vertical line at number 4 and 1 is encoded both by the vertical right line (feature 3) in cells 3,7,11,15,19,23 and by the vertical left line (feature 5) in cells 4,8,12,16,20,24.

Two experiments were conducted for two diferent contexts. The first contained 30 copies of each digit (360 in total) randomly mixed together. The program does not know where the digit is and it treats them as diferent images. The second context contained 10 copies of each of the 10 digits (without repeating sixes and nines, a second copy was taken) and 300 random images in which each of the 24 features took a random value from the seven available. All numbers and images were also randomly mixed. In each context a corresponding set  probabilistic laws with = 0.05 for the Fisher criterion were found. In the first experiment 66160 regularities with conditional probability greater than 0.9 and = 0.05 were found. In the second experiment 629494 regularities with conditional probability greater than 0.9 and =

In the first context, the method found as classes all 12 digits – they were fix-points, as well as two fix-points for 6 and 9, which did not contain, respectively, signs 13 and 12, with which two copies of digits difered. In the second context, the method identified 10 digits as fix-points.

In both experiments, the stability of the probabilistic formal concepts found was tested with regard to the lack of information on any feature of a digit. It was checked that if any digit described by the set of literals L removes one of the literals or ¬ , i.e., removes one of the values of the feature, the digit will still be assigned to the same class and the unknown feature will be correctly identified and added to L, i.e. removed value of the feature will be correctly predicted by the regularities from  without contradictions on the part of the logical relationship of this feature with other features. In the ifrst experiment, for all 360 digits, a feature was accidentally removed and then restored by the prediction operator. In the second experiment, the removed features were restored in 96% of Π( ⧵ ) = . This means that the cases.

Acknowledgments

The publication of this article was supported by the project «Investigation, analysis and complex independent expertise of projects of the National technological initiatives, including the accompanying of projects of «road map» NeuroNet», which is executed within the framework of the state assignment №. 28.12487.2018/12.1 of the Ministry of Science and Higher Education of the Russian Federation. The study was also carried out within the framework of the state contract of the Sobolev Institute of Mathematics (project no. 0314-2019-0002) regarding theoretical results in Sections 6-9. [Grifiths et al. 2009] Grifiths, T. L., Tenenbaum, J. B. Theory-based causal induction. Psychological Review, 116, 56. [Rehder et al. 2001] Rehder, B., Hastie, R. Causal knowledge and categories: The efects of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General, 130, 323–360. [Rehder 2003a] Bob Rehder. Categorization as causal reasoning // Cognitive Science 27 (2003) 709–748. [Rehder 2003b] Rehder, B. A causal-model theory of conceptual representation and categorization. J. of Exper. Psych.: Learning, Memory, and Cognition, 29, 1141-1159. [Rehder et al. 2011] Bob Rehder, Jay B. Martin. Towards A Generative Model of Causal Cycles // 33rd Annual Meeting of the Cognitive Science Society 2011, (CogSci 2011), Boston, Massachusetts, USA, 20-23 July 2011, V.1 pp. 2944-2949. [Rosch 1973] Rosch, E.H. Natural categories // Cognitive Psychology 4. 1973. P. 328-350. [Rosch et al. 1975] Rosch, E., Mervis, C.B. Family resemblances. Studies in the internal structure of categories // Cognitive Psychology, 7. P. 573–605. [Rosch et al. 1976] Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., Boyes-Braem, P. Basic objects in natural categories // Cognitive Psychology, 8. P. 382–439. [Rosch et al. 1978] Rosch, Eleanor and Lloyd, Barbara B. (eds) Cognition and categorization.

Hillsdale, NJ: Lawrence Erlbaum, 1978, P. 27-48. [Rosch 1978] Rosch, E. Principles of Categorization // Rosch, E. & Lloyd, B.B. (eds), Cognition and Categorization, Lawrence Erlbaum Associates, Publishers, (Hillsdale), 1978. P. 27–48 [Ross et al. 2008] B. H. Ross, E. G. Taylor, E. L. Middleton, and T. J. Nokes. Concept and Category Learning in Humans // H. L. Roediger, III (Ed.), Cognitive Psychology of Memory. Vol. [2] of Learning and Memory: A Comprehensive Reference, 4 vols. (J.Byrne Editor), Oxford: Elsevier, 2008, P. 535-556. [The Nature of Classification 2013] The Nature of Classification. Relationships and Kinds in the Natural Sciences. Palgrave Macmillan. 2013. 208p. [Ganter 2003] Ganter, B. Formal Concept Analysis: Methods, and Applications in Computer

Science. TU Dresden (2003). [Ganter et al. 1999] Ganter, B., Wille, R. Formal concept analysis – Mathematical Foundations.

Berlin-Heidelberg-New York, Springer (1999) [Ganter et al. 2004] Ganter, B., Obiedkov, S. Implications in Triadic Formal Contexts. TU Dresden, Springer (2004) [Klimushkin et al. 2010] Klimushkin, M., Obiedkov, S., Roth, C. Approaches to the Selection of Relevant Concepts in the Case of Noisy Data. In: ICFCA 2010, LNAI 5986, 255-266. (2010) [Frequent Itemset Mining 2004] Bayardo Jr., R., Goethals, B., Zaki, M. (eds.) Proc. of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004). CEURWS.org (2004) [Kuznetsov 2007] Kuznetsov, S.O. On Stability of a Formal Concept. Annals of Mathematics and Artificial Intelligence. 49, 101-115 (2007) [Buzmakov 2014] Buzmakov, A., Kuznetsov, S., Napoli, A. Concept Stability as a Tool for Pattern Selection. CEUR Workshop proceedings, vol. 1257, ECAI 2014, pp. 51-58 (2014) [Emilion et al. 2009] Emilion R., Levy G. Size of random Galois lattices. Discrete Applied Math.

J. 157, 2945-2957 (2009) [Speransky 2013] Speransky, S.O. Logic of probability and probability logic. Novosibirsk State

University, Novosibirsk, PhD thesis. (2013) [Vityaev 2006] Vityaev, E.E. The logic of prediction. In: Proceedings of the 9th Asian Logic Conference Mathematical Logic in Asia, Novosibirsk, Russia, August 16–19, 2005, pp. 263–276. World Scientific, Singapore (2006) [Vityaev et al. 2015] Vityaev, E.E., Martynovich, V.V. Probabilistic Formal Concepts with Negation. Perspectives of System Informatics. A. Voronkov, I. Virbitskaite (Eds.), LNCS vol. 8974, pp. 385-399 (2015) [Martynovich et al. 2016] Vitaliy V. Martynovich, Euvgeniy E. Vityaev. Recovering Noisy Contexts with Probabilistic Formal Concepts // Proceedings of the 2nd International Workshop on Soft Computing Applications and Knowledge Discovery (SCAKD 2016), Moscow, Russia, July 18, 2016. CEUR Workshop Proceedings, v. Vol-1687, pp. 24-35.

[Vityaev 1992] Vityaev

E.E.

Semantic approach to creation of knowledge bases. Se-mantic Probabilistic Conclusion of the Best Prediction PROLOGRAMS by Probabilistic Data Model. // Logic and Semantic Programming (Computing Systems , vol. 146 ), Novosibirsk, 1992 , pp. 19 - 49 .

[Vityaev et al. 2014 ] Vityaev

E.E.

, Neupokoev

N.V.

Formal model of perception and image as ifx-point of anticipation . Approaches to Thinking Modeling. (Collected under the editorship of V.G . Redko, Ph .D.-m.D.). URSS Editorial-Al, Moscow, 2014 , pp. 155 - 172 .

[Vityaev et al. 2015 ] Vityaev

E.E.

, Martynovich

V.V.

Formalization of “natural" classification and systematics through fixed prediction points //

Siberian Electronic Mathematical Reports , Vol. 12 , Institute of Mathematics, Moscow, Russia. Siberian Electronic Mathematical Reports , Volume 12 , Sobolev

Institute

of Mathematics, Siberian Branch of the Russian Academy of Sciences , 2015 . 1006 - 1031 .

[Vityaev et al. 2012 ] Vityaev

E.E.

, Demin

A.V.

, Ponomariov D.K . Probabilistic generalization of formal concepts // Programming , T. 38 , # 5 , 2012 , p. 18 - 34 .

[Smirnov 1985] Smirnov S.D. Patterns psychology . Moscow: Moscow State University, 1985 , 231s .

[Naiser 1981] Naiser U. Knowledge and reality . M.: “ Progress" , 1981 , 229 c.

[Mill 2011] Mill J.S. Logic system of syllogistic and inductive, 2011, Prel . and

adj. V.C.

Finn , M. : Lenand, 832 s.

[Rutkovsky 1884] Rutkovsky L. Elementary textbook of logic . St. Petersburg , 1884 .

[Mayen et al. 1976] Meyen

S.V.

, Schreider S .A. Methodological aspects of classification theory // Problems of philosophy, 1976 , # 12 .

[Smirnov 1938] Smirnov

E.S.

Design of a kind of taxonomic point of view // Zool . Zhurn. Vol. 17 , No. 3 , 1938 , Pp. 387 - 418 .

[Zabrodin 1981] Zabrodin

Yu . On criteria of natural classification // NTI, ser.2 , 1981 , # 8 .

[Cheng 1997] Cheng, P. From covariation to causation: A causal power theory . Psychological Review , 104 , 367 - 405 .

[Gopnik et al. 2004] Gopnik , A. , Glymour , C. , Sobel , D. M. , Schulz , L. E. , Kushnir , T. A theory of causal learning in children: Causal maps and Bayes nets . Psychological Review , 111 , 3 - 23 .