=Paper= {{Paper |id=None |storemode=property |title=Classification Reasoning as a Model of Human Commonsense Reasoning |pdfUrl=https://ceur-ws.org/Vol-939/paper9.pdf |volume=Vol-939 |dblpUrl=https://dblp.org/rec/conf/ecai/Naidenova12 }} ==Classification Reasoning as a Model of Human Commonsense Reasoning== https://ceur-ws.org/Vol-939/paper9.pdf
    CLASSIFICATION REASONING AS A MODEL OF HUMAN
              COMMONSENSE REASONING

                                    Xenia A. Naidenova

              Military Medical Academy, Saint-Petersburg, Russian Federation

                                  ksennaidd@gmail.com



       Abstract. In this article, it is proposed to consider classification reasoning
       based on inducing and using implicative dependencies as a model of common-
       sense reasoning. The main concept of this reasoning is a good classification test
       considered as a formal concept of the FCA. The Galois lattice is used for con-
       structing good classification tests. Special rules are determined for constructing
       Galois lattices over a given context. All the operations of lattice construction
       take their interpretation in human mental acts.



       Keywords: Commonsense reasoning, Classification test, Machine Learning,
       Inductive-deductive reasoning, Formal Concept Analysis.


1      Introduction

   The symbolic methods of machine learning work on objects with symbolic, Boo-
lean, integer, and categorical attributes. From this point of view, these methods can be
considered as the methods of mining conceptual knowledge or the methods of con-
ceptual learning. Currently the theory of symbolic machine learning is not recognized
as a model of classification reasoning, although precisely this reasoning constitutes an
integral part of any mode of reasoning (1, 2). The sole exception to this is the DSM
method of hypothesis generation developed by V.K. Finn (3) and based on simulating
inductive reasoning rules revealed in human thinking by D. S. Mill (1). There is also a
tradition to consider induction separately from deduction. Classification task of min-
ing hypotheses distinguishing and describing classes of a given object classification
is conventionally solved separately from hypothesis’s application except the deduc-
tive-inductive integrated model developed by Zakrevskij (4) and based on representa-
tion of data and knowledge in Boolean space of attributes.
   However the role of classification in human reasoning is enormous. Classification,
as a process of thinking, performs the following global operations: 1) forming knowl-
edge and data contexts adequate to a current situation of reasoning; 2) reducing the
domain of the search for a solution of some problem; 3) generalizing or specifying
object descriptions; 4) interpreting logical expressions on a set of all thinkable ob-
jects; 5) revealing essential elements of reasoning (objects, attributes, values of attrib-
utes etc); 6) revealing the links of object sets and their descriptions with external
contexts interrelated with them. This list can be continued.
   Reasoning requires a lot of techniques related to increasing its efficiency such as
valuation, anticipation, making hypotheses, generalization and specification. One of
the important techniques is decomposition of the main problem into sub-problems. It
implies using the following operations: choosing sub-problems, ordering sub-
problems (ordering arguments, attributes, objects, variables, etc.), optimizing sub-
problem selection, and some others. The most familiar examples of sub-problem or-
dering are so called tree-like scanning and level-wise scanning methods. Some inter-
esting variations of selecting sub-problems are the choice of a more flexible sub-
problem, for example, one with minimal difference from a previous sub-problem and
a sub-problem with minimal possible number of new solutions. Intermediate results of
reasoning are used for decreasing or locally bounding the number of sub-problems.
   We limit our consideration of classification reasoning to a special class of logical
reasoning based on mining and using conceptual knowledge the elements of which
are objects, attributes (values of attributes), classifications (partitions of objects into
disjoint blocks), and links between them. If we take into account that implications
express relations between concepts (the object  the class, the object  the property,
the property  the class), we can assume that schemes of mining and applying impli-
cations form the core of classification processes, which, in turn, form the basis of
human commonsense reasoning.
   Our approach is based on the concept of a good diagnostic test (GDT) for a given
classification of objects (5). A good classification test has a dual nature: on the one
hand, it is a logical expression in the form of implication or functional dependency,
on the other hand, it generates the partition of a training set of objects equivalent to
the given classification of this set or the partition that is nearest to the given classifica-
tion with respect to the inclusion relation between partitions. Inferring good test al-
lows in principle mining from data not only structures of formal concepts but also
structures of classification ordered by the inclusion relation.
   Mathematical structure for GDTs’ construction is Galois’s lattice. The formal
model of classification as an algebraic lattice has been obtained in two independent
ways. One way goes back to the work of great psychologist J. Piaget who introduced
the concept of grouping (2) to explain methods of object classification developed by
7-11 years children. In this book, a conception of classification is given based on
mutually coordinated operations on objects, classes of objects, and properties of ob-
jects.
   The coordinated classification operations generate logical implicative assertions.
The classification operations are connected with understanding the operations of
quantification: “not all c are a”, “all b are c”, “no b are c”, “some c are b”, “some b
are not a” and so on. The violation of the coordinated classification operations implies
the violation of reasoning. Piaget J. shows that a key problem of personal understand-
ing operational classification is the problem of understanding the inclusion relation.
He adds that the lattice structure is the source of classification operations (2, pp. 195,
387-389).
   The idea that classification is a lattice arose also from practical tasks of pattern
recognition. In 1974, J. Shreider has described the classification algebra (6) as idem-
potent semigroup with the unit element. In 1974, N. Boldyrev advanced (7) the for-
malization of pattern recognition system as algebra with two binary operations of
refinement and generalization defined by an axiom system including lattice axioms.
   The paper is organized as follows: basic definitions are given in Section 2, Section
3 describes briefly a model of lattice construction as inductive-deductive common-
sense or classification reasoning; some words of conclusion terminate this article.


2      Basic Definitions

   IMPLICATIVE ASSERTIONS (logical rules of the first kind) describe regular re-
lationships connecting together objects, properties and classes of objects. We consider
the following forms of assertions: implication (a, b, c  d), forbidden rule (a, b, c 
false (never), diagnostic rule (x, d  a; x, b  not a; d, b  false), rule of alterna-
tives (a or b  true (always); a, b  false), compatibility (a, b, c  VA, where VA is
the occurrence’s frequency of the rule).
   In our consideration, COMMONSENSE REASONING RULES (CRRs) are rules
with the help of which implicative assertions are used, updated and inferred from
instances. The deductive CRRs infer consequences from observed facts with the use
of implicative assertions. An analysis of human commonsense reasoning shows that
these rules are the following ones: modus ponens: “if A, then B”; A; hence B; modus
ponendo tollens: “either A or B” (A, B – alternatives); A; hence not B; modus tollendo
ponens: “either A or B” (A, B – alternatives); not A; hence B; modus tollens: “if A,
then B”; not B; hence not A; generating hypothesis: “if A, then B”; B; A is possible.
   The inductive CRRs are the canons formulated by John Stuart Mill (1): Method of
Agreement, Method of Difference, Joint Method of Agreement and Difference,
Method of Concomitant Changes, and Method of Residuum. These methods are not
rules but they are the processes in which implicative assertions are generated and used
immediately. Therefore inductive inferences are not separated from deductive ones.
   Let G = {1, 2,…, N} be the set of objects’ indices (objects, for short) and M = {m1,
m2, …, mj, …mm} be the set of attributes’ values (values, for short). Each object is
described by a set of values from M. The object descriptions are represented by rows
of a table the columns of which are associated with the attributes taking their values in
M (see, please, Table 1).
   The definition of good tests is based on correspondences of Galois on I = GM (8)
and two relations G  M, M  G. Let A  G, B  M. Denote by Bi, Bi  M, i = 1,…,
N the description of object with index i. We define the relations G  M, M  G as
follows: G  M: A = val(A) = {intersection of all Bi: Bi  M, i  A} and M  G: B
= obj(B) = {i: i  G, B  Bi}. Of course, we have obj(B) = {intersection of all obj(m):
obj(m)  G, m  B}.
   Operations val(A), obj(B) are reasoning operations (derivation operators) related to
discovering general features of objects and all objects possessing a given set of fea-
tures.
   We introduce two generalization operations: generalization_of(B) = B =
val(obj(B)); generalization_of(A) =A = obj(val(A)). These operations are actually
closure operators (8). A set A is closed if A = obj(val(A)). A set B is closed if B =
val(obj(B)). For g  G and m  M, {g} is denoted by g and called object intent, and
{m} is denoted by m and called value extent.
   The generalization (specification) operations are usual mental acts. Suppose that
somebody has seen two films with the participation of Gerard Depardieu. After that
he tries to know all the films with his participation. Suppose that one can know that
Gerard Depardieu acts with Pierre Richard in several films. After that he can discover
that these films are the films of the same producer Francis Veber.
   For representing a classification, we use factually the way proposed by S.O.
Kuznetsov in (9) for the case when the set M is the set of attribute’s values. Let a
context K = (G, M, I) be given. In addition to values of M, a target value ω  M of an
attribute is considered. The set G of all objects is partitioned into two subsets: the set
G+ of objects having property ω (positive objects), the set G− of objects not having
property ω (negative objects). We have K = K+  K−, where K+ = (G+, M, I+), K− =
(G−, M, I−), G = G+  G− (G− = G\ G+). Diagnostic test is defined as follows.
   Definition 1. A diagnostic test for G+ is a pair (A, B) such that B  M (A = obj(B)
≠ Ø), A  G+ and B  val(g) & B  val(g), g, g  G−. Equivalently, obj(B) ∩ G− =
.
   In general case, a set B is not closed for diagnostic test (A, B), i. e., a diagnostic test
is not obligatory a concept of FCA. This condition is true only for the special class of
tests called ‘maximally redundant ones’.
   Definition 2. A diagnostic test (A, B), B  M (A = obj(B)  ) for G+ is maxi-
mally redundant (GMRT) if obj(B  m)  A, for all m  B and m  M.
   Definition 3. A diagnostic test (A, B), B  M (A = obj(B)  ) for G+ is irredun-
dant if any narrowing B* = B\m, m  B implies that (obj(B*), B*)) is not a test for
G+.
   Definition 4. A diagnostic test (A, B), B  M (A = obj(B)  ) for G+ is good if
and only if any extension A* = A  i, i  A, i  G+ implies that (A*, val(A*)) is not a
test for G+.
   If a good test (A, B), B  M (A = obj(B)  ) for G+ is irredundant, then any nar-
rowing B* = B\m, m  B implies that (obj(B*), B*)) is not a test for G+. If a good test
(A, B), B  M (A = obj(B)  ) for G+ is maximally redundant, then any extension B*
= B  m, m  B, m  M implies that (obj(B*  m), B*) is not a good test for G+.
   Definition 5. Let t be a set of values such that (obj(t), t) is a test for a given set of
objects. We say that the value m  M, m  t is essential in t if (obj(t\m), (t\m)) is not a
test for a given set of object.
   Definition 6. Let s be a subset of objects belonging to a given positive class of ob-
jects; assume also that (s, val(s)) is not a test. The object tj, j  s is said to be an essen-
tial in s if (s\j, val(s\j)) proves to be a test for a given set of positive objects.
     To illustrate using essential values and generalization operations in the process of
good tests’ generation, we consider a partition of objects in Table 1 into positive and
negative ones. Let G(+) be equal to {4,5,6,7,8} and splus(m) = obj(m)  G(+), m  T.
The value ‘Red’ corresponds to a test for positive objects because obj(Red) =
splus(Red)  G(+). Delete ‘Red’ from consideration. The value ‘Tall’ is essential one
in object 7 and does not correspond to a test: obj(Tall) = {3,4,5,7,8} ≠ splus(Tall).
The projection of the value ‘Tall’ on the set of positive objects is in Table 2. Here
splus(Bleu) = {5,7,8}, val(splus(Bleu)) = ‘Tall Bleu’, obj(Tall Bleu) = splus(Tall
Bleu), hence ‘Tall Bleu’ corresponds to a test for Class 2. We have also that ‘Tall
Brown’ corresponds to a test but not a good one. We delete ‘Bleu’ and ‘Brown’ from
the projection as shown in Table 3.

                         Table 1. Example of a data classification
             Index ofexample    Height   Color of hair   Color of eyes   Class
             1                  Low      Blond           Bleu            1
             2                  Low      Brown           Bleu            1
             3                  Tall     Brown           Hazel           1
             4                  Tall     Blond           Hazel           2
             5                  Tall     Brown           Bleu            2
             6                  Low      Blond           Hazel           2
             7                  Tall     Red             Bleu            2
             8                  Tall     Blond           Bleu            2

              Table 2. The projection of the value ‘Tall’ on objects of G(+)
             Index of example   Height   Color of hair   Color of eyes   Class
             4                  Tall     Blond           Hazel           2
             5                  Tall     Brown           Bleu            2
             7                  Tall                     Bleu            2
             8                  Tall     Blond           Bleu            2

      Table 3. The reduction of the projection of the value ‘Tall’ on objects of G(+)
             Index of example   Height   Color of hair   Color of eyes   Class
             4                  Tall     Blond           Hazel           2
             5                  Tall                                     2
             7                  Tall                                     2
             8                  Tall     Blond                           2
    Now rows 5 and 7 do not correspond to tests for Class 2 and they can be deleted.
The intersection of the remaining rows of the projection is ‘Tall Blond’. We have that
obj(Tall Blond) = {4,8}  G(+) and (obj(Tall Blond), Tall Blond) is a test for Class 2.
As we have found all the tests for Class 2 containing ‘Tall’ we delete ‘Tall’ from the
objects of this class. Return to Table 1. We can delete rows 5, 7, and 8 because they
do not correspond to tests for Class 2: value Tall is essential one in these rows. The
intersection of the remaining objects of Class 2 gives a test (obj(Blond Hazel), Blond
Hazel) because obj(Blond Hazel) = splus(Blond Hazel) = {4,6}  S(+).


3      Inferring good classification tests as commonsense reasoning

   We shall consider two interconnected lattices OBJ = (2G, , ) = (2G, ) and VAL
= (2M, , ) = (2M, ), where 2G, 2M designate the set of all subsets of objects and the
set of all subsets of values, respectively; s  2G, t  2M. Inferring the chains of lattice
elements ordered by the inclusion relation lies in the foundation of generating all di-
agnostic tests: (1) s0  …  si  si+1  …  sm (val(s0)  val(s1)  …  val(si) 
val(si+1)  …  val(sm)) ; (2) t0  …  ti  ti+1  …  tm (obj(t0)  obj(t1)  … 
obj(ti)  obj(ti+1)  …  obj(t m)). The dual ascending and descending processes of
lattice generation are determined as follows: (3) t0  t1  …  ti  ti+1  …  tm
(obj(t0)  obj(t1)  …  obj(ti)  obj(ti+1)  …  obj(t m)) ; (4) s0  s1 …  si 
si+1  …  sm (val(s0)  val(s1) …  val(si )  val(si+1)  …  val(sm)).
    The following inductive transitions from one element of a chain to its nearest ele-
ment in the lattice are used: (i) from sq to sq+1, (ii) from tq to tq+1, (iii) from sq to sq-1,
(iv) from tq to tq-1, where q, q+1, q-1 are the cardinalities of enumerated subsets of
objects and values: sq, sq+1, and sq-1,  G; tq, tq+1, and tq-1  M.
    The transitions can be smooth and boundary. Under smooth transition, generating
sets of values (objects) is performed with preserving a given property of them. These
properties are, for example, “to be a test for a given class of objects”, “to be an irre-
dundant set of values”, “not to be a test for a given class of objects”, and some others.
A transition is said to be boundary if it changes a given property of sets of values
(objects) into the opposite one.
    For realizing the smooth inductive transitions, the following inductive reasoning
rules are used: generalization rule, specification rule, and dual generalization and
specification rules.
    The generalization rule is used to get all the sets of objects sq+1 = {i1, i2, … iq, iq+1}
from a set sq = {i1, i2, … iq} such that (sq, val(sq)) and (sq+1, val(sq+1)) are tests for a
given class of objects. The termination condition of generalization chain is: for all the
extension sq+1 of sq, (sq+1, val(sq+1)) is not a test for a given class of objects.
    The specification rule is used to get all the sets of values tq+1 = {m1, m2, …, mq+1}
from a set tq = {m1, m2, …, mq} such that tq and tq+1 are irredundant sets of values and
(obj(tq), tq) and (obj(tq+1), tq+1) are not tests for a given class of objects. The termina-
tion condition for specification chain is: for all the extensions tq+1 of tq, tq+1 is either a
redundant set of values or a test for a given class of objects.
    The dual generalization and specification rules relate to narrowing the collection of
values and objects, respectively.
    These rules realize the Joint Method of Agreement and Difference.
    All inductive transitions take their interpretations in human mental acts. The ex-
tending of a set of objects with checking the satisfaction of a given condition is a
typical method of inductive reasoning. In pattern recognition, the process of inferring
hypotheses about the unknown values of some attributes is reduced to the maximal
expansion of a collection of the known values of some attributes in such a way that
none of the forbidden pairs of values would belong to this expansion. The contraction
of a collection of values is used, for instance, in order to delete from it redundant or
non-informative values. The contraction of a collection of objects is used, for in-
stance, in order to isolate a certain cluster in a class of objects. Thus, we distinguish
lemons in the citrus fruits.
    The smooth transitions require the use of searching for admissible values (objects)
for extending or narrowing the set of values (objects). Consider some methods for
choosing objects admissible for extending s. Let S(test) be the partially ordered set of
elements s = {i1, i2, … iq}, q = 1, 2, …, nt - 1 obtained as a result of generalizations
and satisfying the following condition: (s, val(s)) is a test for a given class of positive
objects, nt is the number of positive objects. Let STGOOD be the partially ordered set
of elements s satisfying the condition: (s, val(s)) is a GMRT for a given class of posi-
tive objects.
    Method 1. Suppose that S(test) and STGOOD are not empty and s  S(test). Con-
struct the set V = { s’, s  s’, s’  {S(test)  STGOOD}}. The set V is the union of
all elements in S(test) and STGOOD containing s, hence, s is in the intersection of
these elements. If we want an extension of s not to be included in any element of
{S(test)  STGOOD}, we must use, for extending s, the objects not appearing simul-
taneously with s in V. The set of objects, candidates for extending s, is equal to
CAND(s) = nts\V, where nts = { s, s  S(test)}.
    An object j*  CAND(s) is not admissible for extending s if at least for one object
i  s the pair {i, j*} either does not correspond to a test or it corresponds to a good
test (it belongs to STGOOD). Let Q be the set of forbidden pairs of objects for extend-
ing s: Q = {{i, j}  S(+): ({i, j}, val({i, j}) is not a test for a given class of positive
objects }. Then the set of admissible objects is select(s) = {i, i  CAND(s): (j) (j 
s), {i, j}  {STGOOD or Q}}. The set Q can be generated in the beginning of search-
ing for all GMRTs for a given class of positive objects.
    Method 2. In this method, the set CAND(s) is determined as follows. Let s* = {s
 j} be an extension of s, where j  s. Then val(s*)  val(s). Hence the intersection
of val(s) and val(j) must be not empty. The set CAND(s) = {j: j  nts\s, val(j)  val(s)
 Ø}.
The knowledge acquired during the process of generalization (the sets Q, CAND(s),
S(test), STGOOD) is used for pruning the search in the domain space.
    The boundary inductive transitions are used to get: (1) all the sets tq from a set tq-1
such that (obj(tq-1), tq-1) is not a test but (obj(tq), tq) is a test, for a given set of objects;
(2) all the sets tq-1 from a set tq such that (obj(tq), tq) is a test, but (obj(tq-1), tq-1) is not a
test for a given set of objects; (3) all the sets sq-1 from a set sq such that (sq, val(sq)) is
not a test, but (sq-1, val(sq-1)) is a test for a given set of objects; (4) all the sets of sq
from a set sq-1 such that (sq-1, val(sq-1)) is a test, but (sq,val(sq)) is not a test for a given
set of objects. The boundary inductive transitions realize the Method of Difference or
Method of Concomitant Changes. For their implementation, we use the inductive
diagnostic rule (IDR) and dual inductive diagnostic rule (DIDR). These rules require
searching for essential values (IDRs) and essential objects (DIDRs).
    All the boundary transitions are also interpreted as human reasoning operations.
Transition 1 is used for distinguishing two diseases with similar symptoms. Transition
2 can be interpreted as including a certain class of objects into a more general one.
For instance, squares can be named parallelograms, all whose sides are equal. In some
intellectual psychological texts, a task is given to remove the “superfluous” (inappro-
priate) object from a certain group of objects (rose, butterfly, phlox, and dahlia) (tran-
sition 3). Transition 4 can be interpreted as the search for a refuting example.
    Inductive reasoning rules generate implicative assertions or logical rules of the first
kind, as shown in Table 4.
    Table 4. Rules of the first kind obtained with the use of inductive reasoning rules
Inductive rules                  Action                       Inferring rules of the first kind
Generalization rule              Extending s (narrowing t)    Implications
Specification rule               Extending t (narrowing s)    Implications
Inductive diagnostic rule        Searching for essential      Diagnostic rules, forbidden rules
                                 values
Dual inductive diagnostic rule   Searching for essential      Compatibility rules (approximate
                                 objects                      implications)
   During the lattice construction, the implicative assertions based on tests, are gener-
ated and used immediately. The knowledge acquired during the process of generaliza-
tion (specialization) is used for pruning the search space (current context) with the use
of deductive reasoning rules.


4       Conclusion

   This work is an attempt to consider a large class of machine-learning tasks as a
model of commonsense reasoning process based on using well-known deduction and
induction logical rules. For this goal, we have chosen the task of inferring good classi-
fication tests for a given partitioning on a given set of objects because a lot of well-
known machine-learning problems such as inferring functional, implicative, and asso-
ciative dependencies from a dataset are reduced to this task.


5       References
 1. Mill, J. S.: The system of logic. Russian Publishing Company “Book Affair”, Moscow
    (1900) (in Russian).
 2. Piaget, J.: Genesis of the elementary logical structures. Classification and serializations.
    “EKSMO-Press“, Moscow (2002) (in Russian).
 3. Finn, V.K.: Synthesis of cognitive procedures and induction problem. VINITI, NTI, Ser.
    2(1, 2) (1999) (in Russian).
 4. Zakrevskij, A.D.: A common logical approach to data mining and pattern recognition. In:
    Triantaphyllou, E. and Felici, G. (eds.). Data mining and knowledge discovery approaches
    based on rule induction techniques (pp. 1-43). Springer (2004).
 5. Naidenova, X.: Reducing machine learning tasks to the approximation of a given classifi-
    cation on a given set of examples. In Proceedings of the 5-th National Conference at Arti-
    ficial Intelligence (Vol. 1, pp. 275-279), Kazan, Tatarstan (1996) (in Russian).
 6. Shreider, J.: Algebra of classification. VINITI, NTI, Series 2 (9), pp. 3-6 (1974) (in Rus-
    sian).
 7. Boldyrev, N. G.: Minimization of Boolean partial functions with a large number of “Don’t
    Care” conditions and the problem of feature extraction. Proceedings of International Sym-
    posium “Discrete Systems” (pp.101-109). Riga, Latvia (1974).
 8. Ore, O.: Galois Connections. Transactions of the American Mathematical Society, 55(1),
    493-513 (1944).
 9. Kuznetsov, S. O.: Machine learning on the basis of Formal Concept Analysis. Automation
    and Remote Control, 62(10), pp. 1543-1564 (2001).