=Paper=
{{Paper
|id=Vol-1921/paper9
|storemode=property
|title=Notes on Relation Between Symbolic Classifiers
|pdfUrl=https://ceur-ws.org/Vol-1921/paper9.pdf
|volume=Vol-1921
|authors=Vladimir Parkhomenko,Xenia Naidenova,Alexey Buzmakov,Alexander Schukin
}}
==Notes on Relation Between Symbolic Classifiers==
Notes on Relation Between Symbolic Classifiers

Xenia Naidenova 1, Alexey Buzmakov 2, Vladimir Parkhomenko 3, and Alexander Schukin 3

1 Military Medical Academy, Saint-Petersburg, Russia, ksennaidd@gmail.com
2 NRU Higher School of Economics, Perm, Russia, avbuzmakov@hse.ru
3 Peter the Great St. Petersburg Polytechnic University, Saint-Petersburg, Russia, parhomenko.v@gmail.com, alexander.schukin@spbstu.ru

Abstract. Symbolic classifiers allow for solving the classification task and provide the reason for the classifier's decision. Such classifiers have been studied by a large number of researchers and are known under a number of names, including tests, JSM-hypotheses, version spaces, emerging patterns, proper predictors of a target class, representative sets, etc. Here we consider such classifiers with a restriction on counter-examples and discuss them in terms of pattern structures. We show how such classifiers are related. In particular, we discuss the equivalence between good maximally redundant tests and minimal JSM-hypotheses, and between minimal representations of version spaces and good irredundant tests.

Keywords: machine learning · symbolic classifier · version spaces · JSM-method · minimal JSM-hypotheses · test theory · irredundant test · good test · representative sets · jumping emerging patterns · formal concept analysis · pattern structure · semiconcepts

1 Introduction

Symbolic classifiers are being developed more and more extensively nowadays. One of the main advantages of such classifiers is their ability to provide logical rules that can be interpreted by experts. In this paper, we consider classifiers with a restriction on counter-examples, which is a widespread setting. The reason for its popularity is the opportunity to control which attributes enter the premise of classification rules; in particular, if a counter-example is forbidden, some combinations of its attributes are guaranteed not to appear in the premise. These classifiers have been studied by a large number of researchers and are known under a number of names. Accordingly, tests [7,44,41,29], JSM-hypotheses [13], version spaces [25], emerging patterns [42,6], representative sets [43,45] (first mentioned in [3] as representative samples), and proper predictors of a target class [16] are considered in this paper. Establishing links between these classifiers would allow for mutually enriching the approaches. The main motivation of the paper is to provide an appropriate framework for this on the basis of pattern structures.

The rest of the paper is organized as follows. Sec. 2 presents the basic definitions of Formal Concept Analysis and pattern structures to obtain a language for comparing symbolic classifiers. Sec. 3 is devoted to the interpretation of diagnostic tests, version spaces and good maximally redundant tests in the language of pattern structures. Finally, the relation between good tests, minimal JSM-hypotheses and version spaces is discussed in Sec. 4.

2 Main definitions of FCA and pattern structures

Let us consider the main ideas of FCA; a deeper review of FCA and its applications can be found in [39,37,38]. Let (2^G, ⊆) and (2^M, ⊆) be posets, where G and M are called the set of objects and the set of attributes, respectively, and ⊆ is the inclusion relation. The binary relation I ⊆ G × M provides links between G and M. The triple K := (G, M, I) is called a formal context. In Formal Concept Analysis (FCA), a Galois connection is a pair of derivation operators between (2^G, ⊆) and (2^M, ⊆), usually denoted with one symbol (·)′ [15,40]. In this paper, for clarity, they are denoted with two different symbols, (·)↑ and (·)↓, since they represent different operators:

A↑ := {m ∈ M | (∀g ∈ A) gIm};   (1)
B↓ := {g ∈ G | (∀m ∈ B) gIm}.   (2)

A pair (A, B) such that A ⊆ G, B ⊆ M, A = B↓ and A↑ = B [40] is called a formal concept; its left and right parts are called the concept extent and the concept intent, respectively.
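To make the operators (1) and (2) concrete, here is a minimal Python sketch of a formal context with both derivation operators and a formal concept check; the toy context (objects g1–g3, attributes a–c) is our own assumption, not data from the paper.

```python
# Derivation operators (1)-(2) on a toy formal context K = (G, M, I).
G = {"g1", "g2", "g3"}
M = {"a", "b", "c"}
I = {("g1", "a"), ("g1", "b"), ("g2", "b"), ("g2", "c"), ("g3", "b")}

def up(A):
    """A↑: the attributes shared by every object in A."""
    return {m for m in M if all((g, m) in I for g in A)}

def down(B):
    """B↓: the objects having every attribute in B."""
    return {g for g in G if all((g, m) in I for m in B)}

def is_formal_concept(A, B):
    """(A, B) is a formal concept iff A↑ = B and B↓ = A."""
    return up(A) == set(B) and down(B) == set(A)

print(up({"g1", "g2"}))                               # {'b'}
print(is_formal_concept({"g1", "g2", "g3"}, {"b"}))   # True
```

The compositions down(up(A)) and up(down(B)) realize the closure operators (·)↑↓ and (·)↓↑ used in formulas (3)–(8) below.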
The seminal paper on FCA was written by R. Wille in 1982 and reprinted in 2009 [40]. Some similar results were obtained in [2] but did not get much attention. Maximal rectangles in Boolean matrices [35] can also be considered as formal concepts. A retrospective analysis of earlier papers from the USSR and other countries can be found in [19,11,33].

The superpositions of the operators, (·)↑↓ and (·)↓↑, form closure operators. Formulas (3–8), where A1, A ⊆ G and B1, B ⊆ M, reflect the properties of the Galois connection [40]:

If A ⊆ A1, then A1↑ ⊆ A↑;   (3)
if B ⊆ B1, then B1↓ ⊆ B↓;   (4)
A ⊆ A↑↓;   (5)
B ⊆ B↓↑;   (6)
A↑ = A↑↓↑;   (7)
B↓ = B↓↑↓.   (8)

Let us note in Proposition 1 that the antecedents of formulas (3) and (4) can require ⊂ instead of ⊆.

Proposition 1. If A1, A ⊆ G and B1, B ⊆ M, then

if A ⊂ A1, then A1↑ ⊆ A↑;   (9)
if B ⊂ B1, then B1↓ ⊆ B↓.   (10)

The proof is the same as the proof for the ⊆ case in the antecedent given by B. Ganter and R. Wille in 1999 [15]. Later we will rely on the following corollaries.

Corollary 1. Let A1 ⊆ G and A ⊂ A1. Then
1. A1↑ ⊂ A↑, if ∃m [(m ∈ A↑) ∧ (m ∉ (A1 \ A)↑)];
2. A1↑ = A↑, if ∀m [(m ∈ A↑) → (m ∈ (A1 \ A)↑)]. □

Corollary 2. Let B1 ⊆ M and B ⊂ B1. Then
1. B1↓ ⊂ B↓, if ∃g [(g ∈ B↓) ∧ ((B1 \ B) ⊄ g↑)];
2. B1↓ = B↓, if ∀g [(g ∈ B↓) → (g ∈ (B1 \ B)↓)]. □

Formal concepts are partially ordered by inclusion of extents or intents: for A1, A ⊆ G and B1, B ⊆ M the order is given by

(A1, B1) ≤ (A, B) :⇔ A1 ⊆ A (or dually ⇔ B ⊆ B1).   (11)

The set of ordered formal concepts forms the formal concept lattice B(G, M, I) (the Galois lattice) of an input formal context.

Although FCA deals with binary contexts, many other types of data can be processed. In particular, a many-valued context M is considered in FCA: M := (G, M, W, J), where G, M, W are the sets of object names, attributes and attribute values, and J is the ternary relation J ⊆ G × M × W, which assigns the value w ∈ W of the attribute m ∈ M to the object g ∈ G (notation (g, m, w) ∈ J or m(g) = w). The statements (g, m, w) ∈ J and (g, m, v) ∈ J imply w = v. M can be represented by a table whose columns and rows correspond to M and G, respectively. The procedure of transforming many-valued contexts into binary contexts is called conceptual scaling [15]. The disadvantage of scaling is the essential growth of the number of attributes. Additionally, since scaled attributes are considered independently, a lattice-building procedure cannot be implemented in the most efficient way. There are related papers that avoid these disadvantages [5,31,17,12,14,34,8,4].

In particular, [17] introduced so-called pattern structures (PSs), P := (G, (D, ⊓), δ), where G and D are the set of objects and the set of descriptions, δ : G → D is the mapping giving the correspondence between objects and their descriptions, and (D, ⊓) forms a semilattice. δ(G) := {δ(g) | g ∈ G} generates the complete subsemilattice (Dδ, ⊓) of (D, ⊓) [17]. Elements of D are called patterns. For arbitrary patterns c, d ∈ D the subsumption order ⊑ is given by: c ⊑ d :⇔ c ⊓ d = c.
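As a small illustration, the following Python sketch (our own toy example, not from the paper) implements a pattern structure whose descriptions are attribute sets, with set intersection as ⊓ and the induced subsumption test c ⊑ d ⇔ c ⊓ d = c; the functions box_up and box_down anticipate the derivation operators (12) and (13) defined in the next subsection.

```python
# A pattern structure (G, (D, ⊓), δ) with set descriptions; ⊓ is set
# intersection, so c ⊑ d holds iff c is a subset of d.
from functools import reduce

delta = {  # the mapping δ from objects to their descriptions (toy data)
    "g1": frozenset({"a", "b", "c"}),
    "g2": frozenset({"a", "b"}),
    "g3": frozenset({"b", "c"}),
}
G = set(delta)

def meet(c, d):
    """c ⊓ d: the similarity (here: intersection) of two patterns."""
    return c & d

def subsumes(c, d):
    """c ⊑ d iff c ⊓ d = c, i.e. c is the more general pattern."""
    return meet(c, d) == c

def box_up(A):
    """A⬘ = ⊓_{g ∈ A} δ(g): the common pattern of the objects in A."""
    return reduce(meet, (delta[g] for g in A))

def box_down(d):
    """d⬙ = {g ∈ G | d ⊑ δ(g)}: the objects whose descriptions subsume d."""
    return {g for g in G if subsumes(d, delta[g])}

d = box_up({"g1", "g2"})
print(d, box_down(d))  # ({'a','b'}, {'g1','g2'}) is a pattern concept
```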
The derivation operators of the Galois connection between (2^G, ⊆) and (D, ⊑) [17] are typically denoted by a single symbol as well. As earlier, for readability we use two symbols, (·)⬘ and (·)⬙, defined by (12) and (13):

A⬘ := ⊓g∈A δ(g) for A ⊆ G;   (12)
d⬙ := {g ∈ G | d ⊑ δ(g)} for d ∈ D.   (13)

A pair (A, d), A ⊆ G, d ∈ D, such that (A⬘ = d) ∧ (d⬙ = A), is a pattern concept. The set of pattern concepts is ordered by extents and intents and forms the lattice B(G, (D, ⊓), δ). For any PS P there can be several formal contexts generating an isomorphic lattice. Every such context is called a representative context R(P). A possible construction procedure is discussed in Theorem 1 of [17]. Note that the notations of intents and extents, some propositions and their corollaries, e.g. Proposition 1 and Corollaries 1 and 2, from FCA are applicable to PSs, and vice versa. Many kinds of data can be represented by PSs.

Let us now show how to rely on PSs in a classification task. Let K be the set of class labels (or possible class descriptions in general). For a subset of objects we know their labels by means of a function ε : G → K. Let us call α : D → K the classification function or, in some sense, the recognition (forecasting) algorithm; see Figure 1, where the appropriate commutative diagram is depicted. For the sake of simplicity, suppose that K divides G into two disjoint classes G+ and G−. If the class of an object is unknown, it can be referred to Gτ.

Fig. 1. Commutative diagram of the classification function α (with ε = α ∘ δ).

Pattern structures P+ := (G+, (D, ⊓), δ), P− := (G−, (D, ⊓), δ) and Pτ := (Gτ, (D, ⊓), δ) are called the positive, negative and unknown pattern structures, respectively. Then P± := P+ ∪ P− is a classification structure. Let E := (G, (K, ⊓), ε) be a pattern structure of class descriptions, and let L := P± × E = (G±, (D, ⊓) × (K, ⊓), β), where β(g) := ⟨δ(g), ε(g)⟩, be a learning PS.

3 Interpretation of Diagnostic Tests, Version Spaces and Good Maximally Redundant Tests

Let us discuss the application of PSs for dealing with the classification task.

3.1 Diagnostic Tests and I-tests

The notion of a diagnostic test was introduced in 1955 by Chegis and Yablonskii for checking electric circuits [7]. In their work, the authors suggest an approach that allows for identifying the type of a circuit malfunction. In the data, the inputs and outputs of the circuits are paired with the type of the malfunction. Accordingly, such data do not allow for counter-examples, since a circuit can be either valid or invalid. Moreover, all possible inputs are considered to be present in the data. A diagnostic test is defined as a set of the data rows that allows for identifying the malfunction.

Later, diagnostic tests were used in a meaning very close to a special kind of functional dependencies (FDs) in the data [10,44,41]. In particular, [44] defines a diagnostic test in the following way (rewritten in terms of many-valued contexts):

Definition 1. Given a many-valued context (G, M, W, J), a diagnostic test is a set of attributes B ⊆ M having the following property: the context (G, B, W, J ∩ (G × B × W)) does not contain identical rows in distinct classes.

It should be mentioned that it is not a trivial task to express FDs in terms of PSs. Indeed, FDs presuppose that there is a number of "attributes" and only some of them are considered within an FD. However, in PSs there are no attributes in general. What could be an analogue of the attributes for graphs or sequences?
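The following sketch checks Definition 1 directly on a small many-valued context; the table is invented for illustration. A subset B of attributes is a diagnostic test iff no two rows that coincide on B carry distinct class labels.

```python
# Definition 1: B ⊆ M is a diagnostic test iff the context restricted
# to B has no identical rows in distinct classes (toy data assumed).
rows = {  # object -> (attribute values, class label)
    "g1": ({"m1": 0, "m2": 1}, "+"),
    "g2": ({"m1": 1, "m2": 1}, "+"),
    "g3": ({"m1": 1, "m2": 0}, "-"),
    "g4": ({"m1": 0, "m2": 0}, "-"),
}

def is_diagnostic_test(B):
    seen = {}  # restricted row -> class label of its first occurrence
    for values, label in rows.values():
        key = tuple(sorted((m, values[m]) for m in B))
        if seen.setdefault(key, label) != label:
            return False  # identical restricted rows in distinct classes
    return True

print(is_diagnostic_test({"m1"}))  # False: g2, g3 coincide on m1, differ in class
print(is_diagnostic_test({"m2"}))  # True: m2 alone separates the classes
```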
A possible way of expressing FDs in terms of PSs is the following. Given a PS (G, (D, ⊓), δ), let D be represented as D = D1 × D2 × · · · × DN, where (Di, ⊓i) × (Dj, ⊓j) = (Di × Dj, ⊓i,j) and ⟨a, b⟩ ⊓i,j ⟨c, d⟩ = ⟨a ⊓i c, b ⊓j d⟩. Accordingly, δ is decomposed as δ(g) = ⟨δ1(g), δ2(g), . . . , δN(g)⟩, where δi : G → Di. Given a subset D ⊆ {Di}, let δD(g) = ⟨δj1(g), δj2(g), . . . ⟩, where ∀ji [Dji ∈ D]. Then a diagnostic test can be expressed as follows.

Definition 2. A diagnostic test for the learning context L := (G±, (D, ⊓) × (K, ⊓), ⟨δ(g), ε(g)⟩) is a subset D ⊆ {Di} such that, if there are g1, g2 ∈ G± with δD(g1) = δD(g2), then ε(g1) = ε(g2).

Entities very similar to diagnostic tests are premises of implications whose conclusion is from the set of classes K. Different kinds of such premises are known as hypotheses [13], representative sets [3], tests [30] (in this paper, premises are considered together with their extents), jumping emerging patterns [42], and others. In order to distinguish such notions from diagnostic tests, let us call them i-tests.

Definition 3 (i-test in terms of PSs). htest is called an i-test for P± iff

(∃g ∈ G+) htest ⊑ δ(g), i.e. htest⬙ ≠ ∅;   (14)
(∀g ∈ G−) htest ⋢ δ(g).   (15)

The notions of diagnostic tests and i-tests are deeply interconnected. There is an algorithm that allows for reducing the tasks of finding diagnostic tests and i-tests to each other [28]. Moreover, natural links between FDs and dependencies in a partition lattice are studied in [27,9,1], and the relations between these three notions are shown in [33]. A notion related to i-tests is the semiconcept, introduced by R. Wille.

3.2 Semiconcepts

A semiconcept is a relaxation of a formal concept; it was introduced in 1990 by R. Wille [23].

Definition 4 (Object semiconcept, OSC). A pair (A, A⬘) is called an object semiconcept.

Definition 5 (Attribute semiconcept, ASC). A pair (d⬙, d) is called an attribute semiconcept.

The left part of an ASC is closed and is called the extent of the ASC, while the right part is not necessarily closed and is called the generator of the ASC. Similarly, the right part of an OSC is closed and is called the intent, while the left part is not necessarily closed and can be called the generator. This relaxation of concepts allows for expressing different kinds of i-tests (representative sets, jumping emerging patterns, etc.), and the semiconcept notation is useful for describing i-test inference algorithms.
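As an illustration of Definition 3 and of the ASC notation, the sketch below (toy data assumed, set-intersection patterns as before) tests whether a pattern is an i-test and builds the corresponding attribute semiconcept (h⬙, h).

```python
# Definition 3: h is an i-test iff it occurs in at least one positive
# description (14) and in no negative one (15); toy data assumed.
pos = {"g1": frozenset({"a", "b"}), "g2": frozenset({"a", "b", "c"})}
neg = {"g3": frozenset({"b", "c"}), "g4": frozenset({"c", "d"})}

def extent(h, objs):
    """h⬙ restricted to objs: the objects whose descriptions subsume h."""
    return {g for g, d in objs.items() if h <= d}

def is_i_test(h):
    return bool(extent(h, pos)) and not extent(h, neg)

def asc(h):
    """The attribute semiconcept (h⬙, h) of Definition 5."""
    return (frozenset(extent(h, pos)), h)

h = frozenset({"a", "b"})
print(is_i_test(h), asc(h))  # True (frozenset({'g1','g2'}), frozenset({'a','b'}))
```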
I-tests are used for solving the classification task. The classification task itself is well defined in the works of Mitchell under the name of version spaces. Let us consider them in more detail.

3.3 Version Spaces

Version spaces were proposed in 1977 [25], with a detailed description given in 1978 [26] by T. Mitchell. Although T. Mitchell works only with a very specific language of classifiers in the report (this language is close to i-tests and hypotheses for a formal context), the formalization of version spaces is very general: it allows for any language of object descriptions and any language of classifiers. Let us recall a particular case of version spaces, where the language describing the classifiers is (D, ⊓) [18].

Definition 6. A classifier α is any predicate on D, in particular, α : (D, ⊓) → K.

Defining classifiers by predicates allows for constructing a classifier as any rule that can be applied to unknown objects. Although a classifier can be any predicate, not all classifiers are consistent with the data at hand.

Definition 7. A classifier α is called consistent under the restriction on counter-examples if {g ∈ G± | α(δ(g))} = G+.

A consistent classifier divides all possible objects, which are called instances, into two groups (or more groups in a multiclass case) of positive objects and negative objects. When the classifier is consistent, all the instances from the dataset (called examples in the thesis of Mitchell) belong to the correct group. There is a number of such classifiers that are consistent with the data.

Definition 8. The set of consistent classifiers is called a version space VS.

In terms of examples, i.e. the objects of the dataset, there is only one possible "consistent" classifier. Indeed, there are two class labels; accordingly, there is only one partition of the set of examples (the objects with known class labels from the dataset) into two groups with the same class label within each group. However, in terms of instances, i.e. all possible objects, the set of classifiers can be huge. These classifiers are naturally ordered by means of set inclusion on instances. Correspondingly, if the language of classifiers is (D, ⊓), then the order of classifiers is opposite to the order given by ⊑ on descriptions in D. The version space can be described by the minimal and maximal boundaries of classifiers in terms of the classifier order.

Definition 9. VSmax = {p ∈ VS | ∄p1 ∈ VS such that p1 ⊏ p}.

Definition 10. VSmin = {p ∈ VS | ∄p1 ∈ VS such that p1 ⊐ p}.

For convenience, let us call the classifiers that are in neither VSmin nor VSmax "middle" classifiers.

Definition 11. The middle VS is given by VSmid = VS \ VSmin \ VSmax.

Although version spaces are presented in this paper based on [25,26,18], there are some important details of the interpretation of version spaces in [25,26] which were clarified in further manuscripts by T. Mitchell (see, e.g., [24, p. 31]). This interpretation is associated with the following specification of the order ⊑: in the definition of VSmin one should use this order w.r.t. the extent of VSmax. To clarify the order-based notation, Mitchell simply adds the subscript g to the order ⊑. To be more careful with the abbreviations used, we will use the notation ⊑VSmax⬙. This order is discussed in subsection 4.3. Now we shall discuss JSM-hypotheses to show their relations with proper predictors and i-tests.

3.4 JSM-hypotheses

JSM-hypotheses were proposed in 1981 by V.K. Finn [13] and were formulated in terms of PSs in 2001 by Ganter & Kuznetsov. Minimal JSM-hypotheses were proposed by S.O. Kuznetsov in [20,21]. Later this formalism was reformulated in terms of PSs [18].

Definition 12. A minimal JSM-hypothesis hmin is a hypothesis (i-test) such that ∄htest [(htest ⊏ hmin) ∧ (htest = htest⬙⬘) ∧ (hmin = hmin⬙⬘)].

Another important notion is the proper premise, which is closely related to JSM-hypotheses [18]. We give it in terms of PSs.

Definition 13. Given a PS P and d ∈ D, d ≠ ∅, d is a proper premise of P if d⬙⬘ ≠ d and ∄c ⊏ d such that c⬙⬘ = d⬙⬘.

Let us denote the descriptions of positive and negative objects by D+ and D−.

Definition 14. For a classification structure P±, d ∈ D+ is a proper premise for a classification pattern (say, K = {k+, k−}) if d is a proper premise, k+ ⋢ d, and k+ ⊑ d⬙⬘.

One extra notion is proposed in [18] to show a direct relation between PSs and VS (VSmin).

Definition 15. For P±, d ∈ D+ is a proper predictor iff the following conditions are satisfied:

– d⬙ ∩ G− = ∅;
– d⬙ ∩ G+ ≠ ∅;
– if d1 ⊑ d and d1 ≠ d, then d1⬙ ∩ G− ≠ ∅.
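A direct reading of Definition 15 for set patterns (where d1 ⊑ d means d1 ⊆ d) can be tested exhaustively over the subsets of d; the sketch below does exactly that on invented toy data.

```python
# Definition 15: d is a proper predictor iff d covers no negative and at
# least one positive object, while every strictly more general pattern
# d1 ⊏ d covers some negative object (toy data assumed).
from itertools import combinations

pos = {"g1": frozenset({"a", "b"}), "g2": frozenset({"a", "c"})}
neg = {"g5": frozenset({"b", "c"}), "g6": frozenset({"b", "d"})}

def extent(h, objs):
    return {g for g, d in objs.items() if h <= d}

def is_proper_predictor(d):
    if extent(d, neg) or not extent(d, pos):
        return False
    for k in range(len(d)):                      # all strict subsets of d
        for d1 in map(frozenset, combinations(d, k)):
            if not extent(d1, neg):              # a more general pattern
                return False                     # also avoiding G-
    return True

print(is_proper_predictor(frozenset({"a"})))       # True
print(is_proper_predictor(frozenset({"a", "b"})))  # False: {'a'} suffices
```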
3.5 Good Tests

The notion of good tests as FDs was developed in 1985 [28]. Later, in 1992, good tests were extended to the case of i-tests (the implication case) [30]. This extension is a transformation algorithm for reducing one case to the other. The algorithm reducing the search for diagnostic tests (as FDs) to the search for implications is given in Fig. 2 [30]. Similar ideas can also be found in [22,1].

The input of the algorithm is a many-valued context M and a labelling function ε. The output is two new contexts, K+ := (G+, M, I+) and K− := (G−, M, I−), such that i-tests found in K+ are diagnostic tests in M, and objects from K− are counter-examples. In line 10 of this algorithm, the Boolean matrix (the classification context, obtained as K± = K+ ∪ K−) can be called an indiscernibility matrix [33]. After this step, if necessary, only the maximal elements of G− can be stored. After running the described algorithm, one can infer the initial set of diagnostic tests from the found i-tests.

Input: the many-valued context M := (G, M, W, J), the class membership ε : G → K
Output: positive and negative binary contexts K+ := (G+, M, I+), K− := (G−, M, I−) such that i-tests found in K+ are diagnostic tests in M, and objects from K− are counter-examples
1. for all gi, gj ∈ G do
2.   if i < j then
3.     G ← G ∪ {(gi, gj)};
4. for all (gi, gj) ∈ G and all m ∈ M do
5.   if m(gi) = m(gj) then
6.     (gi, gj) I m;
7.   if ε(gi) = ε(gj) then
8.     G+ ← G+ ∪ {(gi, gj)};
9.   else G− ← G− ∪ {(gi, gj)};
10. I+ := I ∩ (G+ × M); I− := I ∩ (G− × M);
11. for all g+ ∈ G+ and g− ∈ G− do
12.   if g+↑ ⊆ g−↑ then
13.     G+ ← G+ \ {g+};

Fig. 2. Algorithm DiagnosticTestsScalingAndInferring (the new objects are pairs of initial objects).

Usually the definition of a classifier is given according to the class label; however, the classification structure P± is omitted for simplicity [22,32]. If an i-test is referred to positive or negative objects, the appropriate situation will be denoted by P± or P∓, respectively.

Definition 16. An i-test (h⬙, h) for P± is called an i-test extended with its extent h⬙, or an ASC-test.

In order to define good i-tests, we should first introduce irredundant i-tests. The predecessor of this notion was introduced by S. Yablonskii in 1955 [7] under the name of an elementary test.

Definition 17. An i-test for P± is called irredundant and denoted by hirr iff there is no smaller i-test h ⊏ hirr for P±.

The notions of maximally redundant and good tests on FDs were proposed by X.A. Naidenova and J.G. Polegaeva in 1985 [28] and adapted to implications in [30].

Definition 18. An i-test for P± is a maximally redundant (or closed) i-test, denoted by hmax, iff hmax = hmax⬙⬘.

Definition 19. An i-test for P± is called good and denoted by hgood iff there is no object g ∈ G+, g ∉ hgood⬙, such that (hgood⬙ ∪ {g})⬘ is an i-test.

An irredundant, maximally redundant or good i-test can be named with the prefix "ASC-" if it is represented as an ASC, i.e. as a pair (h⬙, h). A maximally redundant ASC-test is called a closed ASC-test for short.
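Definitions 17–19 translate directly into executable predicates for set patterns. The sketch below (toy data and exhaustive subset enumeration are our assumptions) distinguishes irredundant, closed (maximally redundant) and good i-tests.

```python
# Definitions 17-19 over set patterns: irredundant, closed and good
# i-tests; closure is taken w.r.t. the positive objects (toy data).
from functools import reduce
from itertools import combinations

pos = {"g1": frozenset({"a", "b", "c"}), "g2": frozenset({"a", "b"})}
neg = {"g3": frozenset({"b", "c"})}

def extent(h, objs):
    return {g for g, d in objs.items() if h <= d}

def intent(A):
    """A⬘ over the positive part: the common pattern of A."""
    return reduce(lambda c, d: c & d, (pos[g] for g in A))

def is_i_test(h):
    return bool(extent(h, pos)) and not extent(h, neg)

def is_irredundant(h):   # Definition 17: no smaller i-test h' ⊏ h
    return is_i_test(h) and not any(
        is_i_test(frozenset(c))
        for k in range(len(h)) for c in combinations(h, k))

def is_closed(h):        # Definition 18: h = h⬙⬘
    return is_i_test(h) and intent(extent(h, pos)) == h

def is_good(h):          # Definition 19: the extent cannot be extended
    e = extent(h, pos)
    return is_i_test(h) and all(
        not is_i_test(intent(e | {g})) for g in set(pos) - e)

h = frozenset({"a"})
print(is_irredundant(h), is_closed(h), is_good(h))  # True False True
```

In this toy data, {'a'} is irredundant and good but not closed; its closure {'a', 'b'} is the corresponding closed good i-test with the same extent.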
4 Relations

4.1 Venn diagram

Let us illustrate several kinds of ASC-tests proposed in the previous section using the Venn diagram in Fig. 3. The definitions of the predicates that form the following sets are omitted for simplicity; for P± (P∓):

– U is the set of ASC-tests {(h⬙, h) | h is an i-test};
– X is the set of good ASC-tests {(h⬙, h) | h is a good i-test};
– Y is the set of irredundant ASC-tests {(h⬙, h) | h is an irredundant i-test};
– Z is the set of closed ASC-tests {(h⬙, h) | h is an i-test and h = h⬙⬘}.

Each set named by a bold italic number is a set of ASC-tests which can be obtained by intersection (or/and negation, subtraction) of the sets U, X, Y and Z. Each number-named set is bounded by lines in Fig. 3. Let us show that the set 6, i.e. the set (Y ∩ Z) \ X, is empty.

Proposition 2. Any irredundant and closed ASC-test is good.

Proof. Consider the opposite situation: let such an (A, h) exist, i.e. suppose it is in the set (Y ∩ Z) \ X ≠ ∅. Since h is not a good i-test, ∃g such that g ∉ h⬙, g ∈ G+ and (h⬙ ∪ {g})⬘ is an i-test. This means that h⬙ ⊂ h⬙ ∪ {g} and, using (9), we get (h⬙ ∪ {g})⬘ ⊑ h⬙⬘. The situation satisfies property 1 of Corollary 1 and the strict inclusion takes place; otherwise (h⬙ ∪ {g})⬘ is not an i-test. Note that h = h⬙⬘, because h is closed. This means that an i-test (h⬙ ∪ {g})⬘ ⊏ h exists. However, h is irredundant, which means that an i-test strictly included in h does not exist. □

Fig. 3. Venn diagram of ASC-test types: U is the set of all semiconcepts (h⬙, h) where h is an i-test; X, Y and Z are the sets of good, irredundant and closed ASC-tests, respectively; the region 7 consists of ASC-tests that are at the same time good, irredundant and closed, while the region 8 consists of ASC-tests that are neither good, nor irredundant, nor closed.

Example 1. Let us illustrate the kinds of ASC-tests by a toy example based on Tab. 1 from [36]. Let object descriptions be conjunctions of object properties. Let hi be an ASC-test, where i = 1, . . . , 8 are the indices of the sets from the Venn diagram mentioned above:

– h1 = ({g2}, (m1 = 1 ∧ m2 = 2 ∧ m3 = 0));
– h2 = ({g1}, (m2 = 1 ∧ m4 = 0));
– h3 = ({g1}, δ(g1));
– h4 = ({g1, g3}, (m1 = 0 ∧ m2 = 1));
– h5 = ({g1, g3}, (m1 = 0));
– ∄h6, according to Prop. 2;
– h7 = ∅, because it is not represented in the table;
– h8 = ({g1}, (m2 = 1 ∧ m3 = 1 ∧ m4 = 0)).

Table 1. Learning structure

G  | m1 m2 m3 m4 | K
g1 |  0  1  1  0 | 1
g2 |  1  2  0  1 | 1
g3 |  0  1  0  1 | 1
g4 |  1  2  1  0 | 2
g5 |  1  1  0  1 | 2
g6 |  1  1  1  2 | 2

To illustrate an ASC-test from the set 7 in Fig. 3, let us use the data from Tab. 2: it is ({g1}, m1 = 1 ∧ m2 = 0).

Table 2. Learning structure to illustrate an ASC-test from the set 7 in Fig. 3

G  | m1 m2 | K
g1 |  1  0 | 1
g2 |  0  1 | 2

4.2 Closed Good I-tests and JSM-hypotheses

The next proposition reveals the following idea: if there exists a concept with a minimal intent which is an i-test, then its extent is not included in the extent of any other closed i-test, and vice versa. The proof can be divided into two parts.

Proposition 3. The definitions of minimal JSM-hypotheses and closed good i-tests are equivalent.

Proof. On the one side, for each closed good i-test being the intent of (A, h), there is no g ∈ G+, g ∉ h⬙, such that (h⬙ ∪ {g})⬘ is an i-test. According to Proposition 2, this means that ∄g such that the i-test (h⬙ ∪ {g})⬘ ⊏ h. Take into account that (h⬙ ∪ {g})⬘ is necessarily an intent, namely of ((h⬙ ∪ {g})⬘⬙, (h⬙ ∪ {g})⬘). A closed i-test h could be generated only by intersection of h⬙ with object descriptions not included in this extent. Thus h is an intent, and it is a minimally possible one.

On the other side, for each minimal JSM-hypothesis being the intent of (h⬙, h), there does not exist d ⊏ h such that d is a closed i-test. Suppose that h is not a closed good i-test, i.e. ∃g ∈ G+, g ∉ h⬙, such that (h⬙ ∪ {g})⬘ is an i-test; then (h⬙ ∪ {g})⬘ ⊏ h according to Proposition 2. The contradiction finishes the proof. □
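The classification of h4 and h5 in Example 1 can be re-checked mechanically. The sketch below encodes Tab. 1 as sets of attribute-value pairs (class 1 positive, class 2 negative) and repeats the predicates from the sketch after Definition 19; it is self-contained.

```python
# Re-checking Example 1 on Tab. 1: descriptions are sets of
# attribute-value pairs; class 1 is positive, class 2 is negative.
from functools import reduce
from itertools import combinations

def desc(m1, m2, m3, m4):
    return frozenset({("m1", m1), ("m2", m2), ("m3", m3), ("m4", m4)})

pos = {"g1": desc(0, 1, 1, 0), "g2": desc(1, 2, 0, 1), "g3": desc(0, 1, 0, 1)}
neg = {"g4": desc(1, 2, 1, 0), "g5": desc(1, 1, 0, 1), "g6": desc(1, 1, 1, 2)}

def extent(h, objs):
    return {g for g, d in objs.items() if h <= d}

def intent(A):
    return reduce(lambda c, d: c & d, (pos[g] for g in A))

def is_i_test(h):
    return bool(extent(h, pos)) and not extent(h, neg)

def is_irredundant(h):
    return is_i_test(h) and not any(
        is_i_test(frozenset(c))
        for k in range(len(h)) for c in combinations(h, k))

def is_closed(h):
    return is_i_test(h) and intent(extent(h, pos)) == h

def is_good(h):
    e = extent(h, pos)
    return is_i_test(h) and all(
        not is_i_test(intent(e | {g})) for g in set(pos) - e)

h4 = frozenset({("m1", 0), ("m2", 1)})  # from set 4 of the Venn diagram
h5 = frozenset({("m1", 0)})             # from set 5 of the Venn diagram
for h in (h4, h5):
    print(is_good(h), is_irredundant(h), is_closed(h))
# h4: True False True  -- good and closed, but not irredundant
# h5: True True False  -- good and irredundant, but not closed
```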
4.3 Discussion about Other Relations

Let us recall some results about the relation between representations of VS and minimal JSM-hypotheses (if there exists only one maximal representation of VS) [18]:

– the minimal JSM-hypothesis forms the maximal boundary of VS;
– the set of all proper predictors for the target class forms the minimal boundary of VS.

As a result of transitivity, a closed ASC-test is also a maximal representation of VS. However, we claim that a proper predictor for the target class is not a minimal representation of VS w.r.t. the implicit representation of the order ⊑VSmax⬙ discussed in subsection 3.3, instead of ⊑ from [18]. Let us show the difference in the following small example with the use of Tab. 3.

Table 3. Learning structure for the counter-example, modified from Tab. 1

G  | m1 m2 m3 m4 | K
g1 |  0  1  1  0 | 1
g3 |  0  1  0  1 | 1
g5 |  1  1  0  1 | 2
g6 |  1  1  1  2 | 2

The following classifiers are inferred from Tab. 3:

– the proper predictors for P± are m1 = 0 and m4 = 0;
– VSmax is m1 = 0 ∧ m2 = 1;
– VSmin is only m1 = 0.

It is clear that the elements of VSmin are irredundant good i-tests. Let us prepare the supplementary Proposition 4.

Proposition 4. Given a good i-test h ∈ D, if d ⊏ h is an i-test, then d is a good i-test.

Proof. Let d ⊏ h be an i-test for P+; then h⬙ ⊆ d⬙, according to formula (10). If h⬙ ⊂ d⬙, then according to property 1 of Corollary 2 there exists an object g such that d ⊑ g⬘, and there exists c ⊑ h with d ⋢ c and c ⋢ g⬘. Such an object can exist only in P− and would falsify h; otherwise there is a contradiction with the condition of this proposition. If h⬙ = d⬙, then according to property 2 of Corollary 2 and to the definition of a closed good i-test, the object discussed above does not exist. Thus d is an i-test with the same extent as the good i-test h, i.e. also a good i-test. □

Proposition 5. If a version space is given and there exists only one VSmax, then there can exist several elements of VSmin that are irredundant good i-tests.

Proof. Minimal JSM-hypotheses are VSmax (Theorem 1, [18]). Closed good i-tests are minimal JSM-hypotheses (proved in Proposition 3). Thus, by transitivity, a closed good i-test is VSmax. In Proposition 4 it is shown that all i-test subsets of VSmax are exactly good i-tests (or they are not i-tests at all). Thus these i-tests satisfy the condition ⊑VSmax⬙. Let us choose within these i-tests those that satisfy the condition of VSmin; they are exactly the irredundant good i-tests. □

Let us note that one of the last properties of Theorem 1, saying that the "version space is represented by a convex set in the lattice of all pattern intents with maximal element being a minimal JSM-hypothesis", should be improved.

Proposition 6. If the version space VS has only one VSmax, then the elements from VSmax, VSmin and VSmid form a join-semilattice of i-tests w.r.t. ⊒ with the maximal element VSmax.

Proof. Note that VSmax, due to the equivalence with closed good i-tests, can exist only in the set 4 ∪ 7 of Fig. 3. VSmin can exist only in the set 7 ∪ 5, according to Proposition 5. If the set VSmid is nonempty, then by definition it consists of elements from 1 = Z \ (4 ∪ 5 ∪ 7). Because VSmid and VSmin refer to one and only one VSmax, all these i-tests are in the equivalence relation under object covering. This means that for any two h, h1 from VS := VSmax ∪ VSmin ∪ VSmid, h⬙ is equal to h1⬙, and the natural order is ⊑VSmax⬙. Then the join-semilattice (VS, ⊑VSmax⬙) is obtained. □
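The claims made for Tab. 3 can be verified with the same machinery. The self-contained sketch below checks that m1 = 0 and m4 = 0 are proper predictors, that the closure of {g1, g3} is the single VSmax, and that m4 = 0 does not lie below VSmax, hence is not in VSmin under ⊑VSmax⬙; only the table data comes from the paper.

```python
# Verifying the Tab. 3 counter-example: two proper predictors exist,
# but only one of them lies below the single VSmax.
from functools import reduce
from itertools import combinations

def desc(m1, m2, m3, m4):
    return frozenset({("m1", m1), ("m2", m2), ("m3", m3), ("m4", m4)})

pos = {"g1": desc(0, 1, 1, 0), "g3": desc(0, 1, 0, 1)}
neg = {"g5": desc(1, 1, 0, 1), "g6": desc(1, 1, 1, 2)}

def extent(h, objs):
    return {g for g, d in objs.items() if h <= d}

def is_i_test(h):
    return bool(extent(h, pos)) and not extent(h, neg)

def is_proper_predictor(d):
    return is_i_test(d) and all(
        extent(frozenset(c), neg)  # every more general pattern must fail
        for k in range(len(d)) for c in combinations(d, k))

m1_0 = frozenset({("m1", 0)})
m4_0 = frozenset({("m4", 0)})
vs_max = reduce(lambda c, d: c & d, pos.values())  # ({g1, g3})⬘

print(sorted(vs_max))                  # [('m1', 0), ('m2', 1)]
print(is_proper_predictor(m1_0), is_proper_predictor(m4_0))  # True True
print(m1_0 <= vs_max, m4_0 <= vs_max)  # True False: m4 = 0 not in VSmin
```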
5 Conclusion

The equivalence between good maximally redundant tests (GMRTs) and minimal JSM-hypotheses, and between minimal representations of version spaces and irredundant tests included in GMRTs, has been shown. Extracting functional dependencies from pattern structures is an important question for future work.

References

1. Baixeries, J., Kaytoue, M., Napoli, A.: Computing functional dependencies with pattern structures. In: Szathmary, L., Priss, U. (eds.) Proceedings of the Ninth International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings, vol. 972, pp. 175–186. CEUR-WS.org (2012)
2. Barbut, M., Monjardet, B.: Ordre et classification: algèbre et combinatoire. Collection Hachette université: Méthodes mathématiques des sciences de l'homme. Hachette (1970)
3. Baskakova, L., Zhuravlev, Y.: A model of recognition algorithms with representative samples and systems of supporting sets. U.S.S.R. Comput. Math. Math. Phys. 21(5), 189–199 (1981)
4. Brézellec, P., Soldano, H.: Tabata: a learning algorithm performing a bidirectional search in a reduced search space using a tabu strategy. In: Proceedings of the 13th European Conference on Artificial Intelligence, pp. 420–424 (1998)
5. Brito, P.: Order structure of symbolic assertion objects. IEEE Trans. Knowl. Data Eng. 6(5), 830–835 (1994)
6. Buzmakov, A., Kuznetsov, S.O., Napoli, A.: A new approach to classification by means of jumping emerging patterns. In: CEUR Workshop Proceedings, vol. 939, pp. 15–22 (2012)
7. Chegis, I., Yablonskii, S.: Logical methods of electric circuit control. Trudy MIAN SSSR 51, 270–360 (1958), in Russian
8. Cohen, W.W., Hirsh, H.: The learnability of description logics with equality constraints. Machine Learning 17(2), 169–199 (1994)
9. Cosmadakis, S.S., Kanellakis, P.C., Spyratos, N.: Partition semantics for relations. Journal of Computer and System Sciences 33(2), 203–233 (1986)
10. Dmitriev, A., Zhuravlev, Y.I., Krendelev, F.: On mathematical principles for classification of objects and phenomena. Discrete Analysis 7, 3–17 (1966), in Russian
11. Domenach, F., Leclerc, B.: On the roles of Galois connections in classification. In: Exploratory Data Analysis in Empirical Research, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 31–40. Springer (2003)
12. Ferré, S., Ridoux, O.: Introduction to logical information systems. Information Processing & Management 40(3), 383–419 (2004)
13. Finn, V.: On machine-oriented formalization of plausible reasoning in the style of F. Bacon and J.S. Mill. Semiotika i Informatika 20, 35–101 (1983), in Russian
14. Ganascia, J.G.: TDIS: an algebraic formalization. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, vol. 2, pp. 1008–1015 (1993)
15. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Berlin (1999)
16. Ganter, B., Kuznetsov, S.O.: Formalizing hypotheses with concepts. In: Conceptual Structures: Logical, Linguistic, and Computational Issues: Proceedings of the 8th International Conference on Conceptual Structures, pp. 342–356 (2000)
17. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: Delugach, H.S., Stumme, G. (eds.) Conceptual Structures: Broadening the Base: Proceedings of the 9th International Conference on Conceptual Structures, pp. 129–142 (2001)
18. Ganter, B., Kuznetsov, S.: Hypotheses and version spaces. In: Conceptual Structures for Knowledge Creation and Communication, Lecture Notes in Computer Science, vol. 2746, pp. 83–95 (2003)
19. Kuznetsov, S.: Galois connections in data analysis: contributions from the Soviet era and modern Russian research. In: Ganter, B., Stumme, G., Wille, R. (eds.) Formal Concept Analysis, LNCS, vol. 3626, pp. 196–225. Springer (2005)
20. Kuznetsov, S.: Interpretation on graphs and complexity characteristics of a search for specific patterns. Nauchno-Tekhnicheskaya Informatsiya, Seriya 2: Informatsionnye Protsessy i Sistemy 23(1), 23–27 (1989)
21. Kuznetsov, S.: JSM-method as a machine learning system. Itogi Nauki i Tekhniki, ser. Informatika 15, 17–50 (1991), in Russian
22. Kuznetsov, S.: Mathematical aspects of concept analysis. Journal of Mathematical Sciences 80(2), 1654–1698 (1996)
23. Luksch, P., Wille, R.: A mathematical model for conceptual knowledge systems. In: Bock, H.H., Ihm, P. (eds.) Proceedings of the 14th Annual Conference of the Gesellschaft für Klassifikation (GfKl 1990), pp. 156–162 (1991)
24. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York, NY, USA, 1st edn. (1997)
25. Mitchell, T.M.: Version spaces: a candidate elimination approach to rule learning. In: Proceedings of the 5th International Joint Conference on Artificial Intelligence, vol. 1, pp. 305–310. IJCAI'77, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1977)
26. Mitchell, T.M.: Version Spaces: An Approach to Concept Learning. Tech. rep., Stanford University, California (1978)
27. Naidenova, K.: The relational model of experimental data analysis. Transactions of the Academy of Sciences of the USSR, series "Technical Cybernetics" 4, 103–119 (1982)
28. Naidenova, X.A., Polegaeva, J.G.: Model of human reasoning for deciphering forest images and its implementation on computer. In: Theses of papers and reports of the school-seminar "Semiotic aspects of the intellectual activity formalization", Kutaisi, Georgian SSR (1985), in Russian
29. Naidenova, X.A., Polegaeva, J.G.: An algorithm of finding the best diagnostic tests. In: Mintz, G., Lorents, E. (eds.) The 4th All-Union Conference "Application of Mathematical Logic Methods", pp. 87–92 (1986), in Russian
30. Naidenova, X.: Machine learning as a diagnostic task. In: Arefiev, I. (ed.) Knowledge–Dialog–Solution, Materials of the Short-Term Scientific Seminar, pp. 26–36. State North-West Technical University Press, St. Petersburg, Russia (1992), in Russian
31. Naidenova, X.: The data-knowledge transformation. In: Solovyev, V. (ed.) Text Processing and Cognitive Technologies, vol. 3, pp. 130–151. Pushchino (1999)
32. Naidenova, X.: Good classification tests as formal concepts. In: Domenach, F., Ignatov, D., Poelmans, J. (eds.) Proceedings of the 10th International Conference on Formal Concept Analysis, vol. 7278, pp. 211–226 (2012)
33. Naidenova, X.: Machine Learning Methods for Commonsense Reasoning Processes: Interactive Models (2010)
34. Nguifo, E.M., Njiwoua, P.: IGLUE: a lattice-based constructive induction system. Intelligent Data Analysis 5(1), 73–91 (2001)
35. Norris, E.M.: Maximal rectangular relations. In: Karpiński, M. (ed.) Fundamentals of Computation Theory: Proceedings of the 1977 International FCT-Conference, Poznań–Kórnik, Poland, September 19–23, 1977, vol. 56, pp. 476–481. Springer, Berlin, Heidelberg (1977)
36. Peskov, N.V.: Searching for informative fragments of object descriptions in recognition tasks. Ph.D. thesis (2004)
37. Poelmans, J., Ignatov, D.I., Kuznetsov, S.O., Dedene, G.: Formal concept analysis in knowledge processing: a survey on applications. Expert Systems with Applications 40(16), 6538–6560 (2013)
38. Poelmans, J., Ignatov, D.I., Kuznetsov, S.O., Dedene, G.: Fuzzy and rough formal concept analysis: a survey. International Journal of General Systems 43(2), 105–134 (2014)
39. Poelmans, J., Kuznetsov, S.O., Ignatov, D.I., Dedene, G.: Formal concept analysis in knowledge processing: a survey on models and techniques. Expert Systems with Applications 40(16), 6601–6623 (2013)
40. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. Lecture Notes in Computer Science 5548, 314–339 (2009)
41. Zakrevskiy, A.: Algorithms of Discrete Automata Synthesis. Nauka, Moscow (1971), in Russian
42. Zhang, X., Dong, G., Ramamohanarao, K.: Information-based classification by aggregating emerging patterns. In: Leung, K., Chan, L.W., Meng, H. (eds.) Intelligent Data Engineering and Automated Learning – IDEAL 2000: Data Mining, Financial Engineering, and Intelligent Agents, LNCS, vol. 1983, pp. 48–53. Springer (2000)
43. Zhuravlev, Y., Ryazanov, V., Senko, O.: Recognition. Mathematical Methods. Software. Practical Applications. Fazis, Moscow (2006), in Russian
44. Zhuravlev, Y.I., Nikiforov, V.V.: Recognition algorithms based on computation of estimates. Cybernetics 7(3), 387–400 (1971)
45. Zhuravlev, Y., Ryazanov, V., Senko, O., Biryukov, A., Vetrov, D., Dokukin, A., Kropotov, D.: The program system for intellectual data analysis, recognition and forecasting. WSEAS Transactions on Information Science and Applications 2(1), 55–58 (2005)