Classification by Selecting Plausible Formal Concepts in a Concept Lattice

Madori Ikeda and Akihito Yamamoto
Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
m.ikeda@iip.ist.i.kyoto-u.ac.jp, akihiro@i.kyoto-u.ac.jp

Abstract. We propose a classification method using a concept lattice, and apply it to thesaurus extension. In natural language processing, solving a practical task by extending many thesauri with a corpus is time-consuming. The task can be represented as classifying a set of test data for each of many sets of training data. Our method decreases the time cost by avoiding feature selection, which is generally performed for each pair of a set of test data and a set of training data. More precisely, a concept lattice is generated from only a set of test data, and then each formal concept is given a score by using a set of training data. The score represents the plausibility of the concept as a set of neighbors of an unknown object, and the unknown object is classified into the classes to which its neighbors belong. Therefore, once we make the lattice, we can classify test data for each set of training data by scoring alone, which has a small computational cost. By experiments using practical thesauri and corpora, we show that our method classifies more accurately than the k-nearest neighbor algorithm.

Keywords: Formal Concept, Concept Lattice, Classification, Thesaurus Extension

1 Introduction

In this paper, we propose a method for classifying data that generates a concept lattice and selects appropriate formal concepts in the lattice. The method lets us dispense with feature selection in classification by securing enough storage to maintain both the selected concepts and the redundant ones. This contributes to saving time when solving practical problems that involve a great variety of large data sets, e.g. thesaurus extension.

Classification can be divided into feature selection and classification with the selected features. Selecting features, which affects classification results, is very important, and many methods have been proposed for it [6, 14]. Generally, the selection is executed for a pair of a set of test data and a set of training data. Moreover, the selection is time-consuming when these data are large or contain much noise, as well as when test data are assumed to be classified into multiple classes. Therefore, classifying raw and large test data for each of many sets of training data can be costly from a computational point of view. Our method can overcome the problem. The method generates a concept lattice from a set of test data in advance by naïvely following the mathematical definitions. When a set of training data is given, the method gives each formal concept a score by a simple calculation using the training data. The score represents the plausibility of the concept as a set of neighbors of an unknown object to be classified. Then the method selects some of the concepts based on the score and finds the neighbors. Each unknown object is classified into the classes to which its neighbors belong. The method is thus faster because it uses the same concept lattice for every set of training data without feature selection. In addition, we can easily maintain a concept lattice updated with novel test data using well-known methods [4, 17, 19]. Storing such a lattice can be costly, since the lattice must store both the concepts that end up being selected and those that do not.
We claim that this disadvantage is mitigated by the low cost of memory storage.

We apply our method to the problem of thesaurus extension. Thesauri are semantic dictionaries of terms, and many kinds of thesauri are available now. In almost all thesauri, each term has several semantic definitions, and the definitions of terms often vary from one thesaurus to another. Corpora are linguistic resources of another type that consist of sentences in a natural language, and recently some of them contain huge amounts of sentences with many kinds of characteristics generated and attached to them by parsers and syntactic analyzers. Thesaurus extension can be regarded as classification and has been researched in the area of NLP (natural language processing) [1, 9, 18]. Viewed as classification, a thesaurus is a set of training data, and a corpus is a set of test data. Many proposed methods calculate similarities among terms by using features of the terms that are selected from the characteristics contained in a corpus. Then an unregistered term is put into the original thesaurus at the proper positions by finding registered terms similar to it. This is a practical way of extension because it is easy to acquire many syntactic characteristics of terms for classifying unregistered terms semantically. However, many thesauri to be extended exist, and the features selected for one thesaurus might not be useful for another. Moreover, these linguistic resources are often updated. Existing methods do not take these practical problems in NLP into account. Our method does not depend on feature selection, generates a concept lattice without training data, and is robust to updates. We apply our method to thesaurus extension with two thesauri and two corpora that are freely available.

This paper is organized as follows. First, we introduce the definitions of formal concepts and concept lattices in the next section. In Section 3, we explain our classification method based on the definitions, and we compare the method with related works in Section 4. In Section 5, we define thesaurus extension in terms of classification and show our experimental results. Conclusions are placed in Section 6.

2 Formal Concepts and Concept Lattices

We introduce the definitions of formal concepts and concept lattices according to [5, 7] with a running example.

Table 1. An example context K0 = (G0, M0, I0)

        m1   m2   m3   m4   m5   m6   m7
  g1    ×    ×
  g2    ×    ×         ×
  g3    ×    ×         ×
  g4         ×         ×         ×
  g5         ×              ×    ×
  g6         ×              ×    ×
  g7              ×          ×         ×

Definition 1 ([7]) Let G and M be mutually disjoint finite sets, and I ⊆ G × M. Each element of G is called an object, each element of M is called an attribute, and (g, m) ∈ I is read as "the object g has the attribute m". A triplet (G, M, I) is called a formal context (context for short).

Example 1 We introduce a context K0 = (G0, M0, I0) as a running example, where G0 = {g1, g2, ..., g7}, M0 = {m1, m2, ..., m7}, and every element of I0 is represented by a cross in Table 1. For example, the object g4 has the attributes m2, m4, and m6.

Definition 2 ([7]) For a context (G, M, I) and subsets A ⊆ G and B ⊆ M, we define
  A^I = { m ∈ M | ∀g ∈ A, (g, m) ∈ I },  B^I = { g ∈ G | ∀m ∈ B, (g, m) ∈ I }.
A formal concept of the context is a pair (A, B) where A^I = B and A = B^I.

Definition 3 ([7]) For a formal concept c = (A, B), A and B are called the extent and the intent respectively, and we let Ex(c) = A and In(c) = B. For arbitrary formal concepts c and c′, there is an order c ≤ c′ iff Ex(c) ⊆ Ex(c′) (or equivalently In(c) ⊇ In(c′)).
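To make these definitions concrete, the following minimal Python sketch computes the derivation operators of Definition 2 and enumerates all formal concepts of the running example context K0 by closing every candidate subset of objects. The names (K0, derive_attrs, derive_objs, formal_concepts) are ours, introduced only for illustration, and the brute-force enumeration is meant for tiny contexts such as Example 1, not as an efficient algorithm.

```python
from itertools import combinations

# The running example context K0 = (G0, M0, I0) from Table 1,
# written as a dict mapping each object to its set of attributes.
K0 = {
    "g1": {"m1", "m2"},
    "g2": {"m1", "m2", "m4"},
    "g3": {"m1", "m2", "m4"},
    "g4": {"m2", "m4", "m6"},
    "g5": {"m2", "m5", "m6"},
    "g6": {"m2", "m5", "m6"},
    "g7": {"m3", "m5", "m7"},
}

def derive_attrs(objs, ctx):
    """A^I: attributes shared by every object in objs (Definition 2)."""
    attrs = set.union(*ctx.values())  # start from all attributes M
    for g in objs:
        attrs &= ctx[g]
    return attrs

def derive_objs(attrs, ctx):
    """B^I: objects having every attribute in attrs (Definition 2)."""
    return {g for g, ms in ctx.items() if attrs <= ms}

def formal_concepts(ctx):
    """Enumerate all pairs (A, B) with A^I = B and B^I = A (brute force)."""
    objects = list(ctx)
    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            intent = derive_attrs(set(objs), ctx)
            extent = derive_objs(intent, ctx)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

if __name__ == "__main__":
    lattice = formal_concepts(K0)
    print(len(lattice))  # 11 concepts, matching Figure 1
    for extent, intent in sorted(lattice, key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Running the sketch lists the eleven concepts of Figure 1, from the bottom (empty extent) to the top (all of G0).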
Definition 4 ([7]) The set of all formal concepts of a context K = (G, M, I) with the order ≤ is denoted by B(G, M, I) (for short, B(K)) and is called the concept lattice of K. For a concept lattice, the least concept is called the bottom and is denoted by ⊥, and the greatest concept is called the top and is denoted by ⊤.

Example 2 The concept lattice B(K0) of the context K0 = (G0, M0, I0) given in Table 1 is shown in Figure 1. Each circle represents a formal concept c ∈ B(K0) with Ex(c) and In(c) beside it. The gray concept and the numbers beside the concepts, called scores, are explained in Section 3. In the figure, each edge represents the order ≤ between two concepts, the greater concept is drawn above, and transitive orders are omitted. In the lattice, the concept c1 is the top and the concept c11 is the bottom.

[Fig. 1. A concept lattice B(K0) with scores. The concepts, with the scores discussed in Section 3, are: c1 = ({g1,...,g7}, ∅), 0.187...; c2 = ({g1,...,g6}, {m2}), 0.181...; c3 = ({g5,g6,g7}, {m5}), 0.472...; c4 = ({g1,g2,g3}, {m1,m2}), 0.15; c5 = ({g2,g3,g4}, {m2,m4}), 0.2; c6 = ({g4,g5,g6}, {m2,m6}), 0.666...; c7 = ({g2,g3}, {m1,m2,m4}), 0.2; c8 = ({g4}, {m2,m4,m6}), 0; c9 = ({g5,g6}, {m2,m5,m6}), 0.666...; c10 = ({g7}, {m3,m5,m7}), 1; c11 = (∅, {m1,...,m7}), 0.]

Definition 5 ([7]) For a concept lattice B(G, M, I), the formal concept γg = ({g}^{II}, {g}^{I}) of an object g ∈ G is called the object concept.

Definition 6 ([7]) For every formal concept c ∈ B(K), the subset of formal concepts { c′ ∈ B(K) | c′ ≥ c } is denoted by ↑c and called the principal filter generated by c.

Definition 7 ([5]) Let S be an ordered set and let x, y ∈ S. We say x is covered by y if x < y and x ≤ z < y implies x = z.

Definition 8 For a concept lattice B(K), a path is a string of formal concepts c0, c1, ..., cn satisfying that ci is covered by ci+1 for every i ∈ [0, n − 1], and its length is n.

Example 3 For the concept lattice B(K0) shown in Figure 1, γg4 = c8 and ↑γg4 = { c1, c2, c5, c6, c8 }. The length of the longest path from ⊥ to ⊤ is four, and the length of the shortest path from ⊥ to ⊤ is three.

In addition, several algorithms have been proposed [4, 17, 19] in order to update a concept lattice B(K) when a new object or a new attribute is added to a context K = (G, M, I), e.g. when K turns into (G′, M′, I′) where G′ ⊃ G, M′ ⊃ M, and I′ ⊃ I. Thus we can easily modify a concept lattice by using these algorithms.

3 Classification with a Concept Lattice

We illustrate our classification method after formalizing classification problems.

Table 2. An example training set τ0 = (T0, L0)

  g ∈ T0   g1          g2              g3              g5              g6          g7
  L0(g)    {l1, l2}    {l2, l3, l4}    {l4, l5, l6}    {l1, l6, l7}    {l6, l7}    {l1, l7, l8}

Definition 9 We let L be a finite set that is disjoint with both G and M, and every element of L is called a label. We assume a function L∗ : G → 2^L as a target classification rule, and L∗(g) is called the set of labels of g. Each label l ∈ L represents a class. Note that, for every object g ∈ G, the value of L∗(g) might not be a singleton and might share some labels with the value of L∗(g′) of another object g′ ∈ G, i.e. L∗(g) ∩ L∗(g′) ≠ ∅. Therefore this is a multi-class classification problem and is regarded as an extension of binary-class classification problems [11–13].
Definition 10 For a subset T ⊆ G and a function L : T → 2^L satisfying L(g) = L∗(g) for g ∈ T, the pair (T, L) is called a training set. For a training set (T, L), every object g ∈ G is called an unknown object if g ∉ T; otherwise it is called a known object.

Example 4 A training set τ0 = (T0, L0), where T0 = { g1, g2, g3, g5, g6, g7 } and L0 : T0 → 2^{ l1, l2, ..., l8 }, is shown in Table 2. The object g4 is excluded from G0 of the context K0 = (G0, M0, I0) given in Example 1 and is unknown for τ0.

Classification problems can be defined as obtaining a function L̂ : G → 2^L, and a classification is successful when a function L̂ such that ∀g ∈ G. L̂(g) = L∗(g) is obtained from a given training set (T, L). Our method is designed so that only the test data, expressed as a context (G, M, I), are needed to build the lattice, and it consists of the following steps:

1. constructing the concept lattice B(K) of a context K = (G, M, I),
2. calculating scores of formal concepts using a given training set τ = (T, L),
3. finding the set of neighbors for each unknown object u ∈ G \ T based on the scores, and
4. deciding a function L̂ by referring to L(g) for every known object g ∈ T.

The first step is achieved by simply following the definitions of formal concepts. In order to find the neighbors, formal concepts are given scores, and some of the scored concepts are extracted based on their scores. The score of every formal concept is a real number calculated with a training set, and it changes for another training set.

Definition 11 For every formal concept c ∈ B(K) and a training set τ = (T, L), we define Ex(c, τ) = Ex(c) ∩ T.

Definition 12 For every formal concept c ∈ B(K) and a training set τ = (T, L), we define σ(c, τ) as a real number in [0, 1] and call it the score of the concept c under the training set τ. The value σ(c, τ) is calculated as follows:

  σ(c, τ) = 0    if |Ex(c, τ)| = 0,
  σ(c, τ) = 1    if |Ex(c, τ)| = 1, and
  σ(c, τ) = (1 / C(|Ex(c, τ)|, 2)) Σ_{i=1}^{|Ex(c,τ)|−1} Σ_{j=i+1}^{|Ex(c,τ)|} sim(gi, gj)    otherwise,

where C(|Ex(c, τ)|, 2) = |Ex(c, τ)|(|Ex(c, τ)| − 1)/2 is the number of pairs of objects in Ex(c, τ) and

  sim(gi, gj) = |L(gi) ∩ L(gj)| / |L(gi) ∪ L(gj)|.

The function sim calculates the similarity between known objects gi and gj, and the function σ calculates the average of the similarities among objects in Ex(c, τ). The purpose of defining the score σ(c, τ) is to estimate the similarity among all objects in Ex(c), which includes not only known objects but also unknown objects. After scoring the formal concepts, the set of neighbors is found for each unknown object based on the scores. The neighbors are known objects extracted from the extents of plausible concepts.

Definition 13 For every object g ∈ G in a concept lattice B(G, M, I) and a training set τ = (T, L), a formal concept c ∈ ↑γg is called plausible w.r.t. g and τ if σ(c, τ) ≥ σ(c′, τ) for any other concept c′ ∈ ↑γg and |Ex(c)| ≥ |Ex(c′′)| for any other concept c′′ ∈ ↑γg such that σ(c, τ) = σ(c′′, τ). The set of plausible formal concepts w.r.t. g and τ is denoted by p(g, τ) ⊆ ↑γg.

We intend the score of a formal concept c to represent the similarity among the objects in Ex(c). We therefore define the objects in Ex(c) ∋ u of the concept c that has the highest score as the neighbors of an unknown object u. However, it sometimes happens that several formal concepts have the highest score at the same time. In this case, we define the concept c with the largest extent Ex(c) as a plausible concept. This is based on our policy that, among formal concepts that have the same score, the larger the size of Ex(c) is, the less noise the concept c has in Ex(c).
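The following Python sketch continues the code given after Definition 3 (reusing K0 and formal_concepts) and implements Definitions 11–13: the principal filter ↑γg, the score σ(c, τ) with the Jaccard-style label similarity sim, and the selection of plausible concepts. The names principal_filter, score, plausible_concepts, and tau0 are ours, concepts are represented simply as (extent, intent) pairs, and this is a sketch of the definitions, not the authors' implementation.

```python
from itertools import combinations

# Training set tau0 = (T0, L0) from Table 2; g4 is the unknown object.
L0 = {
    "g1": {"l1", "l2"},
    "g2": {"l2", "l3", "l4"},
    "g3": {"l4", "l5", "l6"},
    "g5": {"l1", "l6", "l7"},
    "g6": {"l6", "l7"},
    "g7": {"l1", "l7", "l8"},
}
tau0 = (set(L0), L0)  # (T, L)

def principal_filter(g, concepts):
    """All concepts whose extent contains g, i.e. the filter above γg (Def. 6)."""
    return [(ext, intent) for ext, intent in concepts if g in ext]

def sim(gi, gj, L):
    """Label similarity |L(gi) ∩ L(gj)| / |L(gi) ∪ L(gj)| (Definition 12)."""
    return len(L[gi] & L[gj]) / len(L[gi] | L[gj])

def score(concept, tau):
    """σ(c, τ): average pairwise similarity of known objects in Ex(c) (Def. 12)."""
    T, L = tau
    known = sorted(set(concept[0]) & T)  # Ex(c, τ) (Definition 11)
    if len(known) == 0:
        return 0.0
    if len(known) == 1:
        return 1.0
    pairs = list(combinations(known, 2))
    return sum(sim(gi, gj, L) for gi, gj in pairs) / len(pairs)

def plausible_concepts(u, concepts, tau):
    """Concepts of ↑γu with the highest score, ties broken by largest extent (Def. 13)."""
    filt = principal_filter(u, concepts)
    best = max((score(c, tau), len(c[0])) for c in filt)
    return [c for c in filt if (score(c, tau), len(c[0])) == best]
```

On the running example, plausible_concepts("g4", formal_concepts(K0), tau0) returns only the concept with extent {g4, g5, g6}, i.e. c6, in agreement with Example 5 below.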
Definition 14 For every unknown object u ∈ G \ T under a concept lattice B(G, M, I) and a training set τ = (T, L), the set N(u, τ) of neighbors is defined as

  N(u, τ) = ∪_{c ∈ p(u,τ)} Ex(c, τ).

At the last step, a function L̂ is constructed as

  L̂(g) = ∪_{g′ ∈ N(g,τ)} L(g′)   if g is unknown for τ = (T, L),
  L̂(g) = L(g)                     otherwise.

In this paper, we employ this simple definition although it could be defined in many ways.

Example 5 Suppose that we obtained the context K0 = (G0, M0, I0) given in Example 1 and constructed the concept lattice B(K0) shown in Figure 1 at the first step. Then, suppose that the training set τ0 = (T0, L0) shown in Example 4 was given. The score σ(c, τ0) of every formal concept c ∈ B(K0) under the training set τ0 can be calculated at the second step and is shown as the number beside each formal concept c in Figure 1. The plausible formal concepts of the unknown object g4, decided at the third step, are represented as gray and bold circles in Figure 1. There is only one plausible concept, c6, and thus N(g4, τ0) = { g5, g6 }. Finally, we can obtain a function L̂0 at the last step, and L̂0(g4) = { l1, l6, l7 }.
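Continuing the sketch above, the two remaining steps, Definition 14 and the construction of L̂, can be written directly. The names neighbors and classify are our illustrative choices, and the final lines reproduce the result of Example 5 under the stated assumptions.

```python
def neighbors(u, concepts, tau):
    """N(u, τ): known objects in the extents of the plausible concepts (Def. 14)."""
    T, _ = tau
    result = set()
    for extent, _intent in plausible_concepts(u, concepts, tau):
        result |= set(extent) & T
    return result

def classify(g, concepts, tau):
    """L̂(g): union of the neighbors' label sets for unknown g, L(g) otherwise."""
    T, L = tau
    if g in T:
        return L[g]
    labels = set()
    for n in neighbors(g, concepts, tau):
        labels |= L[n]
    return labels

# Reproducing Example 5 on the running example:
lattice = formal_concepts(K0)
print(neighbors("g4", lattice, tau0))  # {'g5', 'g6'}
print(classify("g4", lattice, tau0))   # {'l1', 'l6', 'l7'}
```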
4 Comparison with Related Works

We are concerned with the task of classifying the objects of a context for each of many training sets, where the training sets have multiple classes and an object might be classified into several classes. This is a practical task in NLP (natural language processing), namely extending many thesauri by using a corpus. The classification results required for the different pairs of the context and a training set differ from each other. Classification thus needs to be executed for each of the pairs, and the greater the number of the pairs is, the more time-consuming solving the task becomes. Our research is motivated by saving the time required for the task, and our classification method can overcome the problem. The proposed method constructs a concept lattice from a context in advance, and then classifies the unknown objects of a training set given later. The same concept lattice is used repeatedly in order to classify the unknown objects of each training set, and each classification is performed by scoring formal concepts and finding the neighbors of the unknown objects. This is based on the idea that as many processes requiring no training sets as possible should be executed before training sets are given. Because of this idea, our method differs from several existing approaches.

Learning models based on a concept lattice are proposed in [11–13]. In these studies, a hypothesis (a set of attributes) is constructed from positive and negative examples (objects) by using a formal concept lattice, and it is determined whether an unknown object is positive or not once the hypothesis is acquired. This is a binary-class classification problem, and unknown objects sometimes remain unclassified when hypotheses are not constructed appropriately in these approaches. Our method certainly classifies unknown objects into multiple classes by scoring formal concepts. In order to classify data, some approaches generate decision trees from concept lattices [2, 12]. The decision tree is generated for a pair of a context and a training set. Our method, however, is not intended to generate decision trees, because manipulating a concept lattice that has already been constructed is not preferable when we aim to reduce the time required for the task we are concerned with.

As classification for a pair of a context and a training set, our method is similar to k-NN (the k-nearest neighbor algorithm) [3] in that both find and refer to neighbors of an unknown object in order to classify it. However, our method often decreases the number of the neighbors and does not need to be given that number. In this section, we illustrate such differences between the two methods with an example, and we describe why the differences cause our method to classify more accurately. In the illustration, we use the symmetric difference for measuring the dissimilarity between two objects.

[Fig. 2. Formal concepts ↑γg4 in the concept lattice B(K0) with distance. Panel a shows the concepts of ↑γg4 = {c1, c2, c5, c6, c8} with their intents and scores; panel b places every object g ∈ G0 according to D(g4, g), where Ex(c8) = {g4}, Ex(c5) and Ex(c6), Ex(c2), and Ex(c1) correspond to the regions D(g4, gi) ≤ 0, ≤ 2, ≤ 3, and ≤ 6, respectively.]

Definition 15 For two objects g, g′ ∈ G of a context (G, M, I), we define a distance D(g, g′) between g and g′ as

  D(g, g′) = |{g}^I ∪ {g′}^I| − |{g}^I ∩ {g′}^I|.

This distance is also known as the Hamming distance between bit vectors in information theory. Figure 2.a shows the part of the concept lattice B(G0, M0, I0) that is the set of formal concepts ↑γg4 with the order ≤, and Figure 2.b shows a space in which every object g ∈ G0 is placed according to the distance D(g4, g) from the unknown object g4. From these figures, we observe that each formal concept c ∈ ↑γg represents, as its extent Ex(c), a set of objects located within a certain distance from the object g. It is also found that the greater a concept is, the more dissimilar objects it includes, i.e. for every g ∈ G and c′, c′′ ∈ ↑γg, max({ D(g, g′) | g′ ∈ Ex(c′) }) ≤ max({ D(g, g′′) | g′′ ∈ Ex(c′′) }) if c′ ≤ c′′.

Suppose that we have the context K0 = (G0, M0, I0) given in Example 1 and the training set τ0 = (T0, L0) given in Example 4, and that we have to find the neighbors N(g4, τ0) in order to complete a function L̂0. As Figure 2.b shows, the known objects g2, g3, g5, and g6 are nearest to the unknown object g4. In adopting k-NN, each of the four objects is equally a candidate for a neighbor of g4. When k > 4, we have to find k − 4 more candidates that are less similar to g4 than g2, g3, g5, g6, and such candidates might be noise and might decrease the accuracy of the obtained function L̂0. Otherwise, we need to reject 4 − k candidates from g2, g3, g5, g6 according to some policy. The rejection also affects the accuracy if the values of the function L0 for the candidates are different. In this case, the values of L0(g2), L0(g3), L0(g5), and L0(g6) are mutually distinct, and the value of L̂0(g4) varies depending on the remaining candidates. Therefore, the fixed number k of k-NN may decrease the accuracy of the obtained function L̂. Generally, for the purpose of avoiding such problems, feature selection is repeated for each training set in advance. By contrast, the nearest objects g2, g3, g5, and g6 are divided into the two extents Ex(c5) = { g2, g3, g4 } and Ex(c6) = { g4, g5, g6 } in our method. The concepts c5 and c6 are discriminated by their scores, and c6 is more plausible than c5. Consequently, the objects g5, g6 ∈ Ex(c6) are the neighbors of the unknown object g4, while g2, g3 ∈ Ex(c5) are not. The number of the neighbors is therefore often smaller than in k-NN. Moreover, the number depends on the size of the extent of each plausible concept, so the number k is not necessary in our method.
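A short sketch, again reusing K0 from the earlier code, of the symmetric-difference distance of Definition 15 and of the observation behind Figure 2: the extent of each concept in ↑γg4 gathers the objects within a certain distance of g4. The function name distance is ours.

```python
def distance(g1, g2, ctx):
    """D(g, g'): symmetric difference of the two attribute sets (Definition 15)."""
    a, b = ctx[g1], ctx[g2]
    return len(a | b) - len(a & b)

# Distances from the unknown object g4 (cf. Figure 2.b):
for g in sorted(K0):
    print(g, distance("g4", g, K0))
# Under the context of Table 1: g1:3, g2:2, g3:2, g4:0, g5:2, g6:2, g7:6,
# so g2, g3, g5, g6 are the nearest known objects, exactly the candidates
# plain k-NN would have to choose among, while the lattice splits them
# into Ex(c5) = {g2, g3, g4} and Ex(c6) = {g4, g5, g6}.
```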
We claim that this process of our method, dividing the candidate neighbors into extents and discriminating them by scores, improves both the precision and the recall of an obtained function L̂.

Definition 16 Under a target function L∗ and an obtained function L̂, the precision prec(g) and the recall rec(g) for every object g ∈ G are defined as

  prec(g) = |L̂(g) ∩ L∗(g)| / |L̂(g)|,   rec(g) = |L̂(g) ∩ L∗(g)| / |L∗(g)|.

Generally, a larger number k of candidate neighbors results in a lower precision and a higher recall. While k is fixed for all unknown objects in k-NN, it is flexible for each unknown object in our method. More precisely, our method tries to raise the precision by making k for an unknown object smaller, but it also tries to raise the recall by making k greater when it can keep the precision high. Thus, the two values are better than those of k-NN. We confirm this assertion by experiments in the next section.

5 Thesaurus Extension and Experiments

We cast the problem of thesaurus extension as a classification problem to which we apply the method proposed in this paper. A thesaurus is a dictionary registering terms based on their senses. It is common to all thesauri available now that every registered term (known object) is sorted according to some senses (classes). Extending a thesaurus means putting an unregistered term (unknown object) at the proper positions corresponding to its senses in the thesaurus. This is the classification problem defined in Section 3 when we regard a training set τ = (T, L) as an original thesaurus, T as the set of registered terms, L(g) for every term g ∈ T as the set of labels identifying its senses, and unknown objects as unregistered terms. In this section, we compare our method with k-NN (the k-nearest neighbor algorithm) [3] with respect to accuracy by experiments, after describing the resources used in the experiments.

5.1 Contexts and Training Sets for Experiments

We prepared two Japanese corpora, a case frame corpus published by Gengo-Shigen-Kyokai (which means "Language Resource Association" in English) [8] and the Google 4-gram [10], and two Japanese thesauri, Japanese WordNet 1.0 [15] and Bunruigoihyo [16]. We generated the set G1 of 7,636 nouns contained in all of these linguistic resources. For our experiments, we constructed a context K1 = (G1, M1, I1) from the case frame corpus and a context K2 = (G1, M2, I2) from the Google 4-gram so that they satisfy I1 ∩ I2 = ∅. Additionally, we constructed a third context K3 = (G1, M3, I3) where M3 = M1 ∪ M2 and I3 = I1 ∪ I2. We also constructed a pair (G1, L1∗) from Japanese WordNet 1.0 and a pair (G1, L2∗) from Bunruigoihyo. We generated ten training sets from each pair in each experiment by adopting 10-fold cross validation.

Table 3 shows the statistics for the three concept lattices of the three contexts. The numbers on the rows beginning with "max |{g}^{Ii}|", "min |{g}^{Ii}|", and "mean |{g}^{Ii}|" respectively represent the maximum, the minimum, and the mean of the number of attributes that an object g ∈ G1 has.
The numbers on the row "mean |Ex(c)|" represent the mean of |Ex(c)| over every formal concept c ∈ B(Ki) for i ∈ { 1, 2, 3 }. The numbers on the rows "max height" and "min height" respectively indicate the length of the longest and the shortest path from ⊥ to ⊤. Observing the table, we find that every object has only a few attributes in all three contexts, and that every formal concept c ∈ B(Ki) splits the objects into small groups. In other words, formal contexts constructed from the practical corpora are very sparse, and, for every concept, not many objects are needed to score it.

Table 4 shows the statistics for the pairs. The numbers on the rows "max |Li∗(g)|", "min |Li∗(g)|", and "mean |Li∗(g)|" respectively indicate the maximum, the minimum, and the mean of |Li∗(g)| over every object g ∈ G1 for i ∈ { 1, 2 }. Note that, in both thesauri, many terms have several senses, and many senses are represented by several terms, i.e. |Li∗(g)| > 1 and Li∗(g) ∩ Li∗(g′) ≠ ∅ for many terms g, g′ ∈ G1. The two thesauri have quite different definitions of terms. Moreover, we have to note that they share no identifiers of senses, i.e. L1∗(g) ∩ L2∗(g) = ∅ for every term g ∈ G1. In the remainder of this subsection, we describe the contents of the contexts and the pairs in detail.

Table 3. Statistics for the concept lattices

                       B(K1)    B(K2)    B(K3)
  |G1|                 7,636    7,636    7,636
  |Mi|                 19,313   7,135    26,448
  max |{g}^{Ii}|       17       27       32
  min |{g}^{Ii}|       1        1        2
  mean |{g}^{Ii}|      3.85     4.70     8.55
  |B(Ki)|              11,960   20,066   30,540
  mean |Ex(c)|         2.55     6.04     4.89
  max height           6        9        10
  min height           2        2        2

Table 4. Statistics for the thesauri

                          (G1, L1∗)   (G1, L2∗)
  |G1|                    7,636       7,636
  |∪_{g∈G1} Li∗(g)|       9,560       595
  max |Li∗(g)|            19          9
  min |Li∗(g)|            1           1
  mean |Li∗(g)|           2.19        2.89

The case frame corpus, which is used to construct the context K1, consists of case frame structures that were acquired from about 1.6 billion Japanese sentences on the Web by syntactic analysis. In Japanese, every predicate relates to some nouns with some case terms in a sentence, and such relations are called case frame structures. In K1 = (G1, M1, I1), every element of M1 is a pair of a predicate and a case term with which the predicate relates to a noun in G1 in the corpus, and every element of I1 is such a relation between a noun in G1 and a pair in M1. For example, in the Japanese sentence "inu ga otoko ni hoete iru" (in English, "A dog is barking at a man"), the predicate "hoeru" (bark) relates to the noun "inu" (dog) with the case term "ga" (be/do) and also relates to the noun "otoko" (man) with the case term "ni" (to). These relations are represented as (inu, ⟨hoeru,ga⟩) and (otoko, ⟨hoeru,ni⟩) respectively and are shown in Table 5. Note that the corpus also records the frequency of each relation, and we used only the relations satisfying 0.05 ≤ (f /n) ≤ 0.95, where f is the frequency of a relation and n is the sum of the frequencies of the relations involving the same noun as the relation.

Table 5. A context from case frame structures

           ⟨hoeru,ga⟩   ⟨hoeru,ni⟩
  inu      ×
  otoko                 ×

The Google 4-gram, which is used to construct the context K2, was acquired from about 20 billion Japanese sentences on the Web. A Japanese sentence can be regarded as a string of POSs (parts of speech), which are the words included in the sentence, and a 4-gram is a string of POSs whose length is four. In K2 = (G1, M2, I2), every element of G1 is the first POS of a 4-gram, every element of M2 is a POS following the first in a sentence, and every element of I2 is such a "following" relation. For example, from the same sentence "inu ga otoko ni hoete iru", which is a string of six POSs, we can obtain two 4-grams starting with a noun, "inu ga otoko ni" and "otoko ni hoete iru", and the context shown in Table 6 is obtained. This corpus also records the frequency of each 4-gram, and, in order to construct (G1, M2, I2), we used only the 4-grams satisfying the condition 0.05 ≤ (f /n) ≤ 0.95, where f is the frequency of a 4-gram and n is the sum of the frequencies of the 4-grams containing the same noun as the 4-gram.

Table 6. A context from 4-grams

           ga   otoko   ni   hoete   iru
  inu      ×    ×       ×
  otoko                 ×    ×       ×
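As an illustration of how such contexts might be assembled, the following hedged Python sketch builds a formal context from (noun, attribute) co-occurrence counts and applies the relative-frequency filter 0.05 ≤ f/n ≤ 0.95 described above. The input format, the toy counts, and the name build_context are our own assumptions for illustration, not the format of the actual corpora.

```python
from collections import defaultdict

def build_context(cooccurrence_counts, low=0.05, high=0.95):
    """Build a context {noun: set(attributes)} from raw co-occurrence counts.

    cooccurrence_counts maps (noun, attribute) pairs to frequencies f;
    a pair is kept only if low <= f / n <= high, where n is the total
    frequency of all pairs sharing the same noun.
    """
    totals = defaultdict(int)
    for (noun, _attr), f in cooccurrence_counts.items():
        totals[noun] += f
    context = defaultdict(set)
    for (noun, attr), f in cooccurrence_counts.items():
        if low <= f / totals[noun] <= high:
            context[noun].add(attr)
    return dict(context)

# A toy input in the spirit of Table 5 (invented counts, for illustration only):
counts = {
    ("inu", "<hoeru,ga>"): 40,
    ("inu", "<taberu,ga>"): 55,
    ("inu", "<rare-pair>"): 2,   # dropped: 2/97 < 0.05
    ("otoko", "<hoeru,ni>"): 10,
    ("otoko", "<aruku,ga>"): 12,
}
print(build_context(counts))
```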
Japanese WordNet 1.0 is used to construct the pair (G1, L1∗), and we use the values called lemmas in the thesaurus as terms. Bunruigoihyo is used to construct the pair (G1, L2∗), and we use the values called midashi-honntai (entry) in the thesaurus as terms and the values called bunrui-bango (class number) in it as senses.

5.2 Experimental Results

We carried out experiments to compare our method with k-NN (the k-nearest neighbor algorithm). In the experiments, both our method and k-NN were executed with the six combinations of the three concept lattices B(Ki) for i ∈ { 1, 2, 3 } and the two pairs (G1, Lj∗) for j ∈ { 1, 2 }. We adopted 10-fold cross validation on all combinations. For each combination, the set of objects G1 was split into ten sets Ul for l ∈ { 1, 2, ..., 10 } of almost equal size. The l-th classification was executed with the concept lattice B(Ki) and the training set (G1 \ Ul, Lj∗). We used the precision and the recall in order to evaluate the accuracy of both methods. The precision and the recall of a method are defined as the means of the precisions and recalls of the ten classifications, and the precision and the recall of the l-th classification are defined as the means of the values of the obtained function L̂j over the unknown objects Ul. We have to note that our research is intended to solve a task classifying the unknown objects of a context for every one of many training sets, but each classification in the experiments was executed for a single pair of a context and a training set.

We compare our method with k-NN for k ∈ { 1, 5, 10 } and show the results in Table 7. In the table, every value is rounded off to the third decimal place. The results show that our method is better than k-NN in both the precision and the recall over all combinations. We conclude that this is because our method is more flexible than the others with respect to the number of neighbors.

Table 7. Accuracies of obtained functions L̂1 and L̂2

                                    (G1, L1∗)              (G1, L2∗)
                       method       precision   recall     precision   recall
  K1 = (G1, M1, I1)    our method   0.039       0.274      0.164       0.533
                       1-NN         0.026       0.024      0.103       0.103
                       5-NN         0.007       0.036      0.031       0.150
                       10-NN        0.004       0.038      0.016       0.169
  K2 = (G1, M2, I2)    our method   0.007       0.079      0.028       0.248
                       1-NN         0.007       0.007      0.027       0.027
                       5-NN         0.002       0.013      0.014       0.070
                       10-NN        0.002       0.018      0.010       0.100
  K3 = (G1, M3, I3)    our method   0.030       0.072      0.132       0.250
                       1-NN         0.009       0.009      0.039       0.039
                       5-NN         0.004       0.018      0.017       0.085
                       10-NN        0.002       0.024      0.011       0.116
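The evaluation protocol just described, per-object precision and recall (Definition 16) averaged first over the unknown objects of each fold and then over the ten folds, could be sketched as follows. The names precision_recall, evaluate_fold, and cross_validate are our illustrative choices, and classify refers to the sketch given after Example 5.

```python
def precision_recall(predicted, target):
    """prec(g) and rec(g) from Definition 16 for one object.

    An empty prediction or label set is scored as 0; this is our
    assumption, since the paper does not specify that corner case.
    """
    if not predicted or not target:
        return 0.0, 0.0
    hit = len(predicted & target)
    return hit / len(predicted), hit / len(target)

def evaluate_fold(concepts, labels, unknown):
    """Mean precision/recall over the unknown objects of one fold."""
    T = set(labels) - set(unknown)
    tau = (T, {g: labels[g] for g in T})
    scores = [precision_recall(classify(u, concepts, tau), labels[u])
              for u in unknown]
    precs, recs = zip(*scores)
    return sum(precs) / len(precs), sum(recs) / len(recs)

def cross_validate(concepts, labels, folds):
    """10-fold style evaluation: average the per-fold means (Section 5.2)."""
    results = [evaluate_fold(concepts, labels, fold) for fold in folds]
    precs, recs = zip(*results)
    return sum(precs) / len(precs), sum(recs) / len(recs)
```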
6 Conclusions

We proposed a classification method that uses a concept lattice and introduces scores of concepts in order to find the neighbors of each unknown object. The method tries to raise the precision by making the number of the neighbors smaller, but it also tries to raise the recall by making the number greater when it can keep the precision high. Moreover, the method needs no feature selection, because it relies on the lattice and the scores. We intend to apply the method to a task that classifies a set of test data for every one of many sets of training data. The task is practical in NLP, e.g. thesaurus extension. By avoiding feature selection, the method has an advantage in time cost for the task. We confirmed that our method classifies unknown objects more accurately than k-NN (the k-nearest neighbor algorithm) [3] by experiments applying both methods to thesaurus extension. The classification results given in Table 7 show that the accuracy of our method is not yet sufficient, although it is better than that of the other method. The results also show that the accuracies vary over the combinations of concept lattices and training sets. It is expected that the variation depends especially on the structure of a concept lattice, because the structure directly affects what kind of objects a formal concept contains in its extent. Therefore, analyzing the relations between the structures of the lattices and the classification results is future work for improving the accuracy.

References

1. Agirre, E., Ansa, O., Hovy, E., Martinez, D.: Enriching Very Large Ontologies Using the WWW. In: Proc. of the ECAI 2000 Workshop "Ontology Learning" (2000)
2. Belohlavek, R., De Baets, B., Outrata, J., Vychodil, V.: Inducing Decision Trees via Concept Lattices. International Journal of General Systems, vol. 38, no. 4, pp. 455–467 (2009)
3. Belur, V. D.: Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, CA (1991)
4. Choi, V., Huang, Y.: Faster Algorithms for Constructing a Galois Lattice, Enumerating All Maximal Bipartite Cliques and Closed Frequent Sets. In: SIAM Conference on Discrete Mathematics (2006)
5. Davey, B. A., Priestley, H. A.: Introduction to Lattices and Order. Cambridge University Press, Cambridge (2002)
6. Deng, H., Runger, G.: Feature Selection via Regularized Trees. In: Proc. of the 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2012)
7. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer-Verlag New York, Inc., Secaucus, NJ (1999)
8. Gengo Shigen Kyokai, http://www.gsk.or.jp
9. Wang, J., Ge, N.: Automatic Feature Thesaurus Enrichment: Extracting Generic Terms From Digital Gazetteer. In: Proc. of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06), pp. 326–333. IEEE (2006)
10. Kudo, T., Kazawa, H.: Web Japanese N-gram Version 1, Gengo Shigen Kyokai
11. Kuznetsov, S. O.: Complexity of Learning in Concept Lattices from Positive and Negative Examples. Discrete Applied Mathematics, vol. 142, issues 1–3, pp. 111–125 (2004)
12. Kuznetsov, S. O.: Machine Learning and Formal Concept Analysis. Lecture Notes in Computer Science, vol. 2961, pp. 287–312 (2004)
13. Kuznetsov, S. O.: Mathematical Aspects of Concept Analysis. Journal of Mathematical Sciences, vol. 80, no. 2, pp. 1654–1698 (1996)
14. Lopez, F. G., Torres, M. G., Melian, B., Perez, J. A. M., Moreno-Vega, J. M.: Solving Feature Subset Selection Problem by a Parallel Scatter Search. European Journal of Operational Research, vol. 169, no. 2, pp. 477–489 (2006)
15. Mok, S. W. H., Gao, H. E., Bond, F.: Using WordNet to Predict Numeral Classifiers in Chinese and Japanese. In: Proc. of the 6th International Conference of the Global WordNet Association (GWC-2012), Matsue (2012)
16. National Institute for Japanese Language and Linguistics, http://www.ninjal.ac.jp/archives/goihyo
17. Soldano, H., Ventos, V., Champesme, M., Forge, D.: Incremental Construction of Alpha Lattices and Association Rules. In: Proc. of the 14th International Conference on Knowledge-based and Intelligent Information and Engineering Systems: Part II (KES '10), pp. 351–360. Springer, Heidelberg (2010)
18. Uramoto, N.: Positioning Unknown Words in a Thesaurus by Using Information Extracted from a Corpus. In: Proc. of the 16th Conference on Computational Linguistics (COLING '96), vol. 2, pp. 956–961. Association for Computational Linguistics, Stroudsburg, PA (1996)
19. Valtchev, P., Missaoui, R.: Building Concept (Galois) Lattices from Parts: Generalizing the Incremental Methods. In: Proc. of ICCS '01, pp. 290–303. Springer, Heidelberg (2001)