Classification by Selecting Plausible Formal Concepts in a Concept Lattice

Madori Ikeda and Akihito Yamamoto
Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
m.ikeda@iip.ist.i.kyoto-u.ac.jp, akihiro@i.kyoto-u.ac.jp

Abstract. We propose a classification method using a concept lattice, and apply it to thesaurus extension. In natural language processing, solving a practical task by extending many thesauri with a corpus is time-consuming. The task can be represented as classifying a set of test data for each of many sets of training data. Our method decreases the time cost by avoiding feature selection, which is generally performed for each pair of a set of test data and a set of training data. More precisely, a concept lattice is generated from only a set of test data, and then each formal concept is given a score by using a set of training data. The score represents the plausibility of the concept as a set of neighbors of an unknown object, and the unknown object is classified into the classes to which its neighbors belong. Therefore, once we make the lattice, we can classify test data for each set of training data by scoring alone, which has a small computational cost. By experiments using practical thesauri and corpora, we show that our method classifies more accurately than the k-nearest neighbor algorithm.

Keywords: Formal Concept, Concept Lattice, Classification, Thesaurus Extension

1 Introduction

In this paper, we propose a method for classifying data that generates a concept lattice and selects appropriate formal concepts in the lattice. The method lets us dispense with feature selection in classification by securing enough storage to maintain both the selected concepts and the redundant ones. This contributes to saving time when solving practical problems that involve a great variety of large data sets, e.g. thesaurus extension.

Classification can be divided into feature selection and classification with the selected features. Selecting features, which affects classification results, is very important, and many methods have been proposed for it [6, 14]. Generally, the selection is executed for a pair of a set of test data and a set of training data. Moreover, the selection is time-consuming when these data are large or contain much noise, as well as when test data are assumed to be classified into multiple classes. Therefore, classifying raw and large test data for each of many sets of training data can be costly from a computational point of view. Our method can overcome the problem. The method generates a concept lattice from a set of test data in advance by naïvely following the mathematical definitions. When a set of training data is given, the method gives each formal concept a score by a simple calculation using the training data. The score represents the plausibility of the concept as a set of neighbors of an unknown object to be classified. Then the method selects some of the concepts based on the score and finds the neighbors. Each unknown object is classified into the classes to which its neighbors belong. The method is thus faster because it uses the same concept lattice for every set of training data without feature selection. In addition, we can easily maintain a concept lattice updated with novel test data using well-known methods [4, 17, 19]. Storing such a lattice can be costly, since the lattice must store both the concepts that end up being selected and those that do not.
We claim that this disadvantage is mitigated by the low cost of memory storage.

We apply our method to the problem of thesaurus extension. Thesauri are semantic dictionaries of terms, and many kinds of thesauri are available now. In almost all thesauri, each term has several semantic definitions, and the definitions of terms often vary from one thesaurus to another. Corpora are linguistic resources of another type that consist of sentences in a natural language, and recently some of them contain huge amounts of sentences with many kinds of characteristics generated and attached to them by parsers and syntactic analyzers. Thesaurus extension can be regarded as classification and has been researched in the area of NLP (natural language processing) [1, 9, 18]. Viewed as classification, a thesaurus is a set of training data, and a corpus is a set of test data. Many proposed methods calculate similarities among terms by using features of the terms that are selected from the characteristics contained in a corpus. Then an unregistered term is put into the original thesaurus at the proper positions by finding registered terms similar to it. This is a practical way of extension because it is easy to acquire many syntactic characteristics of terms for classifying unregistered terms semantically. However, many thesauri to be extended exist, and the features selected for one thesaurus might not be useful for another. Moreover, these linguistic resources are often updated. Existing methods do not take these practical problems in NLP into account. Our method does not depend on feature selection, generates a concept lattice without training data, and is robust to updates. We apply our method to thesaurus extension with two thesauri and two corpora that are freely available.

This paper is organized as follows. First, we introduce the definitions of formal concepts and concept lattices in the next section. In Section 3, we explain our classification method based on the definitions, and we compare the method with related works in Section 4. In Section 5, we define thesaurus extension in terms of classification and show our experimental results. Conclusions are placed in Section 6.

2 Formal Concepts and Concept Lattices

We introduce the definitions of formal concepts and concept lattices according to [5, 7] with a running example.

Table 1. An example context K0 = (G0, M0, I0)

        m1   m2   m3   m4   m5   m6   m7
  g1    ×    ×
  g2    ×    ×         ×
  g3    ×    ×         ×
  g4         ×         ×         ×
  g5         ×              ×    ×
  g6         ×              ×    ×
  g7              ×          ×         ×

Definition 1 ([7]) Let G and M be mutually disjoint finite sets, and I ⊆ G × M. Each element of G is called an object, each element of M is called an attribute, and (g, m) ∈ I is read as "the object g has the attribute m". A triplet (G, M, I) is called a formal context (context for short).

Example 1 We introduce a context K0 = (G0, M0, I0) as a running example, where G0 = {g1, g2, ..., g7}, M0 = {m1, m2, ..., m7}, and every element of I0 is represented by a cross in Table 1. For example, the object g4 has the attributes m2, m4, and m6.

Definition 2 ([7]) For a context (G, M, I) and subsets A ⊆ G and B ⊆ M, we define
  A^I = { m ∈ M | ∀g ∈ A, (g, m) ∈ I },  B^I = { g ∈ G | ∀m ∈ B, (g, m) ∈ I }.
A formal concept of the context is a pair (A, B) where A^I = B and A = B^I.

Definition 3 ([7]) For a formal concept c = (A, B), A and B are called the extent and the intent respectively, and we let Ex(c) = A and In(c) = B. For arbitrary formal concepts c and c′, there is an order c ≤ c′ iff Ex(c) ⊆ Ex(c′) (or equivalently In(c) ⊇ In(c′)).
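To make these definitions concrete, the following minimal Python sketch computes the derivation operators of Definition 2 and enumerates all formal concepts of the running example context K0 by closing every candidate subset of objects. The names (K0, derive_attrs, derive_objs, formal_concepts) are ours, introduced only for illustration, and the brute-force enumeration is meant for tiny contexts such as Example 1, not as an efficient algorithm.

```python
from itertools import combinations

# The running example context K0 = (G0, M0, I0) from Table 1,
# written as a dict mapping each object to its set of attributes.
K0 = {
    "g1": {"m1", "m2"},
    "g2": {"m1", "m2", "m4"},
    "g3": {"m1", "m2", "m4"},
    "g4": {"m2", "m4", "m6"},
    "g5": {"m2", "m5", "m6"},
    "g6": {"m2", "m5", "m6"},
    "g7": {"m3", "m5", "m7"},
}

def derive_attrs(objs, ctx):
    """A^I: attributes shared by every object in objs (Definition 2)."""
    attrs = set.union(*ctx.values())  # start from all attributes M
    for g in objs:
        attrs &= ctx[g]
    return attrs

def derive_objs(attrs, ctx):
    """B^I: objects having every attribute in attrs (Definition 2)."""
    return {g for g, ms in ctx.items() if attrs <= ms}

def formal_concepts(ctx):
    """Enumerate all pairs (A, B) with A^I = B and B^I = A (brute force)."""
    objects = list(ctx)
    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            intent = derive_attrs(set(objs), ctx)
            extent = derive_objs(intent, ctx)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

if __name__ == "__main__":
    lattice = formal_concepts(K0)
    print(len(lattice))  # 11 concepts, matching Figure 1
    for extent, intent in sorted(lattice, key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Running the sketch lists the eleven concepts of Figure 1, from the bottom (empty extent) to the top (all of G0).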
Definition 4 ([7]) The set of all formal concepts of a context K = (G, M, I) with the order ≤ is denoted by B(G, M, I) (for short, B(K)) and is called the concept lattice of K. For a concept lattice, the least concept is called the bottom and is denoted by ⊥, and the greatest concept is called the top and is denoted by ⊤.

Example 2 The concept lattice B(K0) of the context K0 = (G0, M0, I0) given in Table 1 is shown in Figure 1. Each circle represents a formal concept c ∈ B(K0) with Ex(c) and In(c) beside it. The gray concept and the numbers beside the concepts, called scores, are explained in Section 3. In the figure, each edge represents the order ≤ between two concepts, the greater concept is drawn above, and transitive orders are omitted. In the lattice, the concept c1 is the top and the concept c11 is the bottom.

[Fig. 1. A concept lattice B(K0) with scores. The concepts, with the scores discussed in Section 3, are: c1 = ({g1,...,g7}, ∅), 0.187...; c2 = ({g1,...,g6}, {m2}), 0.181...; c3 = ({g5,g6,g7}, {m5}), 0.472...; c4 = ({g1,g2,g3}, {m1,m2}), 0.15; c5 = ({g2,g3,g4}, {m2,m4}), 0.2; c6 = ({g4,g5,g6}, {m2,m6}), 0.666...; c7 = ({g2,g3}, {m1,m2,m4}), 0.2; c8 = ({g4}, {m2,m4,m6}), 0; c9 = ({g5,g6}, {m2,m5,m6}), 0.666...; c10 = ({g7}, {m3,m5,m7}), 1; c11 = (∅, {m1,...,m7}), 0.]

Definition 5 ([7]) For a concept lattice B(G, M, I), the formal concept γg = ({g}^{II}, {g}^{I}) of an object g ∈ G is called the object concept.

Definition 6 ([7]) For every formal concept c ∈ B(K), the subset of formal concepts { c′ ∈ B(K) | c′ ≥ c } is denoted by ↑c and called the principal filter generated by c.

Definition 7 ([5]) Let S be an ordered set and let x, y ∈ S. We say x is covered by y if x < y and x ≤ z < y implies x = z.

Definition 8 For a concept lattice B(K), a path is a string of formal concepts c0, c1, ..., cn satisfying that ci is covered by ci+1 for every i ∈ [0, n − 1], and its length is n.

Example 3 For the concept lattice B(K0) shown in Figure 1, γg4 = c8 and ↑γg4 = { c1, c2, c5, c6, c8 }. The length of the longest path from ⊥ to ⊤ is four, and the length of the shortest path from ⊥ to ⊤ is three.

In addition, several algorithms have been proposed [4, 17, 19] in order to update a concept lattice B(K) when a new object or a new attribute is added to a context K = (G, M, I), e.g. when K turns into (G′, M′, I′) where G′ ⊃ G, M′ ⊃ M, and I′ ⊃ I. Thus we can easily modify a concept lattice by using these algorithms.

3 Classification with a Concept Lattice

We illustrate our classification method after formalizing classification problems.

Table 2. An example training set τ0 = (T0, L0)

  g ∈ T0   g1          g2              g3              g5              g6          g7
  L0(g)    {l1, l2}    {l2, l3, l4}    {l4, l5, l6}    {l1, l6, l7}    {l6, l7}    {l1, l7, l8}

Definition 9 We let L be a finite set that is disjoint with both G and M, and every element of L is called a label. We assume a function L∗ : G → 2^L as a target classification rule, and L∗(g) is called the set of labels of g. Each label l ∈ L represents a class. Note that, for every object g ∈ G, the value of L∗(g) might not be a singleton and might share some labels with the value of L∗(g′) of another object g′ ∈ G, i.e. L∗(g) ∩ L∗(g′) ≠ ∅. Therefore this is a multi-class classification problem and is regarded as an extension of binary-class classification problems [11–13].
Definition 10 For a subset T ⊆ G and a function L : T → 2^L satisfying L(g) = L∗(g) for g ∈ T, the pair (T, L) is called a training set. For a training set (T, L), every object g ∈ G is called an unknown object if g ∉ T; otherwise it is called a known object.

Example 4 A training set τ0 = (T0, L0), where T0 = { g1, g2, g3, g5, g6, g7 } and L0 : T0 → 2^{ l1, l2, ..., l8 }, is shown in Table 2. The object g4 is excluded from G0 of the context K0 = (G0, M0, I0) given in Example 1 and is unknown for τ0.

Classification problems can be defined as obtaining a function L̂ : G → 2^L, and a classification is successful when a function L̂ such that ∀g ∈ G. L̂(g) = L∗(g) is obtained from a given training set (T, L). Our method is designed so that only the test data, expressed as a context (G, M, I), are needed to build the lattice, and it consists of the following steps:

1. constructing the concept lattice B(K) of a context K = (G, M, I),
2. calculating scores of formal concepts using a given training set τ = (T, L),
3. finding the set of neighbors for each unknown object u ∈ G \ T based on the scores, and
4. deciding a function L̂ by referring to L(g) for every known object g ∈ T.

The first step is achieved by simply following the definitions of formal concepts. In order to find the neighbors, formal concepts are given scores, and some of the scored concepts are extracted based on their scores. The score of every formal concept is a real number calculated with a training set, and it changes for another training set.

Definition 11 For every formal concept c ∈ B(K) and a training set τ = (T, L), we define Ex(c, τ) = Ex(c) ∩ T.

Definition 12 For every formal concept c ∈ B(K) and a training set τ = (T, L), we define σ(c, τ) as a real number in [0, 1] and call it the score of the concept c under the training set τ. The value σ(c, τ) is calculated as follows:

  σ(c, τ) = 0    if |Ex(c, τ)| = 0,
  σ(c, τ) = 1    if |Ex(c, τ)| = 1, and
  σ(c, τ) = (1 / C(|Ex(c, τ)|, 2)) Σ_{i=1}^{|Ex(c,τ)|−1} Σ_{j=i+1}^{|Ex(c,τ)|} sim(gi, gj)    otherwise,

where C(|Ex(c, τ)|, 2) = |Ex(c, τ)|(|Ex(c, τ)| − 1)/2 is the number of pairs of objects in Ex(c, τ) and

  sim(gi, gj) = |L(gi) ∩ L(gj)| / |L(gi) ∪ L(gj)|.

The function sim calculates the similarity between known objects gi and gj, and the function σ calculates the average of the similarities among objects in Ex(c, τ). The purpose of defining the score σ(c, τ) is to estimate the similarity among all objects in Ex(c), which includes not only known objects but also unknown objects. After scoring the formal concepts, the set of neighbors is found for each unknown object based on the scores. The neighbors are known objects extracted from the extents of plausible concepts.

Definition 13 For every object g ∈ G in a concept lattice B(G, M, I) and a training set τ = (T, L), a formal concept c ∈ ↑γg is called plausible w.r.t. g and τ if σ(c, τ) ≥ σ(c′, τ) for any other concept c′ ∈ ↑γg and |Ex(c)| ≥ |Ex(c′′)| for any other concept c′′ ∈ ↑γg such that σ(c, τ) = σ(c′′, τ). The set of plausible formal concepts w.r.t. g and τ is denoted by p(g, τ) ⊆ ↑γg.

We intend the score of a formal concept c to represent the similarity among the objects in Ex(c). We therefore define the objects in Ex(c) ∋ u of the concept c that has the highest score as the neighbors of an unknown object u. However, it sometimes happens that several formal concepts have the highest score at the same time. In this case, we define the concept c with the largest extent Ex(c) as a plausible concept. This is based on our policy that, among formal concepts that have the same score, the larger the size of Ex(c) is, the less noise the concept c has in Ex(c).
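The following Python sketch continues the code given after Definition 3 (reusing K0 and formal_concepts) and implements Definitions 11–13: the principal filter ↑γg, the score σ(c, τ) with the Jaccard-style label similarity sim, and the selection of plausible concepts. The names principal_filter, score, plausible_concepts, and tau0 are ours, concepts are represented simply as (extent, intent) pairs, and this is a sketch of the definitions, not the authors' implementation.

```python
from itertools import combinations

# Training set tau0 = (T0, L0) from Table 2; g4 is the unknown object.
L0 = {
    "g1": {"l1", "l2"},
    "g2": {"l2", "l3", "l4"},
    "g3": {"l4", "l5", "l6"},
    "g5": {"l1", "l6", "l7"},
    "g6": {"l6", "l7"},
    "g7": {"l1", "l7", "l8"},
}
tau0 = (set(L0), L0)  # (T, L)

def principal_filter(g, concepts):
    """All concepts whose extent contains g, i.e. the filter above γg (Def. 6)."""
    return [(ext, intent) for ext, intent in concepts if g in ext]

def sim(gi, gj, L):
    """Label similarity |L(gi) ∩ L(gj)| / |L(gi) ∪ L(gj)| (Definition 12)."""
    return len(L[gi] & L[gj]) / len(L[gi] | L[gj])

def score(concept, tau):
    """σ(c, τ): average pairwise similarity of known objects in Ex(c) (Def. 12)."""
    T, L = tau
    known = sorted(set(concept[0]) & T)  # Ex(c, τ) (Definition 11)
    if len(known) == 0:
        return 0.0
    if len(known) == 1:
        return 1.0
    pairs = list(combinations(known, 2))
    return sum(sim(gi, gj, L) for gi, gj in pairs) / len(pairs)

def plausible_concepts(u, concepts, tau):
    """Concepts of ↑γu with the highest score, ties broken by largest extent (Def. 13)."""
    filt = principal_filter(u, concepts)
    best = max((score(c, tau), len(c[0])) for c in filt)
    return [c for c in filt if (score(c, tau), len(c[0])) == best]
```

On the running example, plausible_concepts("g4", formal_concepts(K0), tau0) returns only the concept with extent {g4, g5, g6}, i.e. c6, in agreement with Example 5 below.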
Definition 14 For every unknown object u ∈ G \ T under a concept lattice B(G, M, I) and a training set τ = (T, L), the set N(u, τ) of neighbors is defined as

  N(u, τ) = ∪_{c ∈ p(u,τ)} Ex(c, τ).

At the last step, a function L̂ is constructed as

  L̂(g) = ∪_{g′ ∈ N(g,τ)} L(g′)   if g is unknown for τ = (T, L),
  L̂(g) = L(g)                     otherwise.

In this paper, we employ this simple definition although it could be defined in many ways.

Example 5 Suppose that we obtained the context K0 = (G0, M0, I0) given in Example 1 and constructed the concept lattice B(K0) shown in Figure 1 at the first step. Then, suppose that the training set τ0 = (T0, L0) shown in Example 4 was given. The score σ(c, τ0) of every formal concept c ∈ B(K0) under the training set τ0 can be calculated at the second step and is shown as the number beside each formal concept c in Figure 1. The plausible formal concepts of the unknown object g4, decided at the third step, are represented as gray and bold circles in Figure 1. There is only one plausible concept, c6, and thus N(g4, τ0) = { g5, g6 }. Finally, we can obtain a function L̂0 at the last step, and L̂0(g4) = { l1, l6, l7 }.
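Continuing the sketch above, the two remaining steps, Definition 14 and the construction of L̂, can be written directly. The names neighbors and classify are our illustrative choices, and the final lines reproduce the result of Example 5 under the stated assumptions.

```python
def neighbors(u, concepts, tau):
    """N(u, τ): known objects in the extents of the plausible concepts (Def. 14)."""
    T, _ = tau
    result = set()
    for extent, _intent in plausible_concepts(u, concepts, tau):
        result |= set(extent) & T
    return result

def classify(g, concepts, tau):
    """L̂(g): union of the neighbors' label sets for unknown g, L(g) otherwise."""
    T, L = tau
    if g in T:
        return L[g]
    labels = set()
    for n in neighbors(g, concepts, tau):
        labels |= L[n]
    return labels

# Reproducing Example 5 on the running example:
lattice = formal_concepts(K0)
print(neighbors("g4", lattice, tau0))  # {'g5', 'g6'}
print(classify("g4", lattice, tau0))   # {'l1', 'l6', 'l7'}
```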
4 Comparison with Related Works

We are concerned with the task of classifying the objects of a context for each of many training sets, where the training sets have multiple classes and an object might be classified into several classes. This is a practical task in NLP (natural language processing), namely extending many thesauri by using a corpus. The classification results required for the different pairs of the context and a training set differ from each other. Classification thus needs to be executed for each of the pairs, and the greater the number of the pairs is, the more time-consuming solving the task becomes. Our research is motivated by saving the time required for the task, and our classification method can overcome the problem. The proposed method constructs a concept lattice from a context in advance, and then classifies the unknown objects of a training set given later. The same concept lattice is used repeatedly in order to classify the unknown objects of each training set, and each classification is performed by scoring formal concepts and finding the neighbors of the unknown objects. This is based on the idea that as many processes requiring no training sets as possible should be executed before training sets are given. Because of this idea, our method differs from several existing approaches.

Learning models based on a concept lattice are proposed in [11–13]. In these studies, a hypothesis (a set of attributes) is constructed from positive and negative examples (objects) by using a formal concept lattice, and it is determined whether an unknown object is positive or not once the hypothesis is acquired. This is a binary-class classification problem, and unknown objects sometimes remain unclassified when hypotheses are not constructed appropriately in these approaches. Our method certainly classifies unknown objects into multiple classes by scoring formal concepts. In order to classify data, some approaches generate decision trees from concept lattices [2, 12]. The decision tree is generated for a pair of a context and a training set. Our method, however, is not intended to generate decision trees, because manipulating a concept lattice that has already been constructed is not preferable when we aim to reduce the time required for the task we are concerned with.

As classification for a pair of a context and a training set, our method is similar to k-NN (the k-nearest neighbor algorithm) [3] in that both find and refer to neighbors of an unknown object in order to classify it. However, our method often decreases the number of the neighbors and does not need to be given that number. In this section, we illustrate such differences between the two methods with an example, and we describe why the differences cause our method to classify more accurately. In the illustration, we use the symmetric difference for measuring the dissimilarity between two objects.

[Fig. 2. Formal concepts ↑γg4 in the concept lattice B(K0) with distance. Panel a shows the concepts of ↑γg4 = {c1, c2, c5, c6, c8} with their intents and scores; panel b places every object g ∈ G0 according to D(g4, g), where Ex(c8) = {g4}, Ex(c5) and Ex(c6), Ex(c2), and Ex(c1) correspond to the regions D(g4, gi) ≤ 0, ≤ 2, ≤ 3, and ≤ 6, respectively.]

Definition 15 For two objects g, g′ ∈ G of a context (G, M, I), we define a distance D(g, g′) between g and g′ as

  D(g, g′) = |{g}^I ∪ {g′}^I| − |{g}^I ∩ {g′}^I|.

This distance is also known as the Hamming distance between bit vectors in information theory. Figure 2.a shows the part of the concept lattice B(G0, M0, I0) that is the set of formal concepts ↑γg4 with the order ≤, and Figure 2.b shows a space in which every object g ∈ G0 is placed according to the distance D(g4, g) from the unknown object g4. From these figures, we observe that each formal concept c ∈ ↑γg represents, as its extent Ex(c), a set of objects located within a certain distance from the object g. It is also found that the greater a concept is, the more dissimilar objects it includes, i.e. for every g ∈ G and c′, c′′ ∈ ↑γg, max({ D(g, g′) | g′ ∈ Ex(c′) }) ≤ max({ D(g, g′′) | g′′ ∈ Ex(c′′) }) if c′ ≤ c′′.

Suppose that we have the context K0 = (G0, M0, I0) given in Example 1 and the training set τ0 = (T0, L0) given in Example 4, and that we have to find the neighbors N(g4, τ0) in order to complete a function L̂0. As Figure 2.b shows, the known objects g2, g3, g5, and g6 are nearest to the unknown object g4. In adopting k-NN, each of the four objects is equally a candidate for a neighbor of g4. When k > 4, we have to find k − 4 more candidates that are less similar to g4 than g2, g3, g5, g6, and such candidates might be noise and might decrease the accuracy of the obtained function L̂0. Otherwise, we need to reject 4 − k candidates from g2, g3, g5, g6 according to some policy. The rejection also affects the accuracy if the values of the function L0 for the candidates are different. In this case, the values of L0(g2), L0(g3), L0(g5), and L0(g6) are mutually distinct, and the value of L̂0(g4) varies depending on the remaining candidates. Therefore, the fixed number k of k-NN may decrease the accuracy of the obtained function L̂. Generally, for the purpose of avoiding such problems, feature selection is repeated for each training set in advance. By contrast, the nearest objects g2, g3, g5, and g6 are divided into the two extents Ex(c5) = { g2, g3, g4 } and Ex(c6) = { g4, g5, g6 } in our method. The concepts c5 and c6 are discriminated by their scores, and c6 is more plausible than c5. Consequently, the objects g5, g6 ∈ Ex(c6) are the neighbors of the unknown object g4, while g2, g3 ∈ Ex(c5) are not. The number of the neighbors is therefore often smaller than in k-NN. Moreover, the number depends on the size of the extent of each plausible concept, so the number k is not necessary in our method.
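A short sketch, again reusing K0 from the earlier code, of the symmetric-difference distance of Definition 15 and of the observation behind Figure 2: the extent of each concept in ↑γg4 gathers the objects within a certain distance of g4. The function name distance is ours.

```python
def distance(g1, g2, ctx):
    """D(g, g'): symmetric difference of the two attribute sets (Definition 15)."""
    a, b = ctx[g1], ctx[g2]
    return len(a | b) - len(a & b)

# Distances from the unknown object g4 (cf. Figure 2.b):
for g in sorted(K0):
    print(g, distance("g4", g, K0))
# Under the context of Table 1: g1:3, g2:2, g3:2, g4:0, g5:2, g6:2, g7:6,
# so g2, g3, g5, g6 are the nearest known objects, exactly the candidates
# plain k-NN would have to choose among, while the lattice splits them
# into Ex(c5) = {g2, g3, g4} and Ex(c6) = {g4, g5, g6}.
```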
We claim that this process of our method, dividing the candidate neighbors into extents and discriminating them by scores, improves both the precision and the recall of an obtained function L̂.

Definition 16 Under a target function L∗ and an obtained function L̂, the precision prec(g) and the recall rec(g) for every object g ∈ G are defined as

  prec(g) = |L̂(g) ∩ L∗(g)| / |L̂(g)|,   rec(g) = |L̂(g) ∩ L∗(g)| / |L∗(g)|.

Generally, a larger number k of candidate neighbors results in a lower precision and a higher recall. While k is fixed for all unknown objects in k-NN, it is flexible for each unknown object in our method. More precisely, our method tries to raise the precision by making k for an unknown object smaller, but it also tries to raise the recall by making k greater when it can keep the precision high. Thus, the two values are better than those of k-NN. We confirm this assertion by experiments in the next section.

5 Thesaurus Extension and Experiments

We cast the problem of thesaurus extension as a classification problem to which we apply the method proposed in this paper. A thesaurus is a dictionary registering terms based on their senses. It is common to all thesauri available now that every registered term (known object) is sorted according to some senses (classes). Extending a thesaurus means putting an unregistered term (unknown object) at the proper positions corresponding to its senses in the thesaurus. This is the classification problem defined in Section 3 when we regard a training set τ = (T, L) as an original thesaurus, T as the set of registered terms, L(g) for every term g ∈ T as the set of labels identifying its senses, and unknown objects as unregistered terms. In this section, we compare our method with k-NN (the k-nearest neighbor algorithm) [3] with respect to accuracy by experiments, after describing the resources used in the experiments.

5.1 Contexts and Training Sets for Experiments

We prepared two Japanese corpora, a case frame corpus published by Gengo-Shigen-Kyokai (which means "Language Resource Association" in English) [8] and the Google 4-gram [10], and two Japanese thesauri, Japanese WordNet 1.0 [15] and Bunruigoihyo [16]. We generated the set G1 of 7,636 nouns contained in all of these linguistic resources. For our experiments, we constructed a context K1 = (G1, M1, I1) from the case frame corpus and a context K2 = (G1, M2, I2) from the Google 4-gram so that they satisfy I1 ∩ I2 = ∅. Additionally, we constructed a third context K3 = (G1, M3, I3) where M3 = M1 ∪ M2 and I3 = I1 ∪ I2. We also constructed a pair (G1, L1∗) from Japanese WordNet 1.0 and a pair (G1, L2∗) from Bunruigoihyo. We generated ten training sets from each pair in each experiment by adopting 10-fold cross validation.

Table 3 shows the statistics for the three concept lattices of the three contexts. The numbers on the rows beginning with "max |{g}^{Ii}|", "min |{g}^{Ii}|", and "mean |{g}^{Ii}|" respectively represent the maximum, the minimum, and the mean of the number of attributes that an object g ∈ G1 has.
The numbers on the row "mean |Ex(c)|" represent the mean of |Ex(c)| over every formal concept c ∈ B(Ki) for i ∈ { 1, 2, 3 }. The numbers on the rows "max height" and "min height" respectively indicate the length of the longest and the shortest path from ⊥ to ⊤. Observing the table, we find that every object has only a few attributes in all three contexts, and that every formal concept c ∈ B(Ki) splits the objects into small groups. In other words, formal contexts constructed from the practical corpora are very sparse, and, for every concept, not many objects are needed to score it.

Table 4 shows the statistics for the pairs. The numbers on the rows "max |Li∗(g)|", "min |Li∗(g)|", and "mean |Li∗(g)|" respectively indicate the maximum, the minimum, and the mean of |Li∗(g)| over every object g ∈ G1 for i ∈ { 1, 2 }. Note that, in both thesauri, many terms have several senses, and many senses are represented by several terms, i.e. |Li∗(g)| > 1 and Li∗(g) ∩ Li∗(g′) ≠ ∅ for many terms g, g′ ∈ G1. The two thesauri have quite different definitions of terms. Moreover, we have to note that they share no identifiers of senses, i.e. L1∗(g) ∩ L2∗(g) = ∅ for every term g ∈ G1. In the remainder of this subsection, we describe the contents of the contexts and the pairs in detail.

Table 3. Statistics for the concept lattices

                       B(K1)    B(K2)    B(K3)
  |G1|                 7,636    7,636    7,636
  |Mi|                 19,313   7,135    26,448
  max |{g}^{Ii}|       17       27       32
  min |{g}^{Ii}|       1        1        2
  mean |{g}^{Ii}|      3.85     4.70     8.55
  |B(Ki)|              11,960   20,066   30,540
  mean |Ex(c)|         2.55     6.04     4.89
  max height           6        9        10
  min height           2        2        2

Table 4. Statistics for the thesauri

                          (G1, L1∗)   (G1, L2∗)
  |G1|                    7,636       7,636
  |∪_{g∈G1} Li∗(g)|       9,560       595
  max |Li∗(g)|            19          9
  min |Li∗(g)|            1           1
  mean |Li∗(g)|           2.19        2.89

The case frame corpus, which is used to construct the context K1, consists of case frame structures that were acquired from about 1.6 billion Japanese sentences on the Web by syntactic analysis. In Japanese, every predicate relates to some nouns with some case terms in a sentence, and such relations are called case frame structures. In K1 = (G1, M1, I1), every element of M1 is a pair of a predicate and a case term with which the predicate relates to a noun in G1 in the corpus, and every element of I1 is such a relation between a noun in G1 and a pair in M1. For example, in the Japanese sentence "inu ga otoko ni hoete iru" (in English, "A dog is barking at a man"), the predicate "hoeru" (bark) relates to the noun "inu" (dog) with the case term "ga" (be/do) and also relates to the noun "otoko" (man) with the case term "ni" (to). These relations are represented as (inu, ⟨hoeru,ga⟩) and (otoko, ⟨hoeru,ni⟩) respectively and are shown in Table 5. Note that the corpus also records the frequency of each relation, and we used only the relations satisfying 0.05 ≤ (f /n) ≤ 0.95, where f is the frequency of a relation and n is the sum of the frequencies of the relations involving the same noun as the relation.

Table 5. A context from case frame structures

           ⟨hoeru,ga⟩   ⟨hoeru,ni⟩
  inu      ×
  otoko                 ×

The Google 4-gram, which is used to construct the context K2, was acquired from about 20 billion Japanese sentences on the Web. A Japanese sentence can be regarded as a string of POSs (parts of speech), which are the words included in the sentence, and a 4-gram is a string of POSs whose length is four. In K2 = (G1, M2, I2), every element of G1 is the first POS of a 4-gram, every element of M2 is a POS following the first in a sentence, and every element of I2 is such a "following" relation. For example, from the same sentence "inu ga otoko ni hoete iru", which is a string of six POSs, we can obtain two 4-grams starting with a noun, "inu ga otoko ni" and "otoko ni hoete iru", and the context shown in Table 6 is obtained. This corpus also records the frequency of each 4-gram, and, in order to construct (G1, M2, I2), we used only the 4-grams satisfying the condition 0.05 ≤ (f /n) ≤ 0.95, where f is the frequency of a 4-gram and n is the sum of the frequencies of the 4-grams containing the same noun as the 4-gram.

Table 6. A context from 4-grams

           ga   otoko   ni   hoete   iru
  inu      ×    ×       ×
  otoko                 ×    ×       ×
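As an illustration of how such contexts might be assembled, the following hedged Python sketch builds a formal context from (noun, attribute) co-occurrence counts and applies the relative-frequency filter 0.05 ≤ f/n ≤ 0.95 described above. The input format, the toy counts, and the name build_context are our own assumptions for illustration, not the format of the actual corpora.

```python
from collections import defaultdict

def build_context(cooccurrence_counts, low=0.05, high=0.95):
    """Build a context {noun: set(attributes)} from raw co-occurrence counts.

    cooccurrence_counts maps (noun, attribute) pairs to frequencies f;
    a pair is kept only if low <= f / n <= high, where n is the total
    frequency of all pairs sharing the same noun.
    """
    totals = defaultdict(int)
    for (noun, _attr), f in cooccurrence_counts.items():
        totals[noun] += f
    context = defaultdict(set)
    for (noun, attr), f in cooccurrence_counts.items():
        if low <= f / totals[noun] <= high:
            context[noun].add(attr)
    return dict(context)

# A toy input in the spirit of Table 5 (invented counts, for illustration only):
counts = {
    ("inu", "<hoeru,ga>"): 40,
    ("inu", "<taberu,ga>"): 55,
    ("inu", "<rare-pair>"): 2,   # dropped: 2/97 < 0.05
    ("otoko", "<hoeru,ni>"): 10,
    ("otoko", "<aruku,ga>"): 12,
}
print(build_context(counts))
```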
Japanese WordNet 1.0 is used to construct the pair (G1, L1∗), and we use the values called lemmas in the thesaurus as terms. Bunruigoihyo is used to construct the pair (G1, L2∗), and we use the values called midashi-honntai (entry) in the thesaurus as terms and the values called bunrui-bango (class number) in it as senses.

5.2 Experimental Results

We carried out experiments to compare our method with k-NN (the k-nearest neighbor algorithm). In the experiments, both our method and k-NN were executed with the six combinations of the three concept lattices B(Ki) for i ∈ { 1, 2, 3 } and the two pairs (G1, Lj∗) for j ∈ { 1, 2 }. We adopted 10-fold cross validation on all combinations. For each combination, the set of objects G1 was split into ten sets Ul for l ∈ { 1, 2, ..., 10 } of almost equal size. The l-th classification was executed with the concept lattice B(Ki) and the training set (G1 \ Ul, Lj∗). We used the precision and the recall in order to evaluate the accuracy of both methods. The precision and the recall of a method are defined as the means of the precisions and recalls of the ten classifications, and the precision and the recall of the l-th classification are defined as the means of the values of the obtained function L̂j over the unknown objects Ul. We have to note that our research is intended to solve a task classifying the unknown objects of a context for every one of many training sets, but each classification in the experiments was executed for a single pair of a context and a training set.

We compare our method with k-NN for k ∈ { 1, 5, 10 } and show the results in Table 7. In the table, every value is rounded off to the third decimal place. The results show that our method is better than k-NN in both the precision and the recall over all combinations. We conclude that this is because our method is more flexible than the others with respect to the number of neighbors.

Table 7. Accuracies of obtained functions L̂1 and L̂2

                                    (G1, L1∗)              (G1, L2∗)
                       method       precision   recall     precision   recall
  K1 = (G1, M1, I1)    our method   0.039       0.274      0.164       0.533
                       1-NN         0.026       0.024      0.103       0.103
                       5-NN         0.007       0.036      0.031       0.150
                       10-NN        0.004       0.038      0.016       0.169
  K2 = (G1, M2, I2)    our method   0.007       0.079      0.028       0.248
                       1-NN         0.007       0.007      0.027       0.027
                       5-NN         0.002       0.013      0.014       0.070
                       10-NN        0.002       0.018      0.010       0.100
  K3 = (G1, M3, I3)    our method   0.030       0.072      0.132       0.250
                       1-NN         0.009       0.009      0.039       0.039
                       5-NN         0.004       0.018      0.017       0.085
                       10-NN        0.002       0.024      0.011       0.116
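The evaluation protocol just described, per-object precision and recall (Definition 16) averaged first over the unknown objects of each fold and then over the ten folds, could be sketched as follows. The names precision_recall, evaluate_fold, and cross_validate are our illustrative choices, and classify refers to the sketch given after Example 5.

```python
def precision_recall(predicted, target):
    """prec(g) and rec(g) from Definition 16 for one object.

    An empty prediction or label set is scored as 0; this is our
    assumption, since the paper does not specify that corner case.
    """
    if not predicted or not target:
        return 0.0, 0.0
    hit = len(predicted & target)
    return hit / len(predicted), hit / len(target)

def evaluate_fold(concepts, labels, unknown):
    """Mean precision/recall over the unknown objects of one fold."""
    T = set(labels) - set(unknown)
    tau = (T, {g: labels[g] for g in T})
    scores = [precision_recall(classify(u, concepts, tau), labels[u])
              for u in unknown]
    precs, recs = zip(*scores)
    return sum(precs) / len(precs), sum(recs) / len(recs)

def cross_validate(concepts, labels, folds):
    """10-fold style evaluation: average the per-fold means (Section 5.2)."""
    results = [evaluate_fold(concepts, labels, fold) for fold in folds]
    precs, recs = zip(*results)
    return sum(precs) / len(precs), sum(recs) / len(recs)
```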
6 Conclusions

We proposed a classification method that uses a concept lattice and introduces scores of concepts in order to find the neighbors of each unknown object. The method tries to raise the precision by making the number of the neighbors smaller, but it also tries to raise the recall by making the number greater when it can keep the precision high. Moreover, the method needs no feature selection, because it relies on the lattice and the scores. We intend to apply the method to a task that classifies a set of test data for every one of many sets of training data. The task is practical in NLP, e.g. thesaurus extension. By avoiding feature selection, the method has an advantage in time cost for the task. We confirmed that our method classifies unknown objects more accurately than k-NN (the k-nearest neighbor algorithm) [3] by experiments applying both methods to thesaurus extension. The classification results given in Table 7 show that the accuracy of our method is not yet sufficient, although it is better than that of the other method. The results also show that the accuracies vary over the combinations of concept lattices and training sets. It is expected that the variation depends especially on the structure of a concept lattice, because the structure directly affects what kind of objects a formal concept contains in its extent. Therefore, analyzing the relations between the structures of the lattices and the classification results is future work for improving the accuracy.

References

1. Agirre, E., Ansa, O., Hovy, E., Martinez, D.: Enriching Very Large Ontologies Using the WWW. In: Proc. of the ECAI 2000 Workshop "Ontology Learning" (2000)
2. Belohlavek, R., De Baets, B., Outrata, J., Vychodil, V.: Inducing Decision Trees via Concept Lattices. International Journal of General Systems, vol. 38, no. 4, pp. 455–467 (2009)
3. Belur, V. D.: Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, CA (1991)
4. Choi, V., Huang, Y.: Faster Algorithms for Constructing a Galois Lattice, Enumerating All Maximal Bipartite Cliques and Closed Frequent Sets. In: SIAM Conference on Discrete Mathematics (2006)
5. Davey, B. A., Priestley, H. A.: Introduction to Lattices and Order. Cambridge University Press, Cambridge (2002)
6. Deng, H., Runger, G.: Feature Selection via Regularized Trees. In: Proc. of the 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2012)
7. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer-Verlag New York, Inc., Secaucus, NJ (1999)
8. Gengo Shigen Kyokai, http://www.gsk.or.jp
9. Wang, J., Ge, N.: Automatic Feature Thesaurus Enrichment: Extracting Generic Terms From Digital Gazetteer. In: Proc. of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06), pp. 326–333. IEEE (2006)
10. Kudo, T., Kazawa, H.: Web Japanese N-gram Version 1, Gengo Shigen Kyokai
11. Kuznetsov, S. O.: Complexity of Learning in Concept Lattices from Positive and Negative Examples. Discrete Applied Mathematics, vol. 142, issues 1–3, pp. 111–125 (2004)
12. Kuznetsov, S. O.: Machine Learning and Formal Concept Analysis. Lecture Notes in Computer Science, vol. 2961, pp. 287–312 (2004)
13. Kuznetsov, S. O.: Mathematical Aspects of Concept Analysis. Journal of Mathematical Sciences, vol. 80, no. 2, pp. 1654–1698 (1996)
14. Lopez, F. G., Torres, M. G., Melian, B., Perez, J. A. M., Moreno-Vega, J. M.: Solving Feature Subset Selection Problem by a Parallel Scatter Search. European Journal of Operational Research, vol. 169, no. 2, pp. 477–489 (2006)
15. Mok, S. W. H., Gao, H. E., Bond, F.: Using WordNet to Predict Numeral Classifiers in Chinese and Japanese. In: Proc. of the 6th International Conference of the Global WordNet Association (GWC-2012), Matsue (2012)
16. National Institute for Japanese Language and Linguistics, http://www.ninjal.ac.jp/archives/goihyo
17. Soldano, H., Ventos, V., Champesme, M., Forge, D.: Incremental Construction of Alpha Lattices and Association Rules. In: Proc. of the 14th International Conference on Knowledge-based and Intelligent Information and Engineering Systems: Part II (KES '10), pp. 351–360. Springer, Heidelberg (2010)
18. Uramoto, N.: Positioning Unknown Words in a Thesaurus by Using Information Extracted from a Corpus. In: Proc. of the 16th Conference on Computational Linguistics (COLING '96), vol. 2, pp. 956–961. Association for Computational Linguistics, Stroudsburg, PA (1996)
19. Valtchev, P., Missaoui, R.: Building Concept (Galois) Lattices from Parts: Generalizing the Incremental Methods. In: Proc. of ICCS '01, pp. 290–303. Springer, Heidelberg (2001)