<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic acquisition of lexico-semantic knowledge from corpora</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nadezhda A. Stepanova</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Yaroslav-the-Wise Novgorod State University, Veliky Novgorod</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a new concept-based approach to extracting lexico-semantic knowledge. Genitive constructions of the Russian language are derived from parsed corpora, and Formal Concept Analysis is employed to build the lexicon structure on the basis of these constructions. We then propose a similarity measure and a dedicated algorithm to derive lexical classes from the concept lattice. The resulting class hierarchy forms a lexical database that can be used in various natural language applications; an example from Question Answering systems is given. Finally, we compare the concept-oriented lexicon with another lexical database and give implementation details.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. The source of the lexicon</title>
      <sec id="sec-2-1">
        <title>2.1 Text processing method</title>
        <p>First of all, a text-corpora processing method is needed to derive new lexical knowledge. There are two widespread text processing methods with respect to how the target word’s context is defined:
• the context is viewed as unstructured text to the left and right of the target word (an n-word window);
• the context is the set of words connected to the target word by syntactic relations.
The first approach has low precision, while the second requires a large number of manually defined syntactic patterns. In [15] Yarowsky describes word collocation as a precise and simple method of extracting word senses from text, and [16] establishes the unity of a word’s sense within a collocation (the one-sense-per-collocation hypothesis). We suggest using the Genitive Construction (GC) of the Russian language as the basis of text processing with respect to the target word. The GC avoids the drawbacks of the two standard approaches, and no manual definition of Russian collocations is needed.</p>
        <p>Lexical knowledge can be derived from GCs and then inserted into the lexicon. We argue that the parts of similar GCs have a common meaning, and that this meaning can be extracted using FCA. In the next two subsections we formalize the semantics of GCs to establish the existence of this common meaning, using Intensional Logic (IL) [10].</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Intensional Logic</title>
        <p>We now give a short introduction to IL. IL has a rich system of types. There are two basic types: e (entities) and t (truth values); all other types are function types. The lambda-abstraction operator (λ) is the main instrument for building expressions. In IL, semantic expressions are close to the syntactic structures of natural language. To apply IL to the description of lexical semantics, we use Partee’s approach [11], based on meaning postulates.</p>
        <p>A meaning postulate captures the idea that a lexeme can be defined in terms of its relations to other lexemes. For formalization purposes, we treat a meaning postulate as an axiom (formula) that associates a word with other words of the language; words themselves are just labels in IL. For each set Σ of closed formulas there is a corresponding class Σ* of all models in which every formula of Σ is true. The class Σ* is called an axiomatizable class of models, and Σ is called its set of axioms. In Σ*, however, formulas other than the axioms of Σ may also be true. The set Σ** of all closed formulas that are true in Σ* is called a theory, and the formulas of Σ** are called the theorems of the theory Σ**; the axioms are a subset of the theorems. For each word wi, let form(wi) be its form and mng(wi) its meaning.</p>
        <p>Sorts [2, 12] are elements of the ‘naive ontology’ [3] of the language; they provide a way to classify nominal predicates semantically. In [12] Partee showed how sorts impose restrictions on the well-formedness of GCs. A GC therefore restricts and specifies the lexical meaning of its components. The meaning of a word, as defined by its theory, includes the meaning postulates of the theories of all sorts the word belongs to, plus additional postulates specific to that word alone.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3 Semantic of Genitive Construction</title>
        <p>We adapted Partee’s approach from [12] for the formalization of the GC. A GC (for example, John’s brother) consists of three components: the Head Noun (HN) (for example, brother), the Genitive Phrase (GP) (for example, John’s), and the genitive relation. Outside the GC, the HN can be defined by the expression λx[S(x)] (for non-relational nouns) or λxλy[S(x)(y)] (for relational nouns). A noun phrase has the type &lt;&lt;e, t&gt;, t&gt; and is defined by the function λP[P(c)], where c is an individual constant of type &lt;e&gt;. The predicate set described by this function can be interpreted as the set of properties of the individual constant of type &lt;e&gt;. The formula w → λPi[∨i Pi(x)] defines the set of meaning postulates of a non-relational HN (word w), where w has type &lt;e&gt; and Pi is a predicate of type &lt;e, t&gt;. A word always has the same type as the sort it belongs to. The theory of a sort (Ts) consists of the sort’s properties related by logical operators. If a word w (with mng(w) = λP[P(x)]) belongs to the sort s, then the following expression is true:</p>
        <p>∃Ts (λP[∃z(P(z) ∧ (∀x(P(x) → Ts(z)(x))))]). (1)</p>
        <p>Let fss(w) be the set of properties of the word w that belong to the sort s. According to [12], the GP is always defined by the following expression:</p>
        <p>Gg = λyλR[λx[R(y)(x)]], (2)</p>
        <p>where y is the meaning of the GP, x an argument variable, and R a predicative variable.</p>
        <p>If the HN is a non-relational noun of type &lt;e, t&gt;, then inside the GC a special modifier performs a type shift from the type &lt;e, t&gt; to the relational type &lt;e, &lt;e, t&gt;&gt;. In addition, the modifier connects the HN with the GP to form a well-formed GC. Sorts s1 and s2 satisfy the selectional restrictions of the same modifier Sft if Ts1 ↔ Ts2, or if there is a sort s that is also acceptable for the modifier Sft such that Ts1 → Ts ∧ Ts2 → Ts (where Ts1, Ts2, Ts are the theories of the sorts s1, s2, s). According to the compositional approach we use, word meaning cannot be decomposed into elementary primitives, so we cannot demand that the sorts s1 and s2 share a common elementary property. Instead, the common property P has to follow from the sort theories Ts1 and Ts2:</p>
        <p>λP[∃T(∃z(P(z) ∧ ∀x(T(x) → P(x)) ∧ ∀x1(Ts1(x1) → P(x1)) ∧ ∀x2(Ts2(x2) → P(x2))))]. (3)</p>
        <p>According to (6) below, a GC always includes at least one axiom from the theory of the HN as well as meaning postulates from the genitive relation. The GP enters the GC’s expression as an individual constant.</p>
        <p>The meaning of the GC’s components can be derived from the GC if, for each GC found in the corpora, we have an expression of the form (6). To obtain such an expression, a modifier expression of the form (4) is needed for each GC sort, and the modifier expression requires meaning postulates, which could be constructed manually by lexicographers. Our purpose is different: we want to derive lexical knowledge from the corpora automatically, so manually constructed expressions cannot be used. From the parsing results we obtain only the individual genitive constructions and the positions of the HN and GP. Thus the meaning (the property λP[P(x)]) of the HN and GP cannot be derived from the corpora directly, and additional steps are needed.</p>
        <p>The genitive relation connects the HN and the GP. The genitive relation is also defined by meaning postulates, its sort is always equal to the sort of the GC, and it has the type &lt;e, &lt;e, t&gt;&gt;.</p>
        <p>The modifier consists of the HN’s theory and the theory of the genitive relation, and has the type &lt;e, &lt;e, t&gt;&gt;:</p>
        <p>Sft = λaλb[Rgen(a)(b) ∧ (fss(w))], (4)</p>
        <p>where w is the word belonging to the sort s (w ∈ Sorts) and Rgen is the theory of the genitive relation. Applying the modifier function to the definition of the GP (2), we get the GC definition:</p>
        <p>Gc = λyλR[λx[R(y)(x)]](mng(wgg))(λaλb[Rgen(a)(b) ∧ fss(ws)]), (5)</p>
        <p>where wgg is the GP and ws is the HN. Applying the lambda-conversion rules, we get the final GC definition:</p>
        <p>Gc = λx[λaλb[Rgen(a)(b) ∧ (fss(ws))](mng(wgg))(x)]. (6)</p>
        <p>Let Gc1 and Gc2 be GCs that belong to the same sort (Gc1 ∈ Sortk and Gc2 ∈ Sortk). Then the corresponding genitive relations (R1gen and R2gen) are completely defined by the GCs’ sort, so R1gen = R2gen, and the same selectional restrictions apply to the GCs’ components (Gc1s/Gc1gg and Gc2s/Gc2gg). In this case, according to (3), the theories of the HNs (or GPs) include a common property P. Let w1 and w2 be the definitions of the HNs of Gc1 and Gc2, belonging to the sorts s1 and s2 (w1 ∈ Sorts1 and w2 ∈ Sorts2). The property set P1 is the theory of the word w1 (mng(w1) = λP1[P1(x)]) and P2 is the theory of the word w2 (mng(w2) = λP2[P2(x)]). In general the theories P1 and P2 are not identical, but by (3) they must include the common property P. If we restrict the theories of w1 and w2 to the common property P, then from (1) we get the following expression for two head nouns used in genitive constructions of the same sort:</p>
        <p>λP1[∃z1(P1(z1) ∧ (∀x1(P1(x1) → P(z1)(x1))))] ∧ λP2[∃z2(P2(z2) ∧ (∀x2(P2(x2) → P(z2)(x2))))]. (7)</p>
        <p>By (7), the HNs of same-sort genitive constructions derived from the corpora have a common property, and this common property P can be acquired by comparing the HNs’ meanings. The same holds for the GPs.</p>
        <p>We suggest the following approximation for recognizing whether GCs obtained from the corpora belong to the same sort: with a certain probability, two GCs belong to the same sort if their HNs or their GPs coincide. Since we cannot use meaning postulates directly to acquire the common property P, we work with the forms of the HNs and GPs instead.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Concept-oriented lexicon model</title>
      <sec id="sec-3-1">
        <title>3.1 Formal Concept Analysis</title>
        <p>In this subsection the basic FCA definitions are given according to [7] to clarify the subsequent reasoning. Let G and M be sets, called the object set and the attribute set respectively, and let I ⊆ G × M be a binary relation. For g ∈ G and m ∈ M, gIm is read as “the object g has the attribute m”. The triple (G, M, I) is called a formal context. For A ⊆ G and B ⊆ M the prime operator is defined as</p>
        <p>A′ = {m ∈ M | ∀g ∈ A : gIm},  B′ = {g ∈ G | ∀m ∈ B : gIm}. (8)</p>
        <p>A pair (A, B) is a formal concept of the context (G, M, I) if and only if A ⊆ G, B ⊆ M, A′ = B, and B′ = A. A is called the extent and B the intent of the concept (A, B). Concepts (A1, B1) and (A2, B2) of a given context are ordered by the subconcept-superconcept relation: if A1 ⊆ A2 (equivalently B2 ⊆ B1), we write (A1, B1) ≤ (A2, B2). The ordered set of all formal concepts of (G, M, I) is denoted by B(G, M, I) and is called the concept lattice of (G, M, I). A set of formal concepts is called a chain if any two of its elements are comparable (C1 ≤ C2 or C1 &gt; C2).</p>
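<p>The prime operator (8) and the formal-concept condition can be sketched in a few lines of Python. The toy context below uses English glosses of the container/beverage example appearing later in the paper (objects are GPs, attributes are HNs); the data and function names are illustrative assumptions, not the paper’s implementation.</p>

```python
# Minimal sketch of the prime operator (8) and formal-concept enumeration
# over a toy context (objects = GPs, attributes = HNs). Data is illustrative.
from itertools import combinations

G = {"beer", "water", "juice"}            # objects
M = {"jar", "bottle", "glass", "drop"}    # attributes
I = {("beer", "jar"), ("beer", "bottle"), ("beer", "glass"),
     ("water", "jar"), ("water", "bottle"), ("water", "glass"),
     ("water", "drop"), ("juice", "glass"), ("juice", "drop")}

def prime_objects(A):
    """A' = attributes shared by every object in A."""
    return {m for m in M if all((g, m) in I for g in A)}

def prime_attributes(B):
    """B' = objects possessing every attribute in B."""
    return {g for g in G if all((g, m) in I for m in B)}

def concepts():
    """Enumerate all formal concepts (A, B) with A' = B and B' = A
    by closing every subset of objects."""
    result = set()
    for r in range(len(G) + 1):
        for A in combinations(sorted(G), r):
            B = prime_objects(set(A))
            result.add((frozenset(prime_attributes(B)), frozenset(B)))
    return result

print(sorted(prime_objects({"beer", "water"})))  # ['bottle', 'glass', 'jar']
```

This naive enumeration is exponential in |G| and serves only to illustrate the definitions; Section 6 discusses the incremental algorithms actually used for large contexts.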
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Lexicon structure</title>
        <p>Let Vs be the set of HNs (vs ∈ Vs) and Vgg the set of GPs (vgg ∈ Vgg). The pair (vgg, vs) is a well-formed GC if there exists a modifier function Sft such that the sorts of vs and vgg satisfy its selectional restrictions. The binary relation I ⊆ Vgg × Vs is the set of pairs (vgg, vs) of well-formed GCs. If vggIvs, then a GC with vs as HN and vgg as GP was derived from the corpora, and substituting vs and vgg into (6) gives a well-formed GC.</p>
        <p>The relation I can be presented as the formal context K = (Vgg, Vs, I). Head nouns are the attributes of the objects (GPs), which means that objects sharing attributes have common properties. The formal context can also be defined as K = (Vs, Vgg, I); in that case the common properties of the HNs are derived from their common attributes (GPs). The complete lattice B(Vgg, Vs, I) with its order relation can be built upon the formal context K. An example formal lattice for a set of genitive constructions derived from Moshkov’s library (www.lib.ru) [17] is given in Figure 2, built from the formal context in Figure 1.</p>
        <p>A formal concept (A, A′) has an extent and an intent. For example, the formal concept marked ‘ФП1’ in Figure 2 has extent A = {Пиво, Вода} (beer, water) and intent A′ = {Банка, Бутылка, Стакан} (jar, bottle, glass). All objects in the extent of a formal concept share the set of common properties given by the intent A′. This set of attributes can be viewed as the gloss of the objects, and it presents the lexical knowledge acquired from the text.</p>
        <p>The well-formed genitive constructions can be read off the concept lexicon (lattice), and the order relation (≤) in the lexicon defines the word hierarchy. Thus the formal lattice is a lexicon that can be used for question answering.</p>
        <p>The longer the longest chain in the lattice, the better the lexicon satisfies NLP needs. We therefore enlarge the formal context to Kg = (Vgg, Vs ∪ Vg, I), where Vg is the set of verbs that take the corresponding GC as an argument in the corpora. For example, in drink glass of water, vg = drink and vg ∈ Vg.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Concept’s classes acquisition</title>
      <p>In Section 3 we presented a lexicon based on the lattice. The lexicon includes sorts (lexeme classes) identical to formal concepts. The number of formal concepts in the lattice is usually larger than the number of input events, so a method for grouping formal concepts into classes is needed. A class of formal concepts (a sort) is a higher level of abstraction than a single formal concept of the initial concept lattice L. In this section we present a lattice segmentation algorithm that extracts classes from the initial lexicon (concept lattice).</p>
      <p>Any subset of formal concepts has a Least Common Superconcept (LCS). Let an area of the concept lattice be a set of formal concepts related to one LCS. The segmentation algorithm applied to the initial formal lattice L produces a set of formal lattices {L′} such that:
• each lattice Li ∈ {L′} partly corresponds to one area of the initial lattice L;
• each formal concept of the initial lattice (except the top and bottom concepts) belongs to exactly one lattice from {L′}.</p>
      <p>The areas of the initial lattice can overlap; this is why we require only partial correspondence between the lattices from {L′} and the areas of the initial lattice L. An ambiguous formal concept is a formal concept C of the initial lattice L that belongs to different areas of L whose LCSs are incomparable.</p>
      <p>We require the resulting classes to include the maximum number of formal concepts. Hence the algorithm uses the immediate subconcepts of the top concept of the lattice L as the LCSs of the areas that form the lattices Li ∈ {L′}.</p>
      <p>The proposed algorithm uses the following criterion to derive the lattices Li ∈ {L′} from the lattice L: each formal concept C ∈ Li has to be more similar to the formal concepts of the area corresponding to the lattice Li than to the formal concepts of the other areas. We calculate the similarity measure between formal concepts as</p>
      <p>spc(Ci, Cj) = −log(1 − Dc / pathC) × |B| / (|Bi \ B| + |Bj \ B| + |B|), (9)</p>
      <p>where the formal concept C = (A, B) is the LCS of the formal concepts Ci and Cj; Dc is the number of formal concepts in the chain in which the top concept and the formal concept C are the maximal and minimal elements; pathC is the minimal number of formal concepts in a chain that includes the top and bottom concepts as well as the formal concept C; and the logarithm is to base 2 (as the worked example in Figure 3 indicates). The similarity measure takes into account the volume of common information of the two concepts (|B|), the volume of information specific to each individual concept Ci and Cj (|Bi| and |Bj|), the specificity of the common information (Dc), and the varying depth of the lattice hierarchies (pathC). The similarity measure is calculated only for an ambiguous concept and its immediate superconcepts.</p>
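<p>The similarity measure (9) can be sketched directly. In this hedged sketch, intents are plain Python sets and the depth Dc and path length pathC are assumed to be precomputed from the lattice and passed in; the sample intents reproduce only the set sizes of the Figure 3 computation and are otherwise arbitrary.</p>

```python
# Sketch of the similarity measure (9) between two formal concepts.
import math

def spc(b_i, b_j, b_lcs, d_c, path_c):
    """spc(Ci, Cj) = -log2(1 - Dc/pathC) * |B| / (|Bi\\B| + |Bj\\B| + |B|),
    where B is the intent of the LCS of Ci and Cj. The base-2 logarithm
    matches the paper's worked example."""
    specificity = -math.log2(1 - d_c / path_c)
    overlap = len(b_lcs) / (len(b_i - b_lcs) + len(b_j - b_lcs) + len(b_lcs))
    return specificity * overlap

# Reproducing the Figure 3 numbers (intents chosen to give the same set sizes):
# spc(C3, C2): Dc = 2, pathC = 5, |B| = 3, |Bi\B| = 1, |Bj\B| = 0
print(round(spc({"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "c"}, 2, 5), 4))  # 0.5527
# spc(C3, C5): Dc = 2, pathC = 4, |B| = 1, |Bi\B| = 3, |Bj\B| = 0
print(round(spc({"a", "b", "c", "d"}, {"a"}, {"a"}, 2, 4), 4))  # 0.25
```

Note that the two calls recover exactly the values 0.5527 and 0.2500 quoted in the Figure 3 example, which is what confirms the base-2 logarithm.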
      <p>The segmentation algorithm places each immediate subconcept Ci of the top concept of the lattice L into its own resulting lattice Li. Then each subconcept Cj of the formal concept Ci in L is included in the lattice Li if every immediate superconcept of Cj is also a subconcept of Ci (or coincides with Ci) and Cj has no other superconcepts incomparable with Ci. Otherwise the formal concept Cj is an ambiguous formal concept: it is placed into the same class as the immediate superconcept with which it has the maximal similarity measure value according to (9). If several superconcepts share the maximal similarity value, Cj is included in the class with the largest number of items.</p>
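<p>The ambiguous-concept assignment step described above can be sketched as follows. The lattice interface here is an assumption for illustration: `parents` maps a concept to its immediate superconcepts, `similarity` stands in for spc from (9), and `classes`/`class_sizes` track already-assigned concepts; none of these names come from the paper.</p>

```python
# Sketch of the ambiguous-concept assignment step: choose the class of the
# immediate superconcept with maximal similarity, breaking ties by class size.
def assign_ambiguous(concept, parents, similarity, classes, class_sizes):
    """Assign `concept` to the class of its most similar immediate
    superconcept; ties go to the largest class."""
    best = max(
        parents[concept],
        key=lambda p: (similarity(concept, p), class_sizes[classes[p]]),
    )
    label = classes[best]
    classes[concept] = label
    class_sizes[label] += 1
    return label

# Toy run mirroring the Figure 3 situation: C3 has superconcepts C2 and C5.
parents = {"C3": ["C2", "C5"]}
classes = {"C2": "L1", "C5": "L2"}
class_sizes = {"L1": 3, "L2": 2}
sim = {"C2": 0.5527, "C5": 0.25}
label = assign_ambiguous("C3", parents, lambda c, p: sim[p], classes, class_sizes)
print(label)  # L1, since spc(C3, C2) exceeds spc(C3, C5)
```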
      <p>Figure 3 gives an example of concept-class acquisition. The figure on the left is the initial lattice L; on the right there are three classes L1, L2, L3. Concepts of the class L1 are more similar to each other than to the concepts of the classes L2 and L3, and the same holds for the concepts of L2 and L3. Why does, for example, the concept C3 belong to the class L1 rather than L2? C3 has two superconcepts, C2 and C5. C3 is more similar to C2 than to C5, because spc(C3, C2) = −log(1 − 2/5) × 3/(1 + 0 + 3) = 0.5527 and spc(C3, C5) = −log(1 − 2/4) × 1/(3 + 0 + 1) = 0.2500, so spc(C3, C2) &gt; spc(C3, C5).</p>
    </sec>
    <sec id="sec-5">
      <title>5. QA tasks and concept-oriented lexicon</title>
      <p>The standard QA process is described in Figure 4. Question-type recognition is the first step of QA. We use three basic question types (simple Yes/No questions, definition questions, and WH-questions) together with their subtypes. The lexicon is used to distinguish between different types of definition questions (function, type, etc.) and WH-questions (for example, “Which country … ?” asks for a country name). The question is parsed and the key question words are recognized using syntactic patterns. To detect the question subtype, we first find the concept that has the main keyword in its extent and then look at its superconcepts: the objects in their extents define the question subtype. The keywords form the query, and the lexicon also supports query expansion by adding objects from sibling concepts.</p>
      <p>The answer can be found in the local answer database or by using an information retrieval engine. The information retrieval engine returns text paragraphs with potential answers; each paragraph is parsed and the key answer words are recognized using syntactic patterns.</p>
      <p>Next, the concept lexicon is used to evaluate the found paragraphs. We suggest a semantic score metric to compare the query with the answers from the paragraphs:</p>
      <p>Sem_score = 1, if Ca = Cq; SpcLv(Ca, Cq), if Ca &lt; Cq or Ca &gt; Cq; min(SpcLv(Ca, C), SpcLv(Cq, C)), if ∃C ∈ Lv such that Cq &lt; C ∧ Ca &lt; C ∧ ¬∃Ck (Ck &lt; C ∧ Cq &lt; Ck ∧ Ca &lt; Ck), (10)</p>
      <p>where q is a key question word, a a key answer word from a paragraph, Cq = (q′′, q′), Ca = (a′′, a′), Lv the lexicon lattice, SpcLv(Cx, Cy) the similarity measure between the concepts Cx and Cy normalized by the maximum similarity measure of the lattice Lv, and C the LCS of the concepts Cq and Ca.</p>
      <p>The final answer is formed from the paragraph with the highest semantic score.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Evaluation</title>
      <sec id="sec-6-1">
        <title>6.1 Implementation</title>
        <p>To evaluate the proposed approach we use the text collection from [17] of 85 million words. Objects (GPs) and attributes (HNs) are derived from the parsed corpora. The context parameters are given in Table 1.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Table 1. Context parameters</title>
        <p>Number of objects (|G|); number of attributes (|M|); context size (|I| = |G| × |M|); average number of attributes per object; maximum number of attributes per object.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Lattice generation</title>
        <p>Before giving the results we describe the lattice generation method. We have to choose the best-performing algorithm for lexicon lattice generation, because the numbers of objects and attributes are large (see Table 1). In addition, it should be an incremental algorithm, because new GCs appear constantly. The choice of algorithm depends on the formal context type; Table 2 gives the distribution of the number of attributes per object.</p>
        <p>In Table 2, for example, 18 objects (0.30%) have more than 1,000 attributes.</p>
        <p>Most of the objects (77%) have fewer than 20 attributes. According to [14] and [6], the Ferré algorithm shows the best performance in this situation, but in practice its performance on our context was very poor. The reason is that the other 23% of objects have a huge number of attributes, and according to [14] the Norris algorithm [9] performs best on contexts with many attributes per object. So we suggest combining the Ferré and Norris algorithms: use Ferré for objects with a small number of attributes and Norris for objects with a huge number of attributes. To choose the switching threshold, we analyze the time for adding a new object for the Ferré (11) and Norris (12) algorithms, where n is the number of objects, m the number of attributes, k the maximum number of attributes per object, and l the number of concepts in the lattice. If the condition in (13) holds, the Ferré algorithm should be used; otherwise, the Norris algorithm.</p>
        <p>The experimental results for a text of 4 million words are given in Table 3. Objects were sorted by number of attributes in descending order and added to the formal context one by one. The switching threshold (k) was varied from 120 down to 30. Each row of the table corresponds to a particular switching threshold (k), and the parameters n, m, l correspond to the lattice at the moment of switching between Norris and Ferré. The generation time is given for the complete lattice-building process. The threshold of 50 gives the best generation time, 81 seconds, which confirms the condition in (13).</p>
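<p>The hybrid strategy described above amounts to a per-object dispatch between the two incremental algorithms. In this sketch, `ferre_add` and `norris_add` are assumed stand-ins for the Ferré and Norris update routines (not implemented here), and the threshold of 50 is the value the experiments in Table 3 found best.</p>

```python
# Sketch of the hybrid Ferre/Norris lattice-update strategy: dispatch each
# incoming object by the size of its attribute set. The update routines
# themselves are assumed stand-ins.
SWITCH_THRESHOLD = 50  # attributes per object; best value found in Table 3

def add_object(lattice, obj, attributes, ferre_add, norris_add):
    """Incrementally add (obj, attributes) to the lattice, choosing the
    update algorithm by the number of attributes of the object."""
    if len(attributes) >= SWITCH_THRESHOLD:
        return norris_add(lattice, obj, attributes)  # dense objects: Norris
    return ferre_add(lattice, obj, attributes)       # sparse objects: Ferre

# Toy run with stub routines that just record which algorithm was chosen.
calls = []
ferre = lambda lat, o, attrs: calls.append("ferre")
norris = lambda lat, o, attrs: calls.append("norris")
add_object({}, "g1", set(range(10)), ferre, norris)
add_object({}, "g2", set(range(100)), ferre, norris)
print(calls)  # ['ferre', 'norris']
```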
      </sec>
      <sec id="sec-6-4">
        <title>6.2 Comparison</title>
        <p>To evaluate our approach we compared the concept-oriented lexicon with Abramov’s Dictionary of Synonyms (ADS). We used this dictionary as a “gold standard” because it is freely available and has the largest coverage for the Russian language (19,108 articles). Recall and Precision are calculated according to (14) and (15), where A is the set of lexemes recognized as synonyms in both ADS and the concept-oriented lexicon, B the set of lexemes recognized as synonyms in ADS but not in the concept-oriented lexicon, and C the set of lexemes recognized as synonyms in the concept-oriented lexicon but not in ADS.</p>
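<p>The formulas (14) and (15) themselves are not reproduced in this copy of the text; the standard definitions consistent with the set description above would be Recall = |A| / (|A| + |B|) and Precision = |A| / (|A| + |C|). A minimal sketch under that assumption, with hypothetical synonym sets:</p>

```python
# Recall/Precision over synonym sets, assuming the standard definitions
# consistent with the sets A, B, C described above. Data is illustrative.
def recall_precision(gold, lexicon):
    """Compare synonym sets from the gold standard (ADS) and the lexicon."""
    a = gold.intersection(lexicon)  # recognized as synonyms in both
    b = gold.difference(lexicon)    # in ADS only (missed by the lexicon)
    c = lexicon.difference(gold)    # in the lexicon only (not in ADS)
    recall = len(a) / (len(a) + len(b))
    precision = len(a) / (len(a) + len(c))
    return recall, precision

r, p = recall_precision({"big", "large", "huge"}, {"big", "large", "vast"})
print(round(r, 2), round(p, 2))  # 0.67 0.67
```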
        <p>In the experiment the concept-oriented lexicon was built upon a corpus of about 17 million words, and the evaluation was conducted for the 50 most frequent lexemes of the Russian language. The results were Recall = 24.36% and Precision = 9.78%. The low Precision is explained by the fact that the concept-oriented lexicon has much larger coverage than ADS; in addition, ADS includes very specific synonyms, which decreases Recall. A better “gold standard” is therefore needed.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions and further work</title>
      <p>In this paper we have proposed a novel approach to lexical resources that boosts QA-system performance. Our approach is based on lattice theory and genitive constructions: it derives new lexical information automatically from corpora and adds lexemes to the lexicon. The lexicon has a hierarchical structure and is represented by a lattice whose basic item is the formal concept; the formal concept joins the objects of its extent with an interpretation (gloss) given by its intent. This paper presents a similarity measure and a segmentation algorithm that derive classes of several formal concepts from the initial lexicon; these classes correspond to lexical sorts of different degrees of granularity. The suggested lexicon has been applied to several question answering problems, and we have shown how to use the concept-oriented lexicon for question-type detection and answer finding. In further research we would like to apply our approach to languages other than Russian and to integrate the concept-oriented lexicon with an industrial QA system.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. И. В. Азарова, О. А. Митрофанова, А. А. Синопальникова.
          <article-title>Компьютерный тезаурус русского языка типа WORDNET</article-title>
          .
          <source>Труды международной конференции Диалог'</source>
          <year>2003</year>
          , Протвино,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Борщев</surname>
            <given-names>В.Б.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Кнорина</surname>
            <given-names>Л</given-names>
          </string-name>
          .В.
          <article-title>Типы реалий и их языковое восприятие. В сб. "Вопросы кибернетики. Язык логики и логика языка"</article-title>
          . М.:
          <year>1990</year>
          . C.
          <volume>106</volume>
          -
          <fpage>134</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cimiano</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hotho</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab S</surname>
          </string-name>
          .
          <article-title>Learning Concept Hierarchies from Text Corpora using Formal Concept Anaylsis</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          . Volume
          <volume>24</volume>
          , pages
          <fpage>305</fpage>
          -
          <lpage>339</lpage>
          August
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fellbaum</surname>
            <given-names>C.</given-names>
          </string-name>
          <article-title>WordNet: An Electronic Lexical Database</article-title>
          . Cambridge,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ferré</surname>
            <given-names>S.</given-names>
          </string-name>
          <article-title>The Use of Associative Concepts for Fast Incremental Concept Formation in Sparse Contexts</article-title>
          . In B. Ganter and A. de Moor editors,
          <source>Using Conceptual Structures, Contributions to ICCS</source>
          <year>2003</year>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ganter</surname>
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wille</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <source>Formal Concept Analysis - Mathematical Foundations</source>
          . Berlin: SpringerVerlag,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          8.
          <string-name>
            <surname>Mann</surname>
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <article-title>Fine-Grained Proper Noun Ontologies for Question Answering</article-title>
          ,
          <source>SemaNet'02: Building and Using Semantic Networks</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          9.
          <string-name>
            <surname>Norris</surname>
            <given-names>E. M.,</given-names>
          </string-name>
          <article-title>An algorithm for computing the maximal rectangles in a binary relation</article-title>
          // Revue Roumaine de Mathématiques Pures et Appliquées,
          <volume>23</volume>
          (
          <issue>2</issue>
          ),
          <year>1978</year>
          . - pp.
          <fpage>243</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gerasimova</surname>
            <given-names>I. A..</given-names>
          </string-name>
          <article-title>Formal grammar and intensional logic</article-title>
          . Moscow,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          11.
          <string-name>
            <surname>Partee</surname>
            <given-names>B.H.</given-names>
          </string-name>
          ,
          <source>Formal Semantics</source>
          , Lectures. RGGU, February 14,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          12.
          <string-name>
            <surname>Partee</surname>
            <given-names>B.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borschev</surname>
            <given-names>V.B.</given-names>
          </string-name>
          ,
          <article-title>Genitives, types, and sorts</article-title>
          . In:
          <source>Possessives and Beyond: Semantics and Syntax</source>
          , eds. Ji-yung Kim, Yury A. Lander and Barbara H. Partee. Amherst, MA: GLSA Publications,
          <year>2004</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          13.
          <string-name>
            <surname>Priss</surname>
            <given-names>Uta</given-names>
          </string-name>
          .
          <article-title>Linguistic Applications of Formal Concept Analysis</article-title>
          . In: Ganter; Stumme; Wille (eds.),
          <source>Formal Concept Analysis, Foundations and Applications</source>
          . Springer Verlag.
          <source>LNAI 3626</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kuznetsov</surname>
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            <given-names>S.A.</given-names>
          </string-name>
          <article-title>Comparing performance of algorithms for generating concept lattices</article-title>
          .
          <source>Journal of Experimental &amp; Theoretical Artificial Intelligence</source>
          , Volume
          <volume>14</volume>
          , Issue
          <issue>2-3</issue>
          ,
          <year>2002</year>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          15.
          <string-name>
            <surname>Yarowsky</surname>
            <given-names>D.</given-names>
          </string-name>
          <article-title>Unsupervised word sense disambiguation rivaling supervised methods</article-title>
          .
          <source>Proceedings of the 33rd annual meeting on Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Morristown, NJ, USA,
          <year>1995</year>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          16.
          <string-name>
            <surname>Yarowsky</surname>
            <given-names>D.</given-names>
          </string-name>
          <article-title>One sense per collocation</article-title>
          .
          In
          <source>Proceedings of the ARPA Human Language Technology Workshop</source>
          , Association for Computational Linguistics, Morristown, NJ, USA,
          <year>1993</year>
          , pp.
          <fpage>266</fpage>
          -
          <lpage>271</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>17. www.lib.ru</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>