CEUR Workshop Proceedings, Vol-31, paper 3: https://ceur-ws.org/Vol-31/GBisson_7.pdf
Designing clustering methods for ontology building:
The Mo'K workbench

Gilles Bisson (1), Claire Nédellec (2) and Dolores Cañamero (2)

(1) HELIX Project, INRIA Rhône-Alpes, ZIRST, 655 Avenue de l'Europe, F–38330 Montbonnot, email: Gilles.Bisson@imag.fr
(2) Inference and Learning Group, LRI, Bât. 490, CNRS UMR 8623 & Université de Paris-Sud, F–91405 Orsay Cedex, email: {cn, lola}@lri.fr

Abstract. This paper describes Mo'K, a configurable workbench that supports the development of conceptual clustering methods for ontology building. Mo'K is intended to assist ontology developers in the exploratory process of defining the most suitable learning methods for a given task. To this end, it provides facilities for the evaluation, comparison, characterization and elaboration of conceptual clustering methods. Moreover, the model underlying Mo'K permits a fine-grained definition of similarity measures and class construction operators, easing the tasks of method instantiation and configuration. The paper presents some experimental results that illustrate the suitability of the model for characterizing and assessing the performance of different methods that learn semantic classes from parsed corpora.

1. INTRODUCTION

In this paper we propose a workbench that supports the development of conceptual clustering methods for the (semi-)automatic construction of ontologies of the conceptual-hierarchy type from parsed corpora. The elaboration of any clustering method involves the definition of two main elements: a distance metric and a classification algorithm. In the context of conceptual hierarchy formation, the Natural Language Processing (NLP) community has investigated the notion of distance to elaborate the semantic classes underlying hierarchies. Classification algorithms have been broadly studied within the Machine Learning and Data Analysis communities.

Different tools have been developed for the automatic or semi-automatic acquisition of semantic classes from "near" terms. The notion of semantic proximity is based upon a distance among terms, defined as a function of the degree of similarity of their contexts. Descriptions of term contexts (the learning examples) and of the regularities to be sought vary across approaches. Contexts can be purely graphic, i.e. words co-occurring within a window, as in [1], [4]; in some cases, the window can cover the whole document (see e.g. [21]). Contexts can also be syntactic, as in the approaches that we have taken into account to develop our model, e.g. [13], [14], [11], [20], [5], [26], [7]. However, the selection of a suitable distance for a given corpus and task is still an open problem that has not received much attention so far [25]. In most cases, the criteria proposed to support this choice rely on the evaluation of the application task for which learning takes place, as described for instance in [12]. Evaluation criteria proposed to assess learning results are purely quantitative, and comparative analyses of these criteria are rare [5]. A proper characterization of the effects that different methods have on the learning results would provide methodological guidelines to help the designer select the most suitable method for a given corpus and task, or support the creation of a new one.

This observation also applies to classification algorithms. No methodology or tool has been proposed to support the elaboration of conceptual clustering algorithms that build task-specific ontologies. Work on conceptual clustering (e.g., [19], [8], [9], [2], [1], [26]) has not been extensively applied to the problem of learning from corpora. One must however acknowledge that the application of conceptual clustering techniques to this domain is not straightforward, as existing algorithms must first be adapted. As in the case of distances, the elaboration and selection of a suitable algorithm for a given corpus and task requires the development of new methodological guidelines and tools.

As a first step toward this goal we propose Mo'K, a configurable workbench to support the comparison, evaluation, and elaboration of methods that learn conceptual hierarchies. The conceptual clustering model underlying Mo'K permits a fine-grained definition of the components of distances and of class construction operators, easing the tasks of method instantiation and configuration. The model is extended with a set of variables that characterize features specific to the elaboration of learning corpora, such as pruning, stop-lists, etc. The workbench also includes evaluation criteria to assess the learning results obtained for different parameter configurations. We finally present some experimental results that illustrate the suitability of the model for characterizing different methods and assessing their performance. These results concern only class formation, not classification algorithms.

2. FRAMEWORK

2.1 Learning semantic classes

In the context of learning semantic classes, learning from syntactic contexts exploits syntactic relations among words to derive semantic relations, following Harris' hypothesis [15]. According to this hypothesis, the study of syntactic regularities within a specialized corpus makes it possible to identify syntactic schemata, made out of combinations of word classes, that reflect specific domain knowledge. Using specialized corpora eases the learning task, given that we have to deal with a limited vocabulary with reduced polysemy and limited syntactic variability.
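The distributional principle just described can be made concrete in a few lines of code. The sketch below is illustrative only and is not part of Mo'K; the mini-corpus, relation names and the overlap score are invented. It represents each word by its profile of syntactic contexts and scores word pairs by context overlap, a crude stand-in for the similarity measures discussed later.

```python
from collections import defaultdict

# Hypothetical mini-corpus of parsed relations: (word, syntactic_relation, context_word),
# mapped to the number of occurrences of each triplet.
triplets = {
    ("decrease", "Dobj-of", "cause"): 29,
    ("increase", "Dobj-of", "cause"): 12,
    ("increase", "from-of", "result"): 2,
    ("decrease", "from-of", "result"): 3,
    ("oven", "Dobj-of", "preheat"): 7,
}

def profiles(triplets):
    """Map each word to its distribution profile over syntactic contexts."""
    prof = defaultdict(dict)
    for (word, rel, ctx), n in triplets.items():
        prof[word][(rel, ctx)] = n
    return prof

def overlap(p, q):
    """Crude similarity: shared contexts over all contexts (Jaccard on context sets)."""
    return len(set(p) & set(q)) / len(set(p) | set(q))

prof = profiles(triplets)
print(overlap(prof["decrease"], prof["increase"]))  # both contexts shared -> 1.0
print(overlap(prof["decrease"], prof["oven"]))      # no shared context   -> 0.0
```

Real measures, as Section 3.3 explains, also weight each context by its significance instead of treating all shared contexts equally.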
In syntactic approaches, learning results can be of different types, depending on the method employed. They can be distances that reflect the degree of similarity among terms [13], [26], [22], distance-based term classes elaborated with the help of nearest-neighbor methods [11], [14], degrees of membership in term classes [24], class hierarchies formed by conceptual clustering [20], or predicative schemata that use concepts to constrain selection [1], [10], [7]. The notion of distance is fundamental in all cases, as it allows the calculation of the degree of proximity between two objects (terms in this case) as a function of the degree of similarity between the syntactic contexts in which they appear. Classes built by aggregation of near terms can afterwards be used for different applications, such as syntactic disambiguation [23], [24] or document retrieval [11]. Distances are however calculated using the same similarity notion in all cases, and our model relies on these studies regardless of the application task.

2.2 Conceptual Clustering

In our case, ontologies are organized as multiple hierarchies that form an acyclic graph where nodes are term categories described by intension, and links represent inclusion, seen in this case as a generality relation. Learning through hierarchical classification of a set of objects can be performed in two main ways: top-down, by incremental specialization of classes, and bottom-up, by incremental generalization. We have adopted a bottom-up approach because of its smaller algorithmic complexity and its understandability to the user in view of an interactive validation task. In this article we focus on the elements needed to build and evaluate the basic classes of this graph, i.e. criteria for building the initial corpus, distances, and evaluation criteria to assess results. We do not address the generic class construction algorithm. With respect to the latter, let us just mention that the application of hierarchical (conceptual [19] or numerical [6]) clustering algorithms to our problem is not straightforward, given that we must build acyclic graphs with few abstraction levels, rather than deep and strict hierarchies.

3. THE MO'K MODEL AND WORKBENCH

3.1 Representation of examples and results

Following standard practice, we use binary grammatical relations as syntactic contexts. Examples are therefore represented by triplets <object, syntactic relation, attribute>, where <object> is the object that must be classified and <syntactic relation, attribute> represents the attribute. The number of occurrences of a triplet in a corpus characterizes the attribute for an example. For instance, if we are interested in verbal attachments, the following two sentences:
  • This causes a decrease in […].
  • This high rate results from an increase in […].
allow the generation of two triplets, <cause, Dobj, decrease> (29) and <result, from, increase> (2), both presenting the structure <verb, syntactic relation, head noun> (total number of occurrences of these triplets in the corpus). In the remainder of the paper, we will designate by Action the tuple <verb, syntactic relation>. The actions <cause, Dobj> and <result, from> can be regarded as objects, and the nouns decrease and increase can be considered as attributes with values 29 and 2, respectively.

In bottom-up clustering, couples of near objects, or of objects and classes, are incrementally grouped in order to form hierarchies or graphs of object classes. The standard in NLP is to use object classes (Figure 1.a) for the application task. In our experiments, we depart from this practice to compare learned classes, as we are interested in an extensional representation; we therefore use classes formed by the union of the attributes of near objects (Figure 1.b).

Figure 1.a. Object classes (= Nouns).
Figure 1.b. Attribute classes (= Nouns).

Let us take an example. Suppose that two objects are selected to form a class, so that their attribute sets are merged. Let the first object be described by the nouns {decrease, increase, modification, loss, etc.}, and the second by {decrease, increase, composition, evolution, etc.}. The noun class learned will include the nouns shared by both objects (decrease, increase), as well as the complementary terms (modification, loss, composition, evolution); therefore, four new triplets are induced. We will then use the "attribute class" strategy, as in this way the "leaf" classes that we will later evaluate will be larger than those formed using the "object class" strategy. We will not further develop here the differences between these two viewpoints, intension and extension, since this topic is out of the scope of the paper. Let us however insist on the fact that the selection of one or the other has major effects on the learning results.

3.2 Corpus parameters

The parameters used to form a learning corpus in Mo'K include, among others, the selection of learning examples, the level of pruning, and the "cleaning" of the corpus. Let us examine the first two.

3.2.1 Selection of learning examples

One of the goals of our model is to allow the user to compare learning results as a function of the grammatical relations selected as input. Objects and syntactic contexts used in previous work vary; most approaches cluster nouns, which are considered as similar on the grounds of their shared verbal or nominal contexts, where nouns can be verb complement heads (arguments [14] or adjuncts [7]), noun complements [11], or all of them [5], [13]. None of these approaches proposes a comparative study of results based on the grammatical relations chosen in the initial corpus. Our model easily allows these relations to be specified. Experiments concerning verbal relations reported below (Section 4.2) illustrate this and show significant differences among results depending on the nature of objects and attributes (whether they are nouns or verbs), and on the type of corpus.

A second parameter, taken into account by most existing methods and included in our model, concerns corpus pruning as a function of the number of occurrences of an element. Pruning removes occurrences that are too infrequent and would therefore cause noise, as well as those which are too frequent and do not provide any information regarding the link between an object and an attribute. The other side of the coin is
that infrequent but important cases can be removed. Our model also allows the user to specify the minimum number of examples characterizing an attribute and the minimal number of attributes for an example and, for each of these constraints, the minimal total number of occurrences of the triplets being considered. The experiments reported in Section 4.3 show that the level of pruning has a major impact on the results of learning, and that the optimal level depends on the corpus.

3.3 Distance Modeling

Our goal is not to cover all the possible methods that can be used to measure similarity between examples. On the contrary, our approach focuses on methods with very precise features:
• They take syntactic analysis as input;
• They do not take into account external resources (e.g. ontologies such as WordNet [18]);
• They are based on a comparison of the distribution profiles of the attributes describing the couples of objects to classify.
Different methods have been proposed in the NLP literature within this framework, among others [14], [11], [13], [5] and [7]. We have developed a generic model of these methods and implemented it in Mo'K with the aim of elaborating a comparison and evaluation methodology for them. In order to come up with a generic implementation, we have identified the steps shared by all these methods, as we will see below. Mo'K is thus a workbench that implements a set of instantiable generic methods using an object-oriented representation, as opposed to the idea of a library of methods. This approach is made possible by the fact that similarity measures can in general be regarded as a comparison of the "distribution profiles" of couples of examples. In this way, two objects will be considered as neighbors if the relative occurrence frequencies of each of their attributes (i.e. of the syntactic contexts) are close. Learning examples taken into account in our model can be represented by means of a contingency table. Depending on the representation hypothesis adopted, rows (examples) and columns (attributes) of the table represent different things. For example, in the experiments reported in Section 4.2 they first represent actions and nouns, respectively, and nouns and actions later on. In any case, a table cell contains the number of occurrences of an attribute for a given example. This table is obviously very sparse, as examples are generally described by a small number of attributes (see Figure 2).

In practice, the computation of similarity can be decomposed into two major steps: weighting and similarity computation.
• The weighting phase replaces every raw co-occurrence value appearing in the contingency table by a coefficient, often normalized, which can be regarded as a weight or measure of the significance of the fact that the example and the attribute co-occur in the corpus. Its computation can entail two steps: the initialization of the weights of examples and attributes, usually according to their number of occurrences, and the calculation of a normalized weight of the relevance of each attribute for each example. Technically, this weighting phase is implemented in Mo'K using the 5 functions described (in pseudo-code) in Table 1.
• The similarity computation phase builds a similarity matrix between couples of examples. Similarity increases as a function of the number of shared attributes, but the way in which similarity between these distributions is calculated varies in the different approaches. In Mo'K this phase is implemented by a single function.
We thus see that, by means of 6 functions and a few lines of code, it is possible to implement most similarity measures that follow a schema based on the comparison of pairs of distribution profiles. Let us note that we do not make more specific hypotheses concerning the formal properties of the measures: they can be similarities or dissimilarities, symmetrical or asymmetrical, and the computed information can be of any type. This approach thus favors the comparison of existing methods, but also the elaboration of variants of these methods and even the creation of new ones. Once integrated in Mo'K, a method can access all the test and conceptual clustering resources of the system.

  Name of the step                                          Method
  Initialization of the weight of each example E: W(E)      Init_Weight_Example
  Initialization of the weight of each attribute A: W(A)    Init_Weight_Attribute
  For each example E
      For each attribute A of the example
          Calculate W(A) in the context of E                Eval_Weight_Attribute
      Update global W(E)                                    Eval_Weight_Example
      For each attribute A of the example
          Normalization of the W(A) by W(E)                 Init_Similarity

  Table 1. Functions implementing the weighting phase in Mo'K.

3.4 Distance evaluation

Even though our goal is the construction of hierarchies, it is interesting to evaluate the relevance of a distance metric with respect to simpler tasks, and to analyze its behavior as a function of the application domain and of the parameters used to elaborate the learning corpus. Mo'K offers different means of evaluation based on the first N couples of examples built by binary aggregation, i.e. the N couples of examples with the highest scores in the similarity matrix.

3.4.1 Measure of recall

As already mentioned in Section 3, the elaboration of a class gives rise to the induction of new triplets not observed in the initial corpus. Therefore, the evaluation process follows the classical schema of dividing the corpus into two partitions, learning and test. The former is used to build the similarity matrix according to the measure to be evaluated. The latter allows the measurement of the coverage rates of classes, i.e. their ability to recognize the triplets in the test set. We have adopted this evaluation task for two reasons. First, it corresponds to the elementary step in every process of bottom-up hierarchical clustering. Second, from an NLP perspective, it conforms to a disambiguation task.

3.4.2 Measure of precision

Despite its interest, the coverage measure only allows the evaluation of the recall rate associated with the set of N selected classes. However, precision, a measure of the ability to avoid erroneous recognition of negative examples, is an equally important property of the metric. In the end, a similarity measure that tends to over-generalize and describe object couples using a large number of attributes would reach high coverage rates, but produce classes that lack meaning and precision. It is difficult to automatically solve the problem of evaluating unsupervised learning in the absence of negative examples. Given that we do not deal with annotated corpora, and we do not have negative examples, we face this problem in Mo'K by means of automatically generated (artificial) sample corpora. Following [5], we assume that examples generated this way will be negatives for the most part. Artificial examples are formed by randomly choosing an object and an attribute from the initial corpus, taking care that none of these examples appears in the learning set. We measure coverage rates on
artificial examples using learned classes. Since examples are        minimum number of examples characterizing a given attribute
randomly generated, some positive examples are generated as          has been set to 2, and the minimum number of attributes for an
well (about 0.5% in the studied corpora). Although this rate of      example to 3. We will further comment this setting in Section
artificial positive examples might seem very low, it                 4.3. For each corpus, the first 25 learned classes are evaluated.
unexpectedly constitutes an important part of the artificial         Coverage is measured on the test set, which comprises 20% of
examples covered by learned classes—they cover between 0.5%          the whole corpus, and on the artificial set, which contains
and 2.5% artificial examples in our experiments. Hence, real         50,000 triplets randomly generated (see Section 3.3). Each test
precision can only be evaluated after negative examples in the       has been repeated four times.
artificially generated set have been computed by hand.                  The first experiment has been conducted on Agrovoc. The
   As we will see in the experiments reported in next section, it    classes learned in the action-based representation are twice as
is interesting to measure other criteria in order to assess the      large as those learned in the noun-based representation.
relevance of a similarity measure—for example, the induction         However, induction rates (number of induced triplets divided
rate measuring the ratio between the number of induced triplets      by the number of triplets learned by rote) are very similar (40%
and the total number of triplets learned.                            compared to 38%) and so are precision rates (45% in both
                                                                     cases). Precision rate represents the rate of negative examples
                                                                     among learned examples which cover artificial examples. The
4. EXPERIMENTS AND RESULTS                                           recall measured by the coverage rate of induced triplets on the
                                                                     test set is slightly better for the noun-based representation
The experiments reported here aim to illustrate Mo'K’s               (5.3% compared to 4.7%) but remains quite low. This can be
parameterization possibilities and the impact that different         explained by the level of generality of what is learned: the best
parameter settings have on the learning results. These               couples of learned nouns involved very general terms such as
experiments make thus a case for the use of generic platforms to     [technique-method], [influence-effect], in contrast with the
perform a systematic exploratory analysis in order to obtain         numerous technical words of the corpus. Most of the actions
sensible results in a given domain (corpus).                         characterizing these nouns concern general verbs such as [to
                                                                     present], [to observe], and [to report]. This is confirmed by
4.1 Training corpora                                                 looking at the best pairs of actions such as  -
                                                                     . This explains why learned classes are of
We have conducted experiments on two different French                rather poor quality in both representations.
corpora—one contains cooking recipes gathered over the world
wide web; the other, Agrovoc, contains scientific abstracts in       Corpus       Learning     % Induced        tripl./ Recall (test Precision
the agricultural domain, and has been assembled by the INIST                      object       learned tripl.           set)
(Institut de l’Information Scientifique et Technique) of the         Agrovoc      Action       40 %                     4.7 %        45 %
CNRS. We have chosen these two corpora as they differ in                          Nom          38 %                     5.3 %        45 %
generality and amount of technical terms, but are still close        Cooking      Action       34 %                     12 %         32 %
enough to allow for meaningful comparison of results. Both                        Nom          38 %                     9.1 %        52 %
corpora have been analyzed using the same shallow parser. Only
                                                                               Table 2. Experimentation about example representation
verbal relations of the form  have been considered in our experiments. The output of
syntactic parsing is highly noisy (between 30% and 50%                  The experiments on the cooking corpus have built classes of
mistakes) due to several factors such as grammatical and             similar size in both representations. Induction rate is slightly
spelling mistakes, typos, and accentuation errors in the case of     higher for the noun-based representation—38% compared to
the cooking corpus. In Agrovoc, noise is mostly produced by          34%. However, recall on the test set is better in the action-based
the high number of technical terms mistaken by verbs, and of         representation, (12% and 9.1%, respectively). Induced triplets
embedded noun complements which are erroneously attached to          are thus more useful in the case of action-based representation,
verbs. In Agrovoc, only 300 verbs are found versus 18,828            even though they are less numerous. Moreover, the precision
nouns. This is due to the fact that only part of the corpus has      measured by the rate of negative artificial examples covered by
been considered—those triplets with a verb that appears in a         learned examples is much better for the action-based
list of verbs giving rise to nominalizations in the corpus. In the   representation (32% compared to 52%). Precision and recall
cooking corpus we find 1,181 verbs for 3,300 nouns, i.e. the         rates are thus better in this representation, although the rate of
ratio between nouns and verbs is, on average, divided by a factor of 20 with respect to Agrovoc. This reflects the higher specialization of this corpus. Finally, Agrovoc is three times as big as the cooking corpus (117,156 triplets for 168,287 occurrences in total).

4.2 Selection of example representation

The first experiment illustrates the importance of the choice of the object to be clustered and of the attribute which characterizes it. We have compared the classes learned by considering actions as objects and head nouns as attributes (denoted as action-based representation) with those learned using nouns as objects and actions as attributes (denoted as noun-based representation). The comparison (see Table 2) has been performed on both corpora, the other parameters remaining unchanged. We have applied Asium's distance. Pruning criteria are light—no minimum number of triplet occurrences, the

induced triplets is smaller than in the noun-based representation. In any case, all rates are much better than the ones computed for the Agrovoc experiments. A closer examination of the best pairs of actions and nouns confirms the idea that overgeneralization is less of a problem here than in Agrovoc. Noun pairs are more precise (e.g., [fridge–freezer], [olive oil–oil]) and described by more technical actions. In the same way, the best pairs of actions, such as [absorb Dobj, evaporate Dobj], are characterized by nouns (in this case [vinegar, water, wine, excess, etc.]) which are significantly more specific than in Agrovoc. The smaller variability of the cooking corpus explains these observations, showing that the larger size of Agrovoc does not improve the meaningfulness of the regularities observed.
   This experiment thus shows that the choice of representation can have a major impact on the learning results. It is therefore advisable to select a suitable representation before addressing a new domain.
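Switching between the two representations is purely mechanical once the corpus has been reduced to triplets relating an action, a syntactic role, and a head noun. A minimal sketch of this duality; the triplet format, function name, and sample data below are our own illustration, not Mo'K's actual interface:

```python
from collections import defaultdict

# Hypothetical parsed triplets: (action, syntactic role, head noun).
triplets = [
    ("pour", "Dobj", "oil"),
    ("pour", "Dobj", "water"),
    ("absorb", "Dobj", "water"),
    ("absorb", "Dobj", "vinegar"),
]

def build_contexts(triplets, objects="actions"):
    """Group triplets into object -> attribute occurrence counts.

    objects="actions": action-based representation (nouns as attributes).
    objects="nouns":   noun-based representation (actions as attributes).
    """
    contexts = defaultdict(lambda: defaultdict(int))
    for action, relation, noun in triplets:
        if objects == "actions":
            contexts[f"{action}_{relation}"][noun] += 1
        else:
            contexts[noun][f"{action}_{relation}"] += 1
    return {obj: dict(attrs) for obj, attrs in contexts.items()}

action_based = build_contexts(triplets, objects="actions")
noun_based = build_contexts(triplets, objects="nouns")
```

With these sample triplets, `action_based["absorb_Dobj"]` maps `water` and `vinegar` each to a count of 1, while `noun_based["water"]` records both `pour_Dobj` and `absorb_Dobj` as syntactic contexts; the same data thus yields two different clustering problems.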
4.3 Pruning parameters

To illustrate the importance of pruning, two pruning settings have been applied to both corpora and compared. In both cases we have used Asium's distance; objects are actions and attributes are nouns. The first setting is the one described in the previous section. In the second one, we have set the minimum number of occurrences for an example (triplet) to 2, in order to remove triplets occurring only once in the corpus, as they may represent noise. We have also augmented the values of the minimal number of examples that must characterize an attribute (from 2 to 3) and of the minimal number of attributes per object (from 3 to 5, i.e. in this version the objects being compared appear in at least 5 different syntactic contexts). As can be inferred from the histogram in Fig. 2, this setting excludes 80% of the corpus, versus 70% in the first setting. We can thus hope for a more reliable classification, at the risk of removing so many examples that coverage (recall) is drastically affected. The experiments show that, for the cooking corpus, the induction rate (32%) and coverage (11.2%) on the whole test set are nearly unaffected. On the contrary, the rate of artificial triplets covered scales by a factor of 3 with respect to the previous rate; we think that this significant increase indicates that this pruning rate increases the rate of erroneously induced examples. In Agrovoc, the induction rate decreases by a factor of three and the coverage rate by a factor of two, whereas the rate of artificial triplets covered scales by a factor greater than 2. Recall is therefore much more strongly affected by pruning than in the cooking corpus. In both cases, this new pruning setting gives rise to a decrease in performance.

Figure 2. Example distribution per number of attributes

   However, this conclusion drawn from the evaluation of basic classes must be tempered in the case of hierarchy formation. In this case, the more constraining version of pruning makes it possible to eliminate many non-significant classes that result from the presence of closely similar actions described by a small number of attributes. It seems clear that, in a process of hierarchical clustering, this type of class would cause problems, as it would alter some groupings. Therefore, the type of pruning to be applied partly depends on the task to be performed.

4.4 Comparison of methods

This last experiment illustrates some aspects of the use of Mo'K to compare results obtained with different distances. Among the methods that we have tested with Mo'K (such as those proposed by Dagan et al., Hindle, Grefenstette, and Grishman et al., among the best known in the literature, as well as other distances proposed by the authors, e.g. Asium and Greedy), we have chosen to compare here the distance used in Asium [7], the one proposed by Dagan [5], and Greedy³. The reason for this choice is that these three methods have shown good performance when compared with others, while presenting rather different behaviors. They are thus a representative sample of the subset of methods that we have modeled, as characterized in Section 3.3. These methods have been applied to the corpus of cooking recipes. Learning parameters are the same as those described in Section 4.2; objects are actions and attributes are nouns. In addition, we have tested the influence on learning of the number of disjoint classes on which recall performance is evaluated. To do this, we have varied this number (on the abscissa of the diagrams below) between 10 and 200.

³ Greedy has a simple behavior: it is based on a measure, inspired by the χ², which favors the selection of pairs of examples described by a large number of attributes (the method is named after this feature).

Figure 4a&b. Recall rate and class efficiency

   These diagrams show, respectively, the recall rate of each method on the test set (Figure 4a) and the class efficiency rate (Figure 4b). The latter is assessed by the ratio between the number of triplets learned (by rote and induced) and the number of triplets effectively used in the recall test. As we can see in the first diagram, the coverage rate of the three methods grows, as expected, with the number of classes considered, Dagan's method yielding the best results. On the contrary, if we pay attention to the efficiency of classes, Asium takes better advantage of the triplets learned. Looking at both diagrams, we can conclude that these methods have different behaviors. Dagan's gives rise to more general classes (more triplets are learned, but the number of useless ones is higher). Asium constructs more specific classes (fewer triplets are learned, but more of them are useful). We can take a closer look at the behavior of these methods and study the quality of induction in terms of the rate of induced triplets which are effectively used in the recall test (Figure 5).

Figure 5. Quality of induction.

   While the previous conclusion is confirmed for Asium's and Dagan's methods, we can also note that Greedy and Dagan's method have the same behavior on this criterion. In fact, it seems that Dagan's method is able to induce more useful triplets than Greedy, whereas the latter tends to learn by rote a more representative subset of the corpus.
   The tests performed on the artificial examples confirm these results. Therefore, it seems that the classes learned by Dagan's method and, to a lesser extent, by Greedy are less robust and present lower precision rates for the learning parameters chosen. We must emphasize that these conclusions only apply to the recipe corpus. For Agrovoc, results are considerably different. These experiments therefore show the importance of going through an exploratory process in order to come up with the most suitable methods and representation.
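The two evaluation criteria used in this comparison can be restated concretely. The sketch below is our reading of them over sets of triplets (the function names and triplet format are assumptions; Mo'K's exact definitions may differ in detail):

```python
def recall_rate(test_triplets, learned_triplets):
    """Fraction of the test set covered by the learned (rote + induced) triplets."""
    test = set(test_triplets)
    return len(test & set(learned_triplets)) / len(test)

def class_efficiency(learned_triplets, test_triplets):
    """Ratio between the number of triplets learned and the number of them
    effectively used in the recall test; lower values mean the learned
    triplets are better exploited."""
    learned = set(learned_triplets)
    used = learned & set(test_triplets)
    return len(learned) / len(used) if used else float("inf")

# Toy data: three learned triplets, of which two reappear in the test set.
learned = [("pour", "Dobj", "oil"), ("pour", "Dobj", "water"),
           ("melt", "Dobj", "butter")]
test = [("pour", "Dobj", "oil"), ("melt", "Dobj", "butter"),
        ("chop", "Dobj", "onion")]
```

Here `recall_rate(test, learned)` is 2/3 while `class_efficiency(learned, test)` is 1.5: a method that learns many triplets of which few are reused scores well on the first criterion and poorly on the second, which is the trade-off observed between Dagan's method and Asium.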
5. CONCLUSIONS AND PROSPECTS

Mo'K is a configurable workbench that supports the development of conceptual clustering methods for specific ontology building. The learning model proposed here takes parsed corpora as input. No additional (terminological or semantic) knowledge is used for labeling the input, guiding learning, or validating the learning results. Preliminary experiments showed that the quality of learning decreases with the generality of the corpus. This makes the use of general ontologies for guiding such learning somewhat unrealistic, as they seem too incomplete and polysemous to allow for efficient learning in specific domains. For example, [16] points out that 40% of the words in canonical form in the titles and abstracts of the Communications of the ACM are not included in the LDOCE (Longman Dictionary of Contemporary English). This problem, posed in the case of learning specific ontologies, obviously does not apply to guiding the learning of general semantic classes, as shown in the abundant literature on the topic (see e.g. [23], [22], [24], [17]). It would however be highly valuable to take advantage of existing ontologies to improve the quality of learning. We consider that this can be achieved in two main ways. First, learning could be improved by the use of specific terminologies, dictionaries and nomenclatures, such as SNOMED International in the medical domain [3]. Second, some methodological guidelines would be needed to integrate specialized learned ontologies into more general ontologies such as WordNet [18].
   Although we have focused on a disambiguation-based task, other validation tasks could be integrated in Mo'K, such as query extension and information extraction. Learning specialized ontologies of high quality for these tasks will allow the development of applications in technical and rapidly evolving domains, in which manual acquisition is too costly. In this sense, we have started exploring information extraction from molecular biology abstracts.

ACKNOWLEDGEMENTS

We are grateful to INIST-CNRS for providing the Agrovoc corpus. This research is partly funded by the French Ministry of Industry under RNRT project Astuxe.

REFERENCES

[1] Basili R., Pazienza M. T. and Velardi P. 1996. An empirical symbolic approach to natural language processing. Artificial Intelligence Journal 85, pp. 59-99.
[2] Bisson G. 1992. Conceptual Clustering in a First Order Logic Representation. In Proceedings of the 10th European Conference on Artificial Intelligence (ECAI'92), pp. 458-462, Vienna.
[3] Bouaud J., Habert B., Nazarenko A. and Zweigenbaum P. 1997. Regroupements issus de dépendances syntaxiques en corpus : catégorisation et confrontation à deux modélisations conceptuelles. In Actes des Journées Ingénierie des Connaissances, Zacklad E. (Ed.), pp. 207-223, Roscoff, France, May.
[4] Church K. W. and Hanks P. 1989. Word Association Norms, Mutual Information, and Lexicography. In Proc. of the 27th Annual Meeting of the Association for Computational Linguistics, pp. 76-83.
[5] Dagan I., Pereira F. and Lee L. 1994. Similarity-Based Estimation of Word Co-occurrence Probabilities. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL'94), New Mexico State University, June.
[6] Day W. and Edelsbrunner H. 1984. Efficient Algorithms for Agglomerative Hierarchical Clustering Methods. Journal of Classification, Volume 1, pp. 1-24.
[7] Faure D. and Nédellec C. 1998. A Corpus-based Conceptual Clustering Method for Verb Frames and Ontology Acquisition. In P. Velardi (Ed.), Adapting lexical and corpus resources to sublanguages and applications, Workshop of the 1st Intl. Conf. on Language Resources and Evaluation, pp. 1-8, Granada, Spain, May.
[8] Fisher D. H. 1987. Knowledge Acquisition via Incremental Conceptual Clustering. Machine Learning Journal 2, pp. 139-172.
[9] Gennari J., Langley P. and Fisher D. 1989. Models of Incremental Concept Formation. Artificial Intelligence Journal, Volume 40, pp. 11-61.
[10] Gomez F. 1998. Linking WordNet Verb Classes to Semantic Interpretation. In Proceedings of the COLING-ACL Workshop on the Usage of WordNet in NLP Systems.
[11] Grefenstette G. 1992. Use of Syntactic Context to Produce Term Association Lists for Text Retrieval. In Proceedings of the 15th International SIGIR'92, Denmark.
[12] Grefenstette G. 1993. Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches. In Workshop on Acquisition of Lexical Knowledge from Text, Columbus, OH, June.
[13] Grishman R. and Sterling J. 1994. Generalizing Automatically Generated Selectional Patterns. In Proceedings of the International Conference on Computational Linguistics (COLING'94).
[14] Hindle D. 1990. Noun classification from predicate-argument structure. In Proc. of the 28th Annual Meeting of the Association for Computational Linguistics (ACL'90), pp. 268-275, Pittsburgh.
[15] Harris Z., Gottfried M., Ryckman T., Mattick Jr P., Daladier A., Harris T. and Harris S. 1989. The Form of Information in Science: Analysis of Immunology Sublanguages, vol. 104 of Boston Studies in the Philosophy of Science. Dordrecht, The Netherlands, Kluwer Academic Publishers.
[16] Krovetz R. and Croft W. B. 1991. Lexical Ambiguity and Information Retrieval. In Lexical Acquisition: Exploiting On-line Resources to Build a Lexicon, Zernik (Ed.), pp. 45-65, Hillsdale, New Jersey, Lawrence Erlbaum Associates.
[17] Li H. and Abe N. 1998. Word clustering and disambiguation based on co-occurrence data. In Proceedings of COLING-ACL'98.
[18] Miller G. 1990. WordNet: An On-line Lexical Database. International Journal of Lexicography, 3(4).
[19] Michalski R. S. and Stepp E. 1983. Learning from Observation: Conceptual Clustering. In Machine Learning I: An Artificial Intelligence Approach, Tioga, pp. 331-363.
[20] Pereira F., Tishby N. and Lee L. 1993. Distributional clustering of English words. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL'93), pp. 183-190.
[21] Qiu Y. and Frei H. P. 1993. Concept Based Query Expansion. In Proceedings of the 16th Annual International ACM SIGIR Conference, pp. 160-169, Pittsburgh, ACM Press.
[22] Resnik P. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. Cognitive Modelling.
[23] Resnik P. and Hearst M. A. 1993. Structural Ambiguity and Conceptual Relations. In Proc. Workshop on Very Large Corpora: Academic and Industrial Perspectives, pp. 58-64, Ohio State Univ.
[24] Ribas F. 1995. On Learning More Appropriate Selectional Restrictions. In Proceedings of EACL'95.
[25] Roland D. and Jurafsky D. 1998. How Verb Subcategorization Frequencies Are Affected by Corpus Choice. In Proceedings of the Int'l Conf. on Computational Linguistics (COLING'98).
[26] Sekine S., Carroll J. J., Ananiadou S. and Tsujii J. 1992. Automatic Learning for Semantic Collocation. In Proc. of the 3rd Conference on Applied Natural Language Processing, pp. 104-109.
[27] Sparck Jones K. and Barber E. B. 1971. What makes an automatic keywords classification effective? Journal of the ASIS, 18: 166-175.
[28] Talavera L. and Bejar J. 1998. Efficient construction of comprehensible hierarchical clusterings. In Proceedings of the 2nd European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD'98), pp. 93-101, Nantes, France. J. M. Zytkow and M. Quafafou (Eds.), LNAI vol. 1510, Springer Verlag.
[29] Vasco J. J. F., Faicher C. and Chouraqui E. 1996. A knowledge acquisition tool for multi-perspective concept formation. In Proceedings of the 9th European Knowledge Acquisition Workshop (EKAW'96), pp. 227-244, Springer Verlag.