=Paper=
{{Paper
|id=Vol-31/paper-3
|storemode=property
|title=Designing Clustering Methods for Ontology Building - The Mo'K Workbench
|pdfUrl=https://ceur-ws.org/Vol-31/GBisson_7.pdf
|volume=Vol-31
|dblpUrl=https://dblp.org/rec/conf/ecai/BissonNC00
}}
==Designing Clustering Methods for Ontology Building - The Mo'K Workbench==
Designing clustering methods for ontology building:
The Mo’K workbench
Gilles Bisson¹, Claire Nédellec² and Dolores Cañamero²
Abstract. This paper describes Mo’K, a configurable workbench that supports the development of conceptual clustering methods for ontology building. Mo’K is intended to assist ontology developers in the exploratory process of defining the most suitable learning methods for a given task. To do so, it provides facilities for the evaluation, comparison, characterization and elaboration of conceptual clustering methods. Also, the model underlying Mo’K permits a fine-grained definition of similarity measures and class construction operators, easing the tasks of method instantiation and configuration. This paper presents some experimental results that illustrate the suitability of the model to help characterize and assess the performance of different methods that learn semantic classes from parsed corpora.

1. INTRODUCTION

In this paper we propose a workbench that supports the development of conceptual clustering methods for the (semi-)automatic construction of ontologies of the conceptual-hierarchy type from parsed corpora. The elaboration of any clustering method involves the definition of two main elements: a distance metric and a classification algorithm. In the context of conceptual hierarchy formation, the Natural Language Processing (NLP) community has investigated the notion of distance to elaborate the semantic classes underlying hierarchies. Classification algorithms have been broadly studied within the Machine Learning and Data Analysis communities.

Different tools have been developed for the automatic or semi-automatic acquisition of semantic classes from “near” terms. The notion of semantic proximity is based upon a distance among terms, defined as a function of the degree of similarity of their contexts. Descriptions of term contexts (the learning examples) and of the regularities to be sought vary across approaches. Contexts can be purely graphic (words co-occurring within a window), as in the case of [1], [4]; in some cases, the window can cover the whole document (see e.g. [21]). Contexts can also be syntactic, as in the approaches that we have taken into account to develop our model, e.g. [13], [14], [11], [20], [5], [26], [7]. However, the selection of a suitable distance for a given corpus and task is still an open problem that has not received much attention so far [25]. In most cases, the criteria proposed to support this choice rely on the evaluation of the application task for which learning takes place, as described for instance in [12]. Evaluation criteria proposed to assess learning results are purely quantitative, and comparative analyses of these criteria are rare [5]. A proper characterization of the effects that different methods have on the learning results would provide methodological guidelines to help the designer select the most suitable method for a given corpus and task, or to provide support to create a new one.

This observation also applies to classification algorithms. No methodology or tool has been proposed to support the elaboration of conceptual clustering algorithms that build task-specific ontologies. Work on conceptual clustering (e.g., [19], [8], [9], [2], [1], [26]) has not been extensively applied to the problem of learning from corpora. One must however acknowledge that the application of conceptual clustering techniques to this domain is not straightforward, as existing algorithms must first be adapted. As in the case of distances, the elaboration and selection of a suitable algorithm for a given corpus and task requires the development of new methodological guidelines and tools.

As a first step toward this goal we propose Mo’K, a configurable workbench to support the comparison, evaluation, and elaboration of methods to learn conceptual hierarchies. The conceptual clustering model underlying Mo’K permits a fine-grained definition of the components of distances and of class construction operators, easing the tasks of method instantiation and configuration. The model is extended with a set of variables that characterize features specific to the elaboration of learning corpora, such as pruning, stop-lists, etc. The workbench also includes evaluation criteria to assess the learning results obtained for different parameter configurations. We finally present some experimental results that illustrate the suitability of the model to help characterize different methods and assess their performance. These results concern only class formation, not classification algorithms.

2. FRAMEWORK

2.1 Learning semantic classes

In the context of learning semantic classes, learning from syntactic contexts exploits syntactic relations among words to derive semantic relations, following Harris’ hypothesis [15]. According to this hypothesis, the study of syntactic regularities within a specialized corpus makes it possible to identify syntactic schemata made out of combinations of word classes reflecting specific domain knowledge. Using specialized corpora eases the learning task, given that we have to deal with a limited vocabulary with reduced polysemy, and limited syntactic variability.

¹ HELIX Project, INRIA Rhône-Alpes, ZIRST, 655 Avenue de l’Europe, F–38330 Montbonnot, email: Gilles.Bisson@imag.fr
² Inference and Learning Group, LRI, Bât. 490, CNRS UMR 8623 & Université de Paris-Sud, F–91405 Orsay Cedex, email: {cn, lola}@lri.fr
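Harris’ hypothesis can be illustrated with a minimal sketch: nouns that share many syntactic contexts are candidates for the same semantic class. The data and the Jaccard measure below are hypothetical illustrations, not taken from the paper’s corpora or from Mo’K.

```python
# Toy illustration of Harris' distributional hypothesis (hypothetical data):
# each noun is described by the syntactic contexts (verb, relation) in which
# it occurs; nouns with a large context overlap are class candidates.
contexts = {
    "decrease": {("cause", "Dobj"), ("observe", "Dobj"), ("report", "Dobj")},
    "increase": {("cause", "Dobj"), ("observe", "Dobj"), ("measure", "Dobj")},
    "fridge":   {("open", "Dobj"), ("close", "Dobj")},
}

def context_overlap(a, b):
    """Jaccard overlap between the syntactic-context sets of two nouns."""
    shared = contexts[a] & contexts[b]
    return len(shared) / len(contexts[a] | contexts[b])

print(context_overlap("decrease", "increase"))  # 0.5: candidates for one class
print(context_overlap("decrease", "fridge"))    # 0.0: unrelated
```

The proximity measure itself is a free parameter; the point is only that similarity is computed over shared syntactic contexts rather than over the words themselves.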
In syntactic approaches, learning results can be of different types, depending on the method employed. They can be distances that reflect the degree of similarity among terms [13], [26], [22], distance-based term classes elaborated with the help of nearest-neighbor methods [11], [14], degrees of membership in term classes [24], class hierarchies formed by conceptual clustering [20], or predicative schemata that use concepts to constrain selection [1], [10], [7]. The notion of distance is fundamental in all cases, as it allows to calculate the degree of proximity between two objects (terms in this case) as a function of the degree of similarity between the syntactic contexts in which they appear. Classes built by aggregation of near terms can afterwards be used for different applications, such as syntactic disambiguation [23], [24] or document retrieval [11]. Distances are however calculated using the same similarity notion in all cases, and our model relies on these studies regardless of the application task.

2.2 Conceptual Clustering

In our case, ontologies are organized as multiple hierarchies that form an acyclic graph where nodes are term categories described by intension, and links represent inclusion, seen in this case as a generality relation. Learning through hierarchical classification of a set of objects can be performed in two main ways: top-down, by incremental specialization of classes, and bottom-up, by incremental generalization. We have adopted a bottom-up approach due to its smaller algorithmic complexity, and its understandability to the user in view of an interactive validation task. In this article we focus on the elements needed to build and evaluate the basic classes of this graph, i.e. criteria for building the initial corpus, distances, and evaluation criteria to assess results. We do not address the generic class construction algorithm. With respect to the latter, let us just mention that the application of hierarchical (conceptual [19] or numerical [6]) clustering algorithms to our problem is not straightforward, given that we must build acyclic graphs with few abstraction levels, rather than deep and strict hierarchies.

3. THE MO’K MODEL AND WORKBENCH

3.1 Representation of examples and results

Following the standard practice, we use binary grammatical relations as syntactic contexts. Examples are therefore represented by triplets of the form ⟨object, syntactic relation, attribute⟩, where the object is the element that must be classified and the attribute represents the verbal or nominal attribute. The number of occurrences of a triplet in a corpus characterizes the attribute for an example. For instance, if we are interested in verbal attachments, the following two sentences:
• This causes a decrease in […].
• This high rate results from an increase in […].
allow us to generate two triplets, one associating the verb cause with the noun decrease (29 occurrences) and one associating the verb result with the noun increase (2 occurrences), the counts being the total number of occurrences of these triplets in the corpus. In the remainder of the paper, we will designate by action the pair ⟨verb, syntactic relation⟩. The two actions can be regarded as objects, and the nouns decrease and increase can be considered as attributes with values 29 and 2, respectively.

In bottom-up clustering, couples of near objects or of objects and classes are incrementally grouped in order to form hierarchies or graphs of object classes. The standard in NLP is to use object classes (Figure 1.a) for the application task. In our experiments, we depart from this practice to compare learned classes, as we are interested in an extensional representation; we therefore use classes formed by the union of attributes of near objects (Figure 1.b).

Figure 1.a. Object classes (= Nouns).

Figure 1.b. Attribute classes (= Nouns).

Let us take an example. If two objects are selected to form a class, their attribute sets are merged. Let us suppose that the first is described by the nouns {decrease, increase, modification, loss, etc.}, and the second by {decrease, increase, composition, evolution, etc.}. The noun class learned will include the nouns shared by both objects (decrease, increase), and also the complementary terms (modification, loss, composition, evolution); therefore, four new triplets are induced. We will then use the “attribute class” strategy, as this way the “leaf” classes that we will later evaluate will be larger than those formed using the “object class” strategy. We will not further develop here the differences between these two viewpoints, intension and extension, since this topic is out of the scope of the paper. Let us however insist on the fact that the selection of one or the other has major effects on the learning results.

3.2 Corpus parameters

The parameters used to form a learning corpus in Mo’K include, among others, the selection of learning examples, the level of pruning, and the “cleaning” of the corpus. Let us examine the first two.

3.2.1 Selection of learning examples

One of the goals of our model is to allow the user to compare learning results as a function of the grammatical relations selected as input. Objects and syntactic contexts used in previous work vary: objects are considered as similar on the grounds of their shared contexts, where nouns can be verb complement heads (arguments [14], or adjuncts [7]), noun complements [11] or all of them [5], [13]. None of these approaches proposes a comparative study of results based on the grammatical relations chosen in the initial corpus. Our model easily allows to specify these relations. Experiments concerning verbal relations reported below (Section 4.2) illustrate this and show significant differences among results depending on the nature of objects and attributes (whether they are nouns or verbs), and on the type of corpus.

A second parameter, taken into account by most existing methods and included in our model, concerns corpus pruning as a function of the number of occurrences of an element. Pruning removes occurrences that are too infrequent and would therefore cause noise, as well as those which are too frequent and do not provide any information regarding the link between an object and an attribute. The other side of the coin is
that infrequent but important cases can be removed. Our model also allows to specify the minimum number of examples characterizing an attribute and the minimal number of attributes for an example and, for each of these constraints, the minimal total number of occurrences of the triplets being considered. The experiments reported in Section 4.3 show that the level of pruning has a major impact on the results of learning, and that the optimal level depends on the corpus.

3.3 Distance Modeling

Our goal is not to cover all the possible methods that can be used to measure similarity between examples. On the contrary, our approach focuses on methods with very precise features:
• They take syntactic analysis as input;
• They do not take into account external resources (e.g. ontologies such as WordNet [18]);
• They are based on a comparison of the distribution profiles of the attributes describing the couples of objects to classify.
Different methods have been proposed in the NLP literature within this framework, among others [14], [11], [13], [5] and [7]. We have developed a generic model of these methods and implemented it in Mo’K with the aim of elaborating a comparison and evaluation methodology for them. In order to come up with a generic implementation, we have identified the steps shared by all these methods, as we will see below. Mo’K is thus a workbench that implements a set of instantiable generic methods using an object-oriented representation, as opposed to the idea of a library of methods. This approach is made possible by the fact that similarity measures can in general be regarded as a comparison of the “distribution profiles” of couples of examples. This way, two objects will be considered as neighbors if the relative occurrence frequencies of each of their attributes (i.e. of the syntactic contexts) are close. Learning examples taken into account in our model can be represented by means of a contingency table. Depending on the representation hypothesis adopted, rows (examples) and columns (attributes) of the table represent different things. For example, in the experiments reported in Section 4.2 they first represent actions and nouns, respectively, and nouns and actions later on. In any case, a table cell contains the number of occurrences of an attribute for a given example. This table is obviously very sparse, as examples are generally described by a small number of attributes (see Figure 2).

In practice, the computation of similarity can be decomposed into two major steps: weighting and similarity computation.
• The weighting phase replaces every raw co-occurrence value appearing in the contingency table by a coefficient, often normalized, which can be regarded as a weight or measure of the significance of the fact of examples and attributes co-occurring in the corpus. Its computation can entail two steps: the initialization of the weights of examples and attributes, usually according to their number of occurrences, and the calculation of a normalized weight of the relevance of each attribute for each example. Technically, this weighting phase is implemented in Mo’K using the 5 functions described (in pseudo-code) in Table 1.
• The similarity computation phase builds a similarity matrix between couples of examples. Similarity increases as a function of the number of shared attributes, but the way in which similarity between these distributions is calculated varies in the different approaches. In Mo’K this phase is implemented by a single function.
We thus see that, by means of 6 functions and using a few lines of code, it is possible to implement most similarity measures that follow a schema based on the comparison of pairs of distribution profiles. Let us note that we do not make more specific hypotheses concerning the formal properties of measures: they can be similarities or dissimilarities, symmetrical or asymmetrical, and the computed information can be of any type. This approach thus favors the comparison of existing methods, but also the elaboration of variants of these methods and even the creation of new ones. Once integrated in Mo’K, a method can access all the test and conceptual clustering resources of the system.

  Name of the step                                           Method
  Initialization of the weight of each example E: W(E)       Init_Weight_Example
  Initialization of the weight of each attribute A: W(A)     Init_Weight_Attribute
  For each example E:
      For each attribute A of the example:
          Calculate W(A) in the context of E                 Eval_Weight_Attribute
          Update global W(E)                                 Eval_Weight_Example
      For each attribute A of the example:
          Normalization of the W(A) by W(E)                  Init_Similarity

Table 1. Functions implementing the weighting phase in Mo’K.

3.4 Distance evaluation

Even though our goal is the construction of hierarchies, it is interesting to evaluate the relevance of a distance metric with respect to simpler tasks, and to analyze its behavior as a function of the application domain and of the parameters of elaboration of the learning corpus. Mo’K offers different means of evaluation based on the first N couples of examples built by binary aggregation, i.e. the first N couples of examples with the highest scores in the similarity matrix.

3.4.1 Measure of recall

As already mentioned in Section 3, the elaboration of a class gives rise to the induction of new triplets not observed in the initial corpus. Therefore, the evaluation process follows the classical schema of dividing the corpus in two partitions: learning and test. The former is used to build the similarity matrix according to the measure to be evaluated. The latter allows to measure the coverage rates of classes, i.e. their ability to recognize the triplets in the test set. We have adopted this evaluation task for two reasons. First, it corresponds to the elementary step in every process of bottom-up hierarchical clustering. Second, from an NLP perspective, it conforms to a disambiguation task.

3.4.2 Measure of precision

Despite its interest, the coverage measure only allows to evaluate the recall rate associated with the set of N selected classes. However, precision (a measure of the ability to avoid erroneous recognition of negative examples) is an equally important property of the metric. In the end, a similarity measure that tends to over-generalize and describe object couples using a large number of attributes would reach high coverage rates, but produce classes that lack meaning and precision. It is difficult to automatically solve the problem of evaluating unsupervised learning in the absence of negative examples. Given that we do not deal with annotated corpora, and we do not have negative examples, we face this problem in Mo’K by means of automatically generated (artificial) sample corpora. Following [5], we assume that examples generated this way will be negatives for the most part. Artificial examples are formed by randomly choosing an object and an attribute from the initial corpus, taking care that none of these examples appears in the learning set. We measure coverage rates on
artificial examples using learned classes. Since examples are randomly generated, some positive examples are generated as well (about 0.5% in the studied corpora). Although this rate of artificial positive examples might seem very low, it unexpectedly constitutes an important part of the artificial examples covered by learned classes: they cover between 0.5% and 2.5% of the artificial examples in our experiments. Hence, real precision can only be evaluated after the negative examples in the artificially generated set have been identified by hand.

As we will see in the experiments reported in the next section, it is interesting to measure other criteria in order to assess the relevance of a similarity measure, for example the induction rate, measuring the ratio between the number of induced triplets and the total number of triplets learned.

4. EXPERIMENTS AND RESULTS

The experiments reported here aim to illustrate Mo’K’s parameterization possibilities and the impact that different parameter settings have on the learning results. These experiments thus make a case for the use of generic platforms to perform a systematic exploratory analysis in order to obtain sensible results in a given domain (corpus).

4.1 Training corpora

We have conducted experiments on two different French corpora: one contains cooking recipes gathered over the world wide web; the other, Agrovoc, contains scientific abstracts in the agricultural domain, and has been assembled by the INIST (Institut de l’Information Scientifique et Technique) of the CNRS. We have chosen these two corpora as they differ in generality and amount of technical terms, but are still close enough to allow for a meaningful comparison of results. Both corpora have been analyzed using the same shallow parser. Only verbal relations of the form ⟨verb, syntactic relation, noun⟩ have been considered in our experiments. The output of syntactic parsing is highly noisy (between 30% and 50% mistakes) due to several factors such as grammatical and spelling mistakes, typos, and accentuation errors in the case of the cooking corpus. In Agrovoc, noise is mostly produced by the high number of technical terms mistaken for verbs, and by embedded noun complements which are erroneously attached to verbs. In Agrovoc, only 300 verbs are found versus 18,828 nouns. This is due to the fact that only part of the corpus has been considered: those triplets with a verb that appears in a list of verbs giving rise to nominalizations in the corpus. In the cooking corpus we find 1,181 verbs for 3,300 nouns, i.e. the ratio between nouns and verbs is, on average, divided by a factor of 20 with respect to Agrovoc. This reflects a higher specialization of this corpus. Finally, Agrovoc is three times as big as the cooking corpus (117,156 triplets, for 168,287 occurrences in total).

4.2 Selection of example representation

The first experiment illustrates the importance of the choice of the object to be clustered and of the attribute which characterizes it. We have compared the classes learned by considering actions as objects and head nouns as attributes (denoted as the action-based representation) with those learned using nouns as objects and actions as attributes (denoted as the noun-based representation). The comparison (see Table 2) has been performed on both corpora, the other parameters remaining unchanged. We have applied Asium’s distance. Pruning criteria are light: no minimum number of triplet occurrences, the minimum number of examples characterizing a given attribute has been set to 2, and the minimum number of attributes for an example to 3. We will further comment on this setting in Section 4.3. For each corpus, the first 25 learned classes are evaluated. Coverage is measured on the test set, which comprises 20% of the whole corpus, and on the artificial set, which contains 50,000 randomly generated triplets (see Section 3.3). Each test has been repeated four times.

The first experiment has been conducted on Agrovoc. The classes learned in the action-based representation are twice as large as those learned in the noun-based representation. However, induction rates (number of induced triplets divided by the number of triplets learned by rote) are very similar (40% compared to 38%), and so are precision rates (45% in both cases). The precision rate represents the rate of negative examples among the learned examples which cover artificial examples. The recall, measured by the coverage rate of induced triplets on the test set, is slightly better for the noun-based representation (5.3% compared to 4.7%) but remains quite low. This can be explained by the level of generality of what is learned: the best couples of learned nouns involve very general terms such as [technique-method] and [influence-effect], in contrast with the numerous technical words of the corpus. Most of the actions characterizing these nouns concern general verbs such as [to present], [to observe], and [to report]. This is confirmed by looking at the best pairs of actions. This explains why learned classes are of rather poor quality in both representations.

  Corpus    Learning object   % Induced tripl. / learned tripl.   Recall (test set)   Precision
  Agrovoc   Action            40 %                                4.7 %               45 %
            Noun              38 %                                5.3 %               45 %
  Cooking   Action            34 %                                12 %                32 %
            Noun              38 %                                9.1 %               52 %

Table 2. Experimentation about example representation.

The experiments on the cooking corpus have built classes of similar size in both representations. The induction rate is slightly higher for the noun-based representation (38% compared to 34%). However, recall on the test set is better in the action-based representation (12% and 9.1%, respectively). Induced triplets are thus more useful in the case of the action-based representation, even though they are less numerous. Moreover, the precision, measured by the rate of negative artificial examples covered by learned examples, is much better for the action-based representation (32% compared to 52%). Precision and recall rates are thus better in this representation, although the rate of induced triplets is smaller than in the noun-based representation. In any case, all rates are much better than the ones computed for the Agrovoc experiments. A closer examination of the best pairs of actions and nouns confirms the idea that overgeneralization is less of a problem here than in Agrovoc. Noun pairs are more precise (e.g., [fridge-freezer], [olive oil–oil]) and described by more technical actions. In the same way, the best pairs of actions, such as [absorb Dobj, evaporate Dobj], are characterized by nouns (in this case [vinegar, water, wine, excess, etc.]) which are significantly more specific than in Agrovoc. The smaller variability of the cooking corpus explains these observations, showing that the larger size of Agrovoc does not improve the meaningfulness of the regularities observed.

This experiment thus shows that the choice of a representation can have a major impact on the learning results. It is therefore advisable to select a suitable representation before addressing a new domain.
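The three figures reported in Table 2 can be computed as sketched below. This is our own reading of the definitions given in Sections 3.4 and 4.2; the function names and the toy data are hypothetical, not Mo’K’s.

```python
# Sketch of the evaluation measures discussed above (assumed reading of the
# definitions in Sections 3.4 and 4.2; names and data are illustrative).
def induction_rate(induced, learned_by_rote):
    """Number of induced triplets divided by the number learned by rote."""
    return len(induced) / len(learned_by_rote)

def recall_on_test(induced, test_set):
    """Coverage rate: fraction of held-out test triplets recognized."""
    return len(induced & test_set) / len(test_set)

def artificial_coverage(learned, artificial):
    """Rate of (mostly negative) artificial triplets covered by learned
    classes: a proxy for imprecision, so lower is better."""
    return len(learned & artificial) / len(artificial)

learned    = {("cause", "decrease"), ("cause", "increase"), ("observe", "loss")}
induced    = {("cause", "increase")}          # triplet not observed in the corpus
test       = {("cause", "increase"), ("report", "loss")}
artificial = {("open", "decrease"), ("cause", "increase")}

print(induction_rate(induced, learned - induced))  # 0.5
print(recall_on_test(induced, test))               # 0.5
print(artificial_coverage(learned, artificial))    # 0.5
```

As the paper notes, the artificial-coverage figure only approximates precision, since a small fraction of the randomly generated triplets are in fact positive examples.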
4.3 Pruning parameters

To illustrate the importance of pruning, two pruning settings have been applied to both corpora and compared. In both cases we have used Asium’s distance; objects are actions and attributes are nouns. The first setting is the one described in the previous section. In the second one, we have set the minimum number of occurrences for an example (triplet) to 2, in order to remove triplets occurring only once in the corpus, as they may represent noise. We have also augmented the values of the minimal number of examples that must characterize an attribute (from 2 to 3), and of the minimal number of attributes per object (from 3 to 5, i.e. in this version the objects being compared appear in at least 5 different syntactic contexts). As can be inferred from the histogram in Figure 2, this setting excludes 80% of the corpus, versus 70% in the first setting. We can thus hope for a more reliable classification, at the risk of removing so many examples that coverage (recall) is drastically affected. The experiments show that, for the cooking corpus, the induction rate (32%) and coverage (11.2%) on the whole test set are nearly unaffected. On the contrary, the rate of artificial triplets covered scales by a factor of 3 with respect to the previous rate; we think that this significant increase indicates that this pruning rate increases the rate of erroneously induced examples. In Agrovoc, the induction rate decreases by a factor of three and the coverage rate by a factor of two, whereas the rate of artificial triplets covered scales by a factor superior to 2. Recall is therefore much more strongly affected by pruning than in the cooking corpus. In both cases, this new pruning setting gives rise to a decrease in performance.

Figure 2. Example distribution per number of attributes.

However, this conclusion, drawn from the evaluation of basic classes, must be tempered in the case of hierarchy formation. In this case, the more constraining version of pruning allows to eliminate many non-significant classes that result from the presence of closely similar actions described by a small number of attributes. It seems clear that, in a process of hierarchical clustering, this type of class would cause problems, as it would alter some groupings. Therefore, the type of pruning to be applied partly depends on the task to be performed.

4.4 Comparison of methods

This last experiment illustrates some aspects of the use of Mo’K to compare results obtained with different distances. Among the methods that we have tested with Mo’K (such as those proposed by Dagan et al., Hindle, Grefenstette, and Grishman et al., among the best known in the literature, as well as other distances proposed by the authors, e.g. Asium and Greedy), we have chosen to compare here the distances used in Asium [7], the one proposed by Dagan [5], and Greedy³. The reason for this choice relies on the fact that these three methods have shown good performance when compared with others, while presenting rather different behaviors. They are thus a representative sample of the subset of methods that we have modeled, as characterized in Section 3.3. These methods have been applied to the corpus of cooking recipes. Learning parameters are the same as those described in Section 4.2; objects are actions and attributes are nouns. In addition, we have tested the influence on learning of the number of disjointed classes on which recall performance is evaluated. To do this, we have varied this number (on the abscissa of the diagrams below) between 10 and 200.

Figure 4a&b. Recall rate and class efficiency.

These diagrams show, respectively, the recall rate of each method on the test set (Figure 4a), and the class efficiency rate (Figure 4b). The latter is assessed by the ratio between the number of triplets learned (by rote and induced) and the number of triplets effectively used in the recall test. As we can see in the first diagram, the coverage rate of the three methods grows as expected according to the number of classes considered, Dagan’s method yielding the best results. On the contrary, if we pay attention to the efficiency of classes, Asium takes better advantage of the triplets learned. Looking at both diagrams, we can conclude that these methods have different behaviors. Dagan’s gives rise to more general classes (more triplets are learned, but the number of useless ones is higher). Asium constructs more specific classes (fewer triplets are learned, but more of them are useful). We can take a closer look at the behavior of these methods and study the quality of induction in terms of the rate of induced triplets which are effectively used in the recall test (Figure 5).

Figure 5. Quality of induction.

While the previous conclusion is confirmed for Asium’s and Dagan’s methods, we can also note that Greedy and Dagan’s methods have the same behavior along this criterion. In fact, it seems that Dagan’s method is able to induce more useful triplets than Greedy, whereas the latter tends to learn by rote a more representative sample subset of the corpus. The tests performed on the artificial examples confirm these results. Therefore, it seems that the classes learned by Dagan’s method and, to a lesser extent, by Greedy are less robust and present lower precision rates for the learning parameters chosen. We must emphasize that these conclusions only apply to the recipe corpus. For Agrovoc, results are considerably different. These experiments therefore show the importance of going through an exploratory process in order to come up with the most suitable methods and representation.

³ Greedy has a simple behavior: it is based on a measure, inspired from the χ², which favors the selection of pairs of examples described by a large number of attributes (the method is named after this feature).
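The generic two-phase schema of Section 3.3, which the distances compared above instantiate, can be sketched as follows. The contingency table, the relative-frequency weighting and the shared-mass similarity below are illustrative assumptions (none of them is the actual Asium, Dagan or Greedy measure); in Mo’K each of these steps can be redefined independently.

```python
# Schematic two-phase similarity computation over a sparse contingency
# table {example: {attribute: count}}. Data and both phase implementations
# are illustrative assumptions, not Mo'K's actual functions.
table = {
    "cause_Dobj":   {"decrease": 29, "increase": 2, "loss": 5},
    "observe_Dobj": {"decrease": 10, "increase": 4},
    "open_Dobj":    {"fridge": 7},
}

def weight(profile):
    """Weighting phase: turn raw counts into a normalized distribution profile."""
    total = sum(profile.values())
    return {attr: count / total for attr, count in profile.items()}

def similarity(p1, p2):
    """Similarity phase: mass shared by the two profiles on common attributes."""
    return sum(min(p1[a], p2[a]) for a in p1.keys() & p2.keys())

weighted = {ex: weight(prof) for ex, prof in table.items()}
examples = sorted(table)
pairs = sorted(
    ((similarity(weighted[a], weighted[b]), a, b)
     for i, a in enumerate(examples) for b in examples[i + 1:]),
    reverse=True,
)
print(pairs[0][1], pairs[0][2])  # nearest couple: cause_Dobj observe_Dobj
```

Swapping in a different `weight` or `similarity` function changes the method while leaving the rest of the pipeline (and the top-N evaluation of Section 3.4) untouched, which is the point of the generic model.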
5. CONCLUSIONS AND PROSPECTS

Mo’K is a configurable workbench that supports the development of conceptual clustering methods for specific ontology building. The learning model proposed here takes parsed corpora as input. No additional (terminological or semantic) knowledge is used for labeling the input, guiding learning, or validating the learning results. Preliminary experiments showed that the quality of learning decreases with the generality of the corpus. This makes somewhat unrealistic the use of general ontologies for guiding such learning, as they seem too incomplete and polysemic to allow for efficient learning in specific domains. For example, [16] points out that 40% of the words in canonical form in the titles and abstracts of the Communications of the ACM are not included in the LDOCE (Longman Dictionary of Contemporary English). This problem, posed in the case of learning specific ontologies, obviously does not apply in the case of guiding the learning of general semantic classes, as shown in the abundant literature on the

Velardi (Ed.), Adapting lexical and corpus resources to sublanguages and applications, Workshop of the 1st Intl. Conf. on Language Resources and Evaluation, pp. 1-8, Granada, Spain, May.
[8] Fisher D.H. 1987. Knowledge Acquisition via Incremental Conceptual Clustering, Machine Learning Journal 2, pp. 139-172.
[9] Gennari J., Langley P., Fisher D. 1989. Models of Incremental Concept Formation, Artificial Intelligence Journal, Volume 40, pp. 11-61.
[10] Gomez F. 1998. Linking WordNet Verb Classes to Semantic Interpretation. In Proceedings of the COLING-ACL Workshop on the Usage of WordNet in NLP Systems.
[11] Grefenstette G. 1992. Use of Syntactic Context to Produce Term Association Lists for Text Retrieval. In Proceedings of the 15th International SIGIR'92, Denmark.
[12] Grefenstette G. 1993. Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches. In Workshop on Acquisition of Lexical Knowledge from Text, Columbus, OH, June.
[13] Grishman R., Sterling J. 1994. Generalizing Automatically Generated Selectional Patterns. In Proceedings of the 16th International Conference on Computational Linguistics (COLING'94).
[14] Hindle D. 1990. Noun classification from predicate-argument
topic (see e.g. [23], [22], [24], [17]). It would however be highly structure. In Proc. of the 28th Annual Meeting of the Association for
valuable to take advantage of existing ontologies to improve Computational Linguistics (ACL'90), pp. 268-275, Pittsburgh.
the quality of learning. We consider that this can be achieved in [15] Harris Z., Gottfried M., Ryckman T., Mattick Jr P., Daladier A.,
two main ways. First, learning could be improved by the use of Harris T. and Harris S. 1989. The form of Information in Science. In
Analysis of Immunology Sublanguages, vol. 104 of Boston Studies in
specific terminologies, dictionaries and nomenclature, such as
the Philosophy of Science. Dordrecht, the Netherlands, Kluwer
SNOMED International in the medical domain [3]. Second, some Academic Publishers.
methodological guidelines would be needed to integrate [16] Krovetz R and Croft W.B., W. 1991. Lexical Ambiguity and
specialized learned ontologies into more general ontologies Information Retrieval. In Lexical Acquisition: exploiting on-line
such as WordNet [18]. resources to build a lexicon ¿ Zernik (Ed.), pp. 45-65, Hillsdale, New
Although we have focused on a disambiguation-based task, Jersey, Lawrence Erlbaum Associates.
other validation tasks could be integrated in Mo’K, such as [17] Li H. and Abe N. 1998. Word clustering and disambiguation based
query extension and information extraction. Learning on co-occurrence data. In Proceedings of COLING - ACL'98.
specialized ontologies of high quality for these tasks will allow [18] Miller G. 1990. WordNet: an on-line lexical database, International
Journal of Lexicography, 3(4).
the development of applications in technical and rapidly
[19] Michalski R.S., Stepp E. 1983. Learning from Observation :
evolving domains, in which manual acquisition is too costly. Conceptual Clustering. In Machine Learning I: an Artificial
In this sense, we have started exploring information extraction Intelligence Approach, Tioga, pp. 331-363.
from molecular biology abstracts. [20] Pereira F., Tishby N. and Lee L. 1993. Distributional clustering of
English words. In Proceedings of the 31st Annual Meeting of the
Association for Computational Linguistics ACL'93, p. 183-190.
ACKNOWLEDGEMENTS [21] Qiu Y. and Frei H. P. 1993. Concept based Query Expansion. In
Proceedings of 16th Annual International ACM SIGIR Conference,
We are grateful to INIST-CNRS for providing the Agrovoc pp. 160-169, Pittsburgh, ACM Press.
corpus. This research is partly funded by the French Ministry of [22] Resnik P. 1995. Using Information Content to evaluate Semantic
Industry under RNRT project Astuxe. Similarity in a Taxonomy, Cognitive Modelling.
[23] Resnik P. and Hearst M. A. 1993. Structural Ambiguity and
Conceptual Relations. In Proc. Workshop on Very Large Corpora:
Academic and Industrial Perspectives, pp. 58-64, Ohio State Univ.
REFERENCES [24] Ribas F. 1995. On Learning More Appropriate Selectional
[1] Basili R., Pazienza M. T. and Velardi P. 1996. An empirical symbolic Restrictions. In Proceedings of EACL'95.
approach to natural language processing. Artificial Intelligence [25] Roland D. and Jurafsky D. 1998. How Verb Subcategorization
Journal 85, pp. 59-99. Frequencies Are Affected By Corpus Choice. In Proceedings of the
[2] Bisson G. 1992. Conceptual Clustering in a First Order Logic Int'l Conf. Computational Linguistics (COLING'98).
Representation. In Proceedings of 10th European Conference on [26] Sekine S., Caroll J. J., Ananiadou S. and Tsujii J. 1992. Automatic
Artificial Intelligence (ECAI'92), pp. 458-462, Vienna . Learning for Semantic Collocation. In Proc. of the 3rd Conference
[3] Bouaud J., Habert B., Nazarenko A. and Zweigenbaum P. 1997. on Applied Natural Language Processing, pp. 104-109.
Regroupements issus de dépendancess syntaxiques en corpus : [27] Sparck Jones K. and Barber E. B. 1971. What makes an automatic
catégorisation et confontation à deux modélisations conceptuelles. In kyewords classification effective?, Journal of the ASIS, 18: 166-175.
Actes des Journées Ingénierie des Connaissances, Zacklad E. (Ed.), [28] Talavera L. and Bejar J. 1998. Efficient construction of
pp. 207-223, Roscoff, France, May. comprehensible hierarchical clusterings. In Proceedings of the 2nd
[4] Church K. W. and Hanks P. 1989. Word Association Norms, Mutual European Symposium on Principles of Data Mining and Knowledge
Information, and Lexicography, in Proc. of the 27th Annual Meeting Discovery, PKDD'98. pp. 93-101. Nantes, France. J. M. Zytkow and
of the Association for Computational Linguistics, pp. 76-83. M. Quafafou (eds.) LNAI vol. 1510, Springer Verlag.
[5] Dagan I., Pereira F., and Lee L. 1994. Similarity-Based Estimation of [29] Vasco J. J. F., Faicher C., Chouraqui, E. 1996. A knowledge
Word Co-occurrence Probabilities. In Proceedings of the 32nd acquisition tool for multi-perspective concept formation. In
Annual Meeting of the Association for Computational Linguistics, proceedings of 9th European Knowledge Acquisition Workshop,
ACL'94, New Mexico State University, June. EKAW'96, pp. 227-244. Springer Verlag.
[6] Day W., Edelsbrunner H. 1984. Efficient Algorithms for
Agglomerative Hierarchical Clustering Methods, Journal of
Classification. Volume 1. pp. 1-24.
[7] Faure D. and Nédellec C. 1998. A Corpus-based Conceptual
Clustering Method for Verb Frames and Ontology Acquisition. In P.