=Paper=
{{Paper
|id=Vol-31/paper-11
|storemode=property
|title=A General Framework for Theory Learning. Perspectives for NLP
|pdfUrl=https://ceur-ws.org/Vol-31/JAlvarez_8.pdf
|volume=Vol-31
|dblpUrl=https://dblp.org/rec/conf/ecai/Alvarez00
}}
==A General Framework for Theory Learning. Perspectives for NLP==
A General Framework for Theory Learning.
Perspectives for Natural Language Processing
Jordi Alvarez^1
Abstract. This position paper introduces a general framework for theory learning based on probabilistic Description Logics (DL). The formal base of the framework is briefly described. The paper also argues that the general theory learning approach is well suited to NLP tasks, and that it can be especially useful for investigating the interrelations among different types of natural language information in order to produce better NLP systems.

1 Introduction

In this paper, I advocate a general theory learning framework. The representation formalism used in our framework is an extension of Description Logics (DL) that allows probabilistic knowledge to be expressed. A general learning procedure based on this framework has been implemented as part of a system called YAYA [2]. This learning procedure is a relational learning procedure that performs computations very similar to those performed by Inductive Logic Programming (ILP) systems.

The learning procedure allows large taxonomies to be used as background knowledge. The main goal of the learning system can then be seen as ontology acquisition, or ontology completion if an initial ontology is available as background knowledge.

2 Knowledge Representation

A probabilistic extension of Description Logics (DL) is used as the underlying knowledge representation and reasoning formalism. DL distinguishes between intensional and extensional knowledge. It provides a concept language L that is used to build concept expressions. A knowledge base is defined as a pair <T, A>, where T is called the TBox and consists of a set of axioms that provide the intensional knowledge, and A is called the ABox and consists of a set of assertions about individual objects that provide the extensional knowledge. DL semantics is defined in a model-theoretic way. The reader can check [7] for details on the DL formalism and concept languages.

A number of concept languages appear in the DL literature. The YAYA concept language (YCL) is an extension of ALCN^2 that allows inverse roles, role composition, and a construction equivalent to the CLASSIC SAME-AS construct [3].

Why this language? Because, in the first experiments done for NLP problems, it seems to be the least expressive language that is still expressive enough to solve the problem solely by defining a set of axioms and using the reasoning capabilities provided by DL^3.

The axioms allowed by YAYA are of the form C ⊑ D, where C and D are concept expressions in L; that is, they are general concept inclusion axioms. So, YAYA TBoxes are free TBoxes. This, in combination with the complexity of the language, would make the reasoning procedures intractable. But in problems in which taxonomic knowledge is important, an accurate representation of that knowledge, combined with the implementation of some optimization techniques for DL reasoning procedures [10], can make the overall reasoning process tractable in practical situations.
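To fix notation, general concept inclusion axioms built with the constructors just listed might look as follows. These are illustrative axioms of mine, not axioms from YAYA's experiments; only the constructors themselves (conjunction, existential quantification, inverse roles, role composition, SAME-AS) come from the description above.

```latex
% Illustrative YCL-style axioms: a plain GCI, one using an inverse role,
% and one using role composition inside a SAME-AS-like construct.
% All concept and role names are invented for the example.
\begin{align*}
  \mathit{woman} \sqcap \exists \mathit{has\_child}.\mathit{person}
      &\sqsubseteq \mathit{mother}\\
  \mathit{child}
      &\sqsubseteq \exists \mathit{has\_child}^{-}.\mathit{person}\\
  \mathit{person}
      &\sqsubseteq \text{SAME-AS}(\mathit{has\_father} \circ \mathit{has\_brother},\ \mathit{has\_uncle})
\end{align*}
```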
2.1 Probabilistic Knowledge

In order to represent probabilistic knowledge, axioms are extended to a more general form: C ⊑_{α;σ} D, where α is a probability and σ is a positive real number. The axiom C ⊑_{α;σ} D states that the conditional probability P(D|C) follows a normal distribution centered in α and with a standard deviation of σ in the set of models of the axiom. This is somewhat similar to the p-conditioning notion introduced in [9], but intuitively more precise because of the use of the normal distribution. Unlike [12], YAYA does not allow probabilistic assertions.

The model-theoretic semantics on which DL is usually defined has to be adapted for probabilistic axioms. For non-probabilistic DL we can divide the set of interpretations of a knowledge base Σ = <T, A> into two different sets: the models of Σ, and the rest of the interpretations. For probabilistic DL this clear distinction is lost: given an interpretation, we cannot decide whether it is a model of Σ or not. Instead, the set of axioms in T determines a probability distribution over interpretations. Every interpretation I is then assigned a probability. This probability may be 0 if I has no chance of being a model of Σ. The details are out of the scope of this position paper and can be found in [1].
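One way to read this definition (my own sketch of the intuition; the precise probability distribution over interpretations is defined in [1]): for an axiom φ = C ⊑_{α;σ} D and a finite interpretation I in which C^I is non-empty, the conditional probability observed in I and the weight the axiom attaches to I can be written as

```latex
% Empirical conditional probability of D given C in an interpretation I,
% and the Gaussian weight an axiom C [=_{alpha;sigma} D attaches to I.
% This is a sketch of the intuition only, not the formal semantics of [1].
\[
  P_I(D \mid C) \;=\; \frac{\lvert C^{I} \cap D^{I} \rvert}{\lvert C^{I} \rvert},
  \qquad
  w_{\varphi}(I) \;\propto\;
  \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left( -\frac{\bigl(P_I(D \mid C) - \alpha\bigr)^{2}}{2\sigma^{2}} \right)
\]
```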
1 TALP Research Center, Universitat Politècnica de Catalunya, Modul C6, Campus Nord, 08034 Barcelona, email: jalvarez@talp.upc.es
2 ALCN contains top, bottom, conjunction, disjunction, existential quantification, universal quantification, negation, and unqualified number restrictions (for example, number restrictions allow the definition of concepts for women having more than 5 children, or roads with less than 3 lanes).
3 Role axioms also seem interesting, and can easily be introduced following [11].
Probabilistic axioms have two important properties: (a) often a given behaviour can be stated with fewer and more "natural" axioms if probabilities can take part in the axioms^4; and (b) given a behaviour to be defined by a set of axioms and two concepts C and D, there always exist some α and σ such that the axiom C ⊑_{α;σ} D is consistent with the behaviour to be learned (and α and σ can be estimated from the set of examples given to the learning procedure).

These two properties make probabilistic axioms especially interesting for learning purposes. The first property states that the theory to be learned will probably be easier to learn if we allow probabilistic axioms. The second one may provide a path of "intermediate" axioms that are completely consistent with the theory to be learned and may help to find the real theory axioms (those "definitive" axioms may turn the intermediate axioms into unnecessary ones).
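Property (b) only says that suitable parameters exist and can be estimated from the examples; the estimator itself is not spelled out here. A minimal sketch of one plausible estimator, under my own assumptions (each example is given as explicit concept extensions, and α and σ are taken as the mean and standard deviation of the per-example conditional frequencies):

```python
from statistics import mean, pstdev

def conditional_frequency(model, concept_c, concept_d):
    """Fraction of C-elements that are also D-elements in one example model.

    `model` maps concept names to sets of domain elements; returns None
    when C is empty in the model (the axiom says nothing there).
    """
    c_ext = model[concept_c]
    if not c_ext:
        return None
    return len(c_ext & model[concept_d]) / len(c_ext)

def estimate_axiom_parameters(models, concept_c, concept_d):
    """Estimate (alpha, sigma) for a candidate probabilistic axiom C [= D."""
    freqs = [f for m in models
             if (f := conditional_frequency(m, concept_c, concept_d)) is not None]
    if not freqs:
        raise ValueError("C is empty in every example model")
    return mean(freqs), pstdev(freqs)

# Toy usage: two tiny example models described by concept extensions only.
models = [
    {"man": {"john", "paul"}, "married": {"john"}},
    {"man": {"ringo"}, "married": {"ringo"}},
]
alpha, sigma = estimate_axiom_parameters(models, "man", "married")
print(f"man [=_{{{alpha:.2f};{sigma:.2f}}} married")
```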
3 Theory Learning

Theory learning is formalized as the process of determining a theory/TBox T from a set of models of T. So, the examples used in the learning process are complete models^5. This differs from the usual machine learning approaches, in which examples are individual entities.
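To make "examples are complete models" concrete in the NLP setting of footnote 5: an example is an entire interpretation, with explicit extensions for every concept and role name, rather than a single labelled instance. A minimal sketch of such an example follows; the particular concept and role names used for the sentence are my own illustration, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Interpretation:
    """A complete model: a finite domain plus full concept/role extensions."""
    domain: set[str]
    concepts: dict[str, set[str]] = field(default_factory=dict)           # name -> elements
    roles: dict[str, set[tuple[str, str]]] = field(default_factory=dict)  # name -> pairs

# One training example = one whole sentence analysed as an interpretation.
# "The dog barks": two word tokens plus a syntactic group, all in one model.
sentence_model = Interpretation(
    domain={"w1_dog", "w2_barks", "np1"},
    concepts={
        "noun": {"w1_dog"},
        "verb": {"w2_barks"},
        "noun_phrase": {"np1"},
        "sense_animal": {"w1_dog"},        # word-sense information
    },
    roles={
        "head_of": {("w1_dog", "np1")},        # syntactic groups
        "subject_of": {("np1", "w2_barks")},   # syntactic relations
    },
)
```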
In order to define what theory learning is, we first adapt Shannon information theory notions to the DL domain. An interpretation I can be directly represented as a set of boolean variables: for every concept name A and every element d ∈ Δ^I we have a boolean variable (stating whether d ∈ A^I or d ∉ A^I); and for every role name P and every pair of elements d1, d2 ∈ Δ^I we have another boolean variable (stating whether (d1, d2) ∈ P^I). The entropy of I is defined as the number of boolean variables necessary to represent I when no additional knowledge is available. This is, in fact, a direct interpretation of Shannon information theory notions.

The amount of information provided by a TBox T with respect to I can be intuitively defined as the difference between the number of boolean variables necessary to represent I when no additional knowledge is available, and the minimum number of boolean variables necessary to represent I when T is available^6. Then, the learning procedure has to maximize the amount of information provided by T with respect to a training set. An axiom φ will be interesting for T if T ∪ {φ} provides more information than T.
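Spelled out, these two quantities take roughly the following form. This is my own transcription of the verbal definitions above, with N_C and N_R standing for the number of concept and role names, and H(I | T) for the minimum number of boolean variables needed to represent I when T is available.

```latex
% Entropy of an interpretation I: one boolean variable per
% (concept name, domain element) pair and per (role name, element pair).
\[
  H(I) \;=\; N_C\,\lvert \Delta^{I} \rvert \;+\; N_R\,\lvert \Delta^{I} \rvert^{2}
\]
% Information provided by a TBox T with respect to I, and the
% interestingness criterion for a candidate axiom phi.
\[
  \mathrm{info}(T, I) \;=\; H(I) - H(I \mid T),
  \qquad
  \varphi \text{ interesting for } T
  \iff
  \mathrm{info}(T \cup \{\varphi\}, I) \;>\; \mathrm{info}(T, I)
\]
```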
Computing the amount of information provided by a TBox is intractable. Instead, some measures that provide an idea of the information added by an axiom φ1 given another axiom φ2 can be efficiently computed [2]. These measures have been shown to be very useful for reducing the search space explored by a learning procedure.

The YAYA learning procedure is not concerned with PAC learnability or Least Common Subsumer (LCS) computation [4], as most work on learning DLs is (see [5, 8] for example). Instead, our learning procedure is a general axiom exploration procedure that is heuristically guided by the information theory measures mentioned above. An initial theory T0 is evolved^7 through the addition of new axioms. This is done by the application of a set of induction rules that produce new axioms from axioms already existing in the theory.

There are 19 different induction rules that can be classified mainly into generalization, specialization, and general exploration rules^8. In addition, some of these rules are specially designed to work with taxonomic knowledge.

Although there is no space to explain all 19 rules, we can sketch some of them to see how they work:

expand antecedent rule: It adds an existential construction to the "end" of the antecedent of an axiom. For example, suppose we are acquiring a theory about family relationships from a group of families^9. Suppose φ1 = man ⊑_{0.5;0.3} ∃has_wife.woman ∈ Tt^10. Then, this rule would explore axioms by adding simple existential constructions to the end of the antecedent. Among others, it will produce the axiom φ2 = man ⊓ ∃has_child.person ⊑_{0.87;0.1} ∃has_wife.woman, where the parameters of the distribution are supposed to have been inferred from the training set. As φ2 provides more information about men having children than φ1 does, it will be added to Tt.

same-as axiom rule: It builds new axioms with SAME-AS constructions in the consequent. Going on with the previous example, suppose we have φ3 = person ⊓ ∃has_grandparent ⊑_{1;0} ∃has_parent.∃has_parent.person, and that φ3 ∈ Tt. The application of the same-as axiom rule to φ3 would build an axiom φ4 with the same antecedent and a consequent constituted by a SAME-AS construct built from the antecedent and the consequent of φ3. That is, φ4 = person ⊓ ∃has_grandparent ⊑_{1;0} [SAME-AS has_parent ∘ has_parent, has_grandparent]. φ4 increases the amount of information of Tt, as the consequent of φ4 is more specific than that of φ3. So, the YAYA learning procedure would add φ4 to Tt.

The iterative rule application process is repeated until none of the axioms "reachable" through the application of an induction rule is interesting for T. The YAYA learning procedure has been designed from practical principles. Unfortunately, there is no theoretical bound on the amount of information captured by the TBox resulting from the learning process with respect to T.
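The iterative process just described is essentially a greedy search over axioms. A compressed sketch of that loop, under heavy simplification (the rule and scoring interfaces below are invented for illustration; YAYA's actual 19 induction rules and its information measures [2] are not reproduced here):

```python
def learn_theory(initial_theory, induction_rules, interestingness, examples,
                 max_iterations=1000):
    """Greedy axiom-exploration loop sketched from the description above.

    induction_rules : functions mapping (theory, examples) -> candidate axioms
    interestingness : heuristic gain of adding an axiom to the theory;
                      a stand-in for the information measures of [2]
    """
    theory = list(initial_theory)  # T0: possibly empty, possibly ontological knowledge
    for _ in range(max_iterations):
        # All axioms "reachable" from the current theory in one rule application.
        candidates = [axiom
                      for rule in induction_rules
                      for axiom in rule(theory, examples)
                      if axiom not in theory]
        # Keep only candidates that make the theory more informative.
        scored = [(interestingness(theory, axiom, examples), axiom)
                  for axiom in candidates]
        scored = [(gain, axiom) for gain, axiom in scored if gain > 0]
        if not scored:
            break  # no reachable axiom is interesting for T: stop
        # Greedy choice: add the best-scoring axiom and iterate.
        _, best = max(scored, key=lambda pair: pair[0])
        theory.append(best)
    return theory
```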
4 A very simple and illustrative example is the penguin exception so widely used: the probabilistic axiomatization {bird ⊑_{0.95;0.05} can_fly, penguin ⊑ bird, penguin ⊑ ¬can_fly} (supposing the probabilities are correct) is simpler than its corresponding non-probabilistic axiomatization: {bird ⊓ ¬penguin ⊑ can_fly, penguin ⊑ bird, penguin ⊑ ¬can_fly}. Yes, only one antecedent is a bit larger. But this is a quite simple theory; for larger theories the differences will surely be bigger.
5 For example, if we are working in NLP, an example could correspond to a whole sentence, with all the information we want to learn and from which we want to learn (syntactic groups, syntactic relations between them, word senses, ...).
6 This captures the fact that some of the boolean variables necessary when no knowledge is available can be deduced from other boolean variables when T is available.
7 This initial theory may be empty, it may contain a subset of the theory to be learned, and it may also take into account ontological knowledge.
8 The first two types are similar to ILP generalization and specialization operators.
9 Note that here the examples correspond to families and not to individuals, as they would in most ILP systems.
10 Throughout this example, it is supposed that man, woman, and person are concept names that appear in the training set, and that has_wife and has_child are role names.
The large number of YAYA induction rules makes it possible for an interesting axiom to be reached from many different search states. This is important because, in some sense, it reduces the number of local minima in the search space. And as the procedure sketched above is in fact a greedy search, it may be affected by local minima.

On the other hand, the YAYA learning procedure is not complete; that is, it does not guarantee that it will find a given axiom or a given theory. Nevertheless, it has been tested successfully on some typical ILP benchmarks, such as the family relationships and the chess king-rook-king illegal position ones [16].
4 YAYA and Natural Language Processing

The main goal of YAYA is to be applied to NLP problems. Relational learning systems have been shown to be well suited for Natural Language Learning (NLL). ILP has been successfully applied to a number of NLP problems [15, 14, 6].

With this purpose, the widely used WordNet lexical ontology [13] has been integrated into YAYA. WordNet is automatically converted into a set of axioms that can be used: (1) to provide a background theory to the learning procedure; and (2) for reasoning purposes (once the learning procedure has done its job).

Despite the large number of concepts (synsets, in WordNet terminology) in this ontology, a careful selection of the axioms into which the taxonomy is mapped, together with the implementation of some optimization techniques from [10], keeps the reasoning tasks tractable.
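The conversion of WordNet into axioms is not detailed here. The obvious mapping sends every hypernym link to a concept inclusion; a minimal sketch of that mapping is shown below using NLTK's WordNet interface, purely as a convenience for the reader (YAYA's own converter and its axiom selection are not described in this paper).

```python
# Sketch only: map each noun synset to a concept name and each hypernym
# link to an inclusion axiom "synset [= hypernym".
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def wordnet_taxonomy_axioms(pos=wn.NOUN, limit=None):
    """Yield (sub_concept, super_concept) pairs, one per hypernym edge."""
    for i, synset in enumerate(wn.all_synsets(pos)):
        if limit is not None and i >= limit:
            break
        sub = synset.name()                # e.g. "dog.n.01"
        for hyper in synset.hypernyms():
            yield sub, hyper.name()        # read as: dog.n.01 [= canine.n.02

# Print a handful of taxonomy axioms usable as background-theory candidates.
for sub, sup in wordnet_taxonomy_axioms(limit=20):
    print(f"{sub} [= {sup}")
```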
YAYA provides a powerful representation, reasoning, and learning framework for NLP tasks. This has the immediate advantage of making it possible to tackle different NLP tasks using the same environment. But, most importantly, it provides a way to experiment with the integration of several of these tasks, and to study the interaction among them. Additionally, unlike ILP systems, YAYA has been specially designed to take advantage of available taxonomic knowledge.

Traditionally, several phases have been defined in NLP: morphological analysis, syntactic analysis, word sense disambiguation, contextual resolution, conceptual representation, etc. The complete analysis of a natural language text consists of solving each one of the phases in an ordered way. Then, one phase can use information produced by previous phases; but, of course, it cannot use information from later phases, as it has not yet been produced.

The goal of this division into phases is to keep the complexity of the problem below certain limits; but it may miss some important interactions between different phases. For NLL systems this may result in the substitution of a missed interaction by a set of special situations that approximate it, taking into account only information from previous phases.

Reducing NLP to a single big phase is not realistic at this moment. The number of interactions that should be taken into account by a learning procedure is probably too large to obtain interesting results in a reasonable amount of time. Instead, a restricted interaction model can be used to explore interesting interactions. YAYA provides the necessary elements to experiment with different interaction models. Nevertheless, this work is at its initial stage at this moment.

Some experiments have been performed for NLP. More concretely, over 300 sentences from the training set in [17] have been used to learn restrictions between senses and semantic roles. The whole set of restrictions implicitly present in the training set has been successfully learned.

REFERENCES

[1] Jordi Alvarez, 'Extending the tableaux calculus for probabilistic description logics', Technical Report LSI-00-R, Universitat Politècnica de Catalunya, (2000).
[2] Jordi Alvarez, 'A formal framework for theory learning using description logics', Technical Report LSI-00-R, Universitat Politècnica de Catalunya, (2000).
[3] Alex Borgida and Peter F. Patel-Schneider, 'A semantics and complete algorithm for subsumption in the CLASSIC description logic', Journal of Artificial Intelligence Research, 1, 277-308, (1994).
[4] W. Cohen, A. Borgida, and H. Hirsh, 'Computing least common subsumers in description logics', in Proceedings of AAAI-92, pp. 754-760, (1992).
[5] W. Cohen and H. Hirsh, 'The learnability of description logics with equality constraints', Machine Learning, 17(2-3), 169-199, (1994).
[6] James Cussens, David Page, Stephen Muggleton, and Ashwin Srinivasan, 'Using Inductive Logic Programming for Natural Language Processing', in ECML'97 - Workshop Notes on Empirical Learning of Natural Language Tasks, eds., W. Daelemans, T. Weijters, and A. van der Bosch, pp. 25-34, Prague, (1997).
[7] F.M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf, 'Reasoning in description logics', in Studies in Logic, Language and Information, ed., G. Brewka, 193-238, CSLI Publications, (1996).
[8] M. Frazier and L. Pitt, 'CLASSIC learning', Machine Learning, 25, 151-193, (1996).
[9] Jochen Heinsohn, 'Probabilistic description logics', in Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, Seattle, Washington, (1994).
[10] I. Horrocks and P. Patel-Schneider, 'Optimising description logic subsumption', Journal of Logic and Computation, 9(3), 267-293, (1999).
[11] Ian Horrocks and Ulrike Sattler, 'A description logic with transitive and inverse roles and role hierarchies', Journal of Logic and Computation, 9(3), 385-410, (1999).
[12] Manfred Jaeger, 'Probabilistic reasoning in terminological logics', in Proceedings of the 4th International Conference on the Principles of Knowledge Representation and Reasoning, (1994).
[13] G.A. Miller, 'WordNet: An online lexical database', International Journal of Lexicography, 3(4), 235-312, (1990).
[14] Raymond J. Mooney, 'Inductive logic programming for natural language processing', in Proceedings of the Sixth International Inductive Logic Programming Workshop, (1996).
[15] Stephen Muggleton, 'Inductive logic programming: Issues, results, and the challenge of learning language in logic', Artificial Intelligence, 114(1-2), 283-296, (1999).
[16] J.R. Quinlan, 'Learning logical definitions from relations', Machine Learning, 5, 239-266, (1990).
[17] J. Zelle and R. Mooney, 'Learning semantic grammars with constructive inductive logic programming', in Proceedings of the 11th AAAI, pp. 817-822, (1993).