=Paper=
{{Paper
|id=Vol-31/paper-11
|storemode=property
|title=A General Framework for Theory Learning. Perspectives for NLP
|pdfUrl=https://ceur-ws.org/Vol-31/JAlvarez_8.pdf
|volume=Vol-31
|dblpUrl=https://dblp.org/rec/conf/ecai/Alvarez00
}}
==A General Framework for Theory Learning. Perspectives for NLP==
A General Framework for Theory Learning.
Perspectives for Natural Language Processing
Jordi Alvarez^1
Abstract. This position paper introduces a general framework for theory learning based on probabilistic Description Logics (DL). The formal base of the framework is briefly described. The paper also argues that the general theory learning approach is well suited to NLP tasks, and that it can be especially useful for investigating the interrelations among different types of natural language information in order to produce better NLP systems.

1 Introduction

In this paper, I advocate a general theory learning framework. The representation formalism used in our framework is an extension of Description Logics (DL) that allows probabilistic knowledge to be expressed. A general learning procedure based on this framework has been implemented as part of a system called YAYA [2]. This learning procedure is a relational learning procedure that performs computations very similar to those performed by Inductive Logic Programming (ILP) systems.

The learning procedure allows large taxonomies to be used as background knowledge. The main goal of the learning system can then be seen as ontology acquisition, or ontology completion if an initial ontology is available as background knowledge.

2 Knowledge Representation

A probabilistic extension of Description Logics (DL) is used as the underlying knowledge representation and reasoning formalism. DL distinguishes between intensional and extensional knowledge. It provides a concept language L that is used to build concept expressions. A knowledge base is defined as a pair <T, A>, where T is called the TBox and consists of a set of axioms that provide the intensional knowledge, and A is called the ABox and consists of a set of assertions about individual objects that provide the extensional knowledge. DL semantics is defined in a model-theoretic way. The reader can check [7] for details on the DL formalism and concept languages.

A number of concept languages appear in the DL literature. The YAYA concept language (YCL) is an extension of ALCN^2 that allows inverse roles, role composition, and a construction equivalent to the CLASSIC SAME-AS construct [3].

Why this language? Because, in the first experiments done for NLP problems, it seems to be the least expressive language that is still expressive enough to solve the problem solely by defining a set of axioms and using the reasoning capabilities provided by DL^3.

The axioms allowed by YAYA are of the form C ⊑ D, where C and D are concept expressions in L; that is, they are general concept inclusion axioms. So, YAYA TBoxes are free TBoxes. This, in combination with the complexity of the language, would make the reasoning procedures intractable. But in problems in which taxonomic knowledge is important, an accurate representation of that knowledge, combined with the implementation of some optimization techniques for DL reasoning procedures [10], can make the overall reasoning process tractable in practical situations.
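To fix notation, general concept inclusion axioms built with the constructors just listed might look as follows. These are illustrative axioms of mine, not axioms from YAYA's experiments; only the constructors themselves (conjunction, existential quantification, inverse roles, role composition, SAME-AS) come from the description above.

```latex
% Illustrative YCL-style axioms: a plain GCI, one using an inverse role,
% and one using role composition inside a SAME-AS-like construct.
% All concept and role names are invented for the example.
\begin{align*}
  \mathit{woman} \sqcap \exists \mathit{has\_child}.\mathit{person}
      &\sqsubseteq \mathit{mother}\\
  \mathit{child}
      &\sqsubseteq \exists \mathit{has\_child}^{-}.\mathit{person}\\
  \mathit{person}
      &\sqsubseteq \text{SAME-AS}(\mathit{has\_father} \circ \mathit{has\_brother},\ \mathit{has\_uncle})
\end{align*}
```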
2.1 Probabilistic Knowledge

In order to represent probabilistic knowledge, axioms are extended to a more general form: C ⊑_{α;σ} D, where α is a probability and σ is a positive real number. The axiom C ⊑_{α;σ} D states that the conditional probability P(D|C) follows a normal distribution centered in α and with a standard deviation of σ in the set of models of the axiom. This is somewhat similar to the p-conditioning notion introduced in [9], but intuitively more precise because of the use of the normal distribution. Unlike [12], YAYA does not allow probabilistic assertions.

The model-theoretic semantics on which DL is usually defined has to be adapted for probabilistic axioms. For non-probabilistic DL we can divide the set of interpretations of a knowledge base Σ = <T, A> into two different sets: the models of Σ, and the rest of the interpretations. For probabilistic DL this clear distinction is lost: given an interpretation, we cannot decide whether it is a model of Σ or not. Instead, the set of axioms in T determines a probability distribution over interpretations. Every interpretation I is then assigned a probability. This probability may be 0 if I has no chance of being a model of Σ. The details are out of the scope of this position paper and can be found in [1].
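One way to read this definition (my own sketch of the intuition; the precise probability distribution over interpretations is defined in [1]): for an axiom φ = C ⊑_{α;σ} D and a finite interpretation I in which C^I is non-empty, the conditional probability observed in I and the weight the axiom attaches to I can be written as

```latex
% Empirical conditional probability of D given C in an interpretation I,
% and the Gaussian weight an axiom C [=_{alpha;sigma} D attaches to I.
% This is a sketch of the intuition only, not the formal semantics of [1].
\[
  P_I(D \mid C) \;=\; \frac{\lvert C^{I} \cap D^{I} \rvert}{\lvert C^{I} \rvert},
  \qquad
  w_{\varphi}(I) \;\propto\;
  \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left( -\frac{\bigl(P_I(D \mid C) - \alpha\bigr)^{2}}{2\sigma^{2}} \right)
\]
```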
1 TALP Research Center, Universitat Politècnica de Catalunya, Modul C6, Campus Nord, 08034 Barcelona, email: jalvarez@talp.upc.es
2 ALCN contains top, bottom, conjunction, disjunction, existential quantification, universal quantification, negation, and unqualified number restrictions (for example, number restrictions allow the definition of concepts for women having more than 5 children, or roads with less than 3 lanes).
3 Role axioms also seem interesting, and can easily be introduced following [11].
Probabilistic axioms have two important properties: (a) often a given behaviour can be stated with fewer and more "natural" axioms if probabilities can take part in the axioms^4; and (b) given a behaviour to be defined by a set of axioms and two concepts C and D, there always exist some α and σ such that the axiom C ⊑_{α;σ} D is consistent with the behaviour to be learned (and α and σ can be estimated from the set of examples given to the learning procedure).

These two properties make probabilistic axioms especially interesting for learning purposes. The first property states that the theory to be learned will probably be easier to learn if we allow probabilistic axioms. The second one may provide a path of "intermediate" axioms that are completely consistent with the theory to be learned and may help to find the real theory axioms (those "definitive" axioms may turn the intermediate axioms into unnecessary ones).
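Property (b) only says that suitable parameters exist and can be estimated from the examples; the estimator itself is not spelled out here. A minimal sketch of one plausible estimator, under my own assumptions (each example is given as explicit concept extensions, and α and σ are taken as the mean and standard deviation of the per-example conditional frequencies):

```python
from statistics import mean, pstdev

def conditional_frequency(model, concept_c, concept_d):
    """Fraction of C-elements that are also D-elements in one example model.

    `model` maps concept names to sets of domain elements; returns None
    when C is empty in the model (the axiom says nothing there).
    """
    c_ext = model[concept_c]
    if not c_ext:
        return None
    return len(c_ext & model[concept_d]) / len(c_ext)

def estimate_axiom_parameters(models, concept_c, concept_d):
    """Estimate (alpha, sigma) for a candidate probabilistic axiom C [= D."""
    freqs = [f for m in models
             if (f := conditional_frequency(m, concept_c, concept_d)) is not None]
    if not freqs:
        raise ValueError("C is empty in every example model")
    return mean(freqs), pstdev(freqs)

# Toy usage: two tiny example models described by concept extensions only.
models = [
    {"man": {"john", "paul"}, "married": {"john"}},
    {"man": {"ringo"}, "married": {"ringo"}},
]
alpha, sigma = estimate_axiom_parameters(models, "man", "married")
print(f"man [=_{{{alpha:.2f};{sigma:.2f}}} married")
```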
3 Theory Learning

Theory learning is formalized as the process of determining a theory/TBox T from a set of models of T. So, the examples used in the learning process are complete models^5. This differs from the usual machine learning approaches, in which examples are individual entities.
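To make "examples are complete models" concrete in the NLP setting of footnote 5: an example is an entire interpretation, with explicit extensions for every concept and role name, rather than a single labelled instance. A minimal sketch of such an example follows; the particular concept and role names used for the sentence are my own illustration, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Interpretation:
    """A complete model: a finite domain plus full concept/role extensions."""
    domain: set[str]
    concepts: dict[str, set[str]] = field(default_factory=dict)           # name -> elements
    roles: dict[str, set[tuple[str, str]]] = field(default_factory=dict)  # name -> pairs

# One training example = one whole sentence analysed as an interpretation.
# "The dog barks": two word tokens plus a syntactic group, all in one model.
sentence_model = Interpretation(
    domain={"w1_dog", "w2_barks", "np1"},
    concepts={
        "noun": {"w1_dog"},
        "verb": {"w2_barks"},
        "noun_phrase": {"np1"},
        "sense_animal": {"w1_dog"},        # word-sense information
    },
    roles={
        "head_of": {("w1_dog", "np1")},        # syntactic groups
        "subject_of": {("np1", "w2_barks")},   # syntactic relations
    },
)
```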
In order to define what theory learning is, we first adapt Shannon information theory notions to the DL domain. An interpretation I can be directly represented as a set of boolean variables: for every concept name A and every element d ∈ Δ^I we have a boolean variable (stating whether d ∈ A^I or d ∉ A^I); and for every role name P and every pair of elements d1, d2 ∈ Δ^I we have another boolean variable (stating whether (d1, d2) ∈ P^I). The entropy of I is defined as the number of boolean variables necessary to represent I when no additional knowledge is available. This is, in fact, a direct interpretation of Shannon information theory notions.

The amount of information provided by a TBox T with respect to I can be intuitively defined as the difference between the number of boolean variables necessary to represent I when no additional knowledge is available, and the minimum number of boolean variables necessary to represent I when T is available^6. Then, the learning procedure has to maximize the amount of information provided by T with respect to a training set. An axiom φ will be interesting for T if T ∪ {φ} provides more information than T.
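Spelled out, these two quantities take roughly the following form. This is my own transcription of the verbal definitions above, with N_C and N_R standing for the number of concept and role names, and H(I | T) for the minimum number of boolean variables needed to represent I when T is available.

```latex
% Entropy of an interpretation I: one boolean variable per
% (concept name, domain element) pair and per (role name, element pair).
\[
  H(I) \;=\; N_C\,\lvert \Delta^{I} \rvert \;+\; N_R\,\lvert \Delta^{I} \rvert^{2}
\]
% Information provided by a TBox T with respect to I, and the
% interestingness criterion for a candidate axiom phi.
\[
  \mathrm{info}(T, I) \;=\; H(I) - H(I \mid T),
  \qquad
  \varphi \text{ interesting for } T
  \iff
  \mathrm{info}(T \cup \{\varphi\}, I) \;>\; \mathrm{info}(T, I)
\]
```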
Computing the amount of information provided by a TBox is intractable. Instead, some measures that provide an idea of the information added by an axiom φ1 given another axiom φ2 can be efficiently computed [2]. These measures have been shown to be very useful for reducing the search space explored by a learning procedure.

The YAYA learning procedure is not concerned with PAC learnability or Least Common Subsumer (LCS) computation [4], as most work on learning DLs is (see [5, 8] for example). Instead, our learning procedure is a general axiom exploration procedure that is heuristically guided by the information theory measures mentioned above. An initial theory T0 is evolved^7 through the addition of new axioms. This is done by the application of a set of induction rules that produce new axioms from axioms already existing in the theory.

There are 19 different induction rules that can be classified mainly into generalization, specialization, and general exploration rules^8. In addition, some of these rules are specially designed to work with taxonomic knowledge.

Although there is no space to explain all 19 rules, we can sketch some of them to see how they work:

expand antecedent rule: It adds an existential construction to the "end" of the antecedent of an axiom. For example, suppose we are acquiring a theory about family relationships from a group of families^9. Suppose φ1 = man ⊑_{0.5;0.3} ∃has_wife.woman ∈ Tt^10. Then, this rule would explore axioms by adding simple existential constructions to the end of the antecedent. Among others, it will produce the axiom φ2 = man ⊓ ∃has_child.person ⊑_{0.87;0.1} ∃has_wife.woman, where the parameters of the distribution are supposed to have been inferred from the training set. As φ2 provides more information about men having children than φ1 does, it will be added to Tt.

same-as axiom rule: It builds new axioms with SAME-AS constructions in the consequent. Going on with the previous example, suppose we have φ3 = person ⊓ ∃has_grandparent ⊑_{1;0} ∃has_parent.∃has_parent.person, and that φ3 ∈ Tt. The application of the same-as axiom rule to φ3 would build an axiom φ4 with the same antecedent and a consequent constituted by a SAME-AS construct built from the antecedent and the consequent of φ3. That is, φ4 = person ⊓ ∃has_grandparent ⊑_{1;0} [SAME-AS has_parent ∘ has_parent, has_grandparent]. φ4 increases the amount of information of Tt, as the consequent of φ4 is more specific than that of φ3. So, the YAYA learning procedure would add φ4 to Tt.

The iterative rule application process is repeated until none of the axioms "reachable" through the application of an induction rule is interesting for T. The YAYA learning procedure has been designed from practical principles. Unfortunately, there is no theoretical bound on the amount of information captured by the TBox resulting from the learning process with respect to T.
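The iterative process just described is essentially a greedy search over axioms. A compressed sketch of that loop, under heavy simplification (the rule and scoring interfaces below are invented for illustration; YAYA's actual 19 induction rules and its information measures [2] are not reproduced here):

```python
def learn_theory(initial_theory, induction_rules, interestingness, examples,
                 max_iterations=1000):
    """Greedy axiom-exploration loop sketched from the description above.

    induction_rules : functions mapping (theory, examples) -> candidate axioms
    interestingness : heuristic gain of adding an axiom to the theory;
                      a stand-in for the information measures of [2]
    """
    theory = list(initial_theory)  # T0: possibly empty, possibly ontological knowledge
    for _ in range(max_iterations):
        # All axioms "reachable" from the current theory in one rule application.
        candidates = [axiom
                      for rule in induction_rules
                      for axiom in rule(theory, examples)
                      if axiom not in theory]
        # Keep only candidates that make the theory more informative.
        scored = [(interestingness(theory, axiom, examples), axiom)
                  for axiom in candidates]
        scored = [(gain, axiom) for gain, axiom in scored if gain > 0]
        if not scored:
            break  # no reachable axiom is interesting for T: stop
        # Greedy choice: add the best-scoring axiom and iterate.
        _, best = max(scored, key=lambda pair: pair[0])
        theory.append(best)
    return theory
```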
4 A very simple and illustrative example is the penguin exception so widely used: the probabilistic axiomatization {bird ⊑_{0.95;0.05} can_fly, penguin ⊑ bird, penguin ⊑ ¬can_fly} (supposing the probabilities are correct) is simpler than its corresponding non-probabilistic axiomatization: {bird ⊓ ¬penguin ⊑ can_fly, penguin ⊑ bird, penguin ⊑ ¬can_fly}. Yes, only one antecedent is a bit larger. But this is a quite simple theory; for larger theories the differences will surely be bigger.
5 For example, if we are working in NLP, an example could correspond to a whole sentence, with all the information we want to learn and from which we want to learn (syntactic groups, syntactic relations between them, word senses, ...).
6 This captures the fact that some of the boolean variables necessary when no knowledge is available can be deduced from other boolean variables when T is available.
7 This initial theory may be empty, it may contain a subset of the theory to be learned, and it may also take into account ontological knowledge.
8 The first two types are similar to ILP generalization and specialization operators.
9 Note that here the examples correspond to families and not to individuals, as they would in most ILP systems.
10 Throughout this example, it is supposed that man, woman, and person are concept names that appear in the training set, and that has_wife and has_child are role names.
The large number of YAYA induction rules makes it possible for an interesting axiom to be reached from many different search states. This is important because, in some sense, it reduces the number of local minima in the search space. And as the procedure sketched above is in fact a greedy search, it may be affected by local minima.

On the other hand, the YAYA learning procedure is not complete; that is, it does not guarantee that it will find a given axiom or a given theory. Nevertheless, it has been tested successfully on some typical ILP benchmarks, such as the family relationships and the chess king-rook-king illegal position ones [16].
4 YAYA and Natural Language Processing

The main goal of YAYA is to be applied to NLP problems. Relational learning systems have been shown to be well suited for Natural Language Learning (NLL). ILP has been successfully applied to a number of NLP problems [15, 14, 6].

With this purpose, the widely used WordNet lexical ontology [13] has been integrated into YAYA. WordNet is automatically converted into a set of axioms that can be used: (1) to provide a background theory to the learning procedure; and (2) for reasoning purposes (once the learning procedure has done its job).

Despite the large number of concepts (synsets, in WordNet terminology) in this ontology, a careful selection of the axioms into which the taxonomy is mapped, together with the implementation of some optimization techniques from [10], keeps the reasoning tasks tractable.
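The conversion of WordNet into axioms is not detailed here. The obvious mapping sends every hypernym link to a concept inclusion; a minimal sketch of that mapping is shown below using NLTK's WordNet interface, purely as a convenience for the reader (YAYA's own converter and its axiom selection are not described in this paper).

```python
# Sketch only: map each noun synset to a concept name and each hypernym
# link to an inclusion axiom "synset [= hypernym".
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def wordnet_taxonomy_axioms(pos=wn.NOUN, limit=None):
    """Yield (sub_concept, super_concept) pairs, one per hypernym edge."""
    for i, synset in enumerate(wn.all_synsets(pos)):
        if limit is not None and i >= limit:
            break
        sub = synset.name()                # e.g. "dog.n.01"
        for hyper in synset.hypernyms():
            yield sub, hyper.name()        # read as: dog.n.01 [= canine.n.02

# Print a handful of taxonomy axioms usable as background-theory candidates.
for sub, sup in wordnet_taxonomy_axioms(limit=20):
    print(f"{sub} [= {sup}")
```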
YAYA provides a powerful representation, reasoning, and learning framework for NLP tasks. This has the immediate advantage of making it possible to tackle different NLP tasks using the same environment. But, most importantly, it provides a way to experiment with the integration of several of these tasks, and to study the interaction among them. Additionally, unlike ILP systems, YAYA has been specially designed to take advantage of available taxonomic knowledge.

Traditionally, several phases have been defined in NLP: morphological analysis, syntactic analysis, word sense disambiguation, contextual resolution, conceptual representation, etc. The complete analysis of a natural language text consists of solving each one of the phases in an ordered way. Then, one phase can use information produced by previous phases; but, of course, it cannot use information from later phases, as it has not yet been produced.

The goal of this division into phases is to keep the complexity of the problem below certain limits; but it may miss some important interactions between different phases. For NLL systems this may result in the substitution of a missed interaction by a set of special situations that approximate it, taking into account only information from previous phases.

Reducing NLP to a single big phase is not realistic at this moment. The number of interactions that should be taken into account by a learning procedure is probably too large to obtain interesting results in a reasonable amount of time. Instead, a restricted interaction model can be used to explore interesting interactions. YAYA provides the necessary elements to experiment with different interaction models. Nevertheless, this work is at its initial stage at this moment.

Some experiments have been performed for NLP. More concretely, over 300 sentences from the training set in [17] have been used to learn restrictions between senses and semantic roles. The whole set of restrictions implicitly present in the training set has been successfully learned.

REFERENCES

[1] Jordi Alvarez, 'Extending the tableaux calculus for probabilistic description logics', Technical Report LSI-00-R, Universitat Politècnica de Catalunya, (2000).
[2] Jordi Alvarez, 'A formal framework for theory learning using description logics', Technical Report LSI-00-R, Universitat Politècnica de Catalunya, (2000).
[3] Alex Borgida and Peter F. Patel-Schneider, 'A semantics and complete algorithm for subsumption in the CLASSIC description logic', Journal of Artificial Intelligence Research, 1, 277-308, (1994).
[4] W. Cohen, A. Borgida, and H. Hirsh, 'Computing least common subsumers in description logics', in Proceedings of AAAI-92, pp. 754-760, (1992).
[5] W. Cohen and H. Hirsh, 'The learnability of description logics with equality constraints', Machine Learning, 17(2-3), 169-199, (1994).
[6] James Cussens, David Page, Stephen Muggleton, and Ashwin Srinivasan, 'Using Inductive Logic Programming for Natural Language Processing', in ECML'97 - Workshop Notes on Empirical Learning of Natural Language Tasks, eds., W. Daelemans, T. Weijters, and A. van der Bosch, pp. 25-34, Prague, (1997).
[7] F.M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf, 'Reasoning in description logics', in Studies in Logic, Language and Information, ed., G. Brewka, 193-238, CSLI Publications, (1996).
[8] M. Frazier and L. Pitt, 'CLASSIC learning', Machine Learning, 25, 151-193, (1996).
[9] Jochen Heinsohn, 'Probabilistic description logics', in Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, Seattle, Washington, (1994).
[10] I. Horrocks and P. Patel-Schneider, 'Optimising description logic subsumption', Journal of Logic and Computation, 9(3), 267-293, (1999).
[11] Ian Horrocks and Ulrike Sattler, 'A description logic with transitive and inverse roles and role hierarchies', Journal of Logic and Computation, 9(3), 385-410, (1999).
[12] Manfred Jaeger, 'Probabilistic reasoning in terminological logics', in Proceedings of the 4th International Conference on the Principles of Knowledge Representation and Reasoning, (1994).
[13] G.A. Miller, 'WordNet: An online lexical database', International Journal of Lexicography, 3(4), 235-312, (1990).
[14] Raymond J. Mooney, 'Inductive logic programming for natural language processing', in Proceedings of the Sixth International Inductive Logic Programming Workshop, (1996).
[15] Stephen Muggleton, 'Inductive logic programming: Issues, results, and the challenge of learning language in logic', Artificial Intelligence, 114(1-2), 283-296, (1999).
[16] J.R. Quinlan, 'Learning logical definitions from relations', Machine Learning, 5, 239-266, (1990).
[17] J. Zelle and R. Mooney, 'Learning semantic grammars with constructive inductive logic programming', in Proceedings of the 11th AAAI, pp. 817-822, (1993).