Learning classification taxonomies from a classification knowledge based system

Hendra Suryanto and Paul Compton

Artificial Intelligence Laboratory, School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia, email: {hendras, compton}@cse.unsw.edu.au

Abstract. Knowledge-based systems (KBS) are not necessarily based on well-defined ontologies. In particular, it is possible to build KBS for classification problems where there is little constraint on how classes are organised and a class is expressed by the expert as a free-text conclusion to a rule. This paper investigates how relations between such 'classes' may be discovered from existing knowledge bases, and then investigates how to construct a model of these classes (an ontology) based on user-selected patterns in the class relations. We have applied our approach to KBS built with Ripple Down Rules (RDR) [1]. RDR is a knowledge acquisition and knowledge maintenance methodology which allows KBS to be built very rapidly and simply, but does not require a strong ontology. Our experimental results are based on a large real-world medical RDR KBS. The motivation for our work is to allow an ontology in a KBS to 'emerge' during development, rather than requiring the ontology to be established prior to the development of the KBS.

1. Introduction

Most knowledge acquisition methodologies first build a model of domain knowledge before using this to build a particular problem solver, e.g. KADS and CommonKADS [2] and Protégé-2000 [3]. Although this approach facilitates re-use, it does not overcome the knowledge acquisition and maintenance bottleneck, and these problems are present both in the development of the ontology and of the consequent problem solver.

The RDR approach starts knowledge acquisition (KA) to build the problem solver immediately, without any modelling apart from a simple attribute-value data representation [4]. Even the attribute-value representation can be developed while KA is in progress. The focus of the approach is to make the addition of each incrementally added piece of knowledge as simple and as reliable as possible. Although this approach facilitates KA and maintenance [5], it does not facilitate re-use, because of the lack of an ontology.

Finally, the problem is compounded by the way in which the expert adds conclusions. When the Multiple Classification RDR (MCRDR) KB makes an error, the task of the expert is to specify the correct conclusion and identify the attributes and values that justify this conclusion. The expert is assisted in providing an appropriate rule by the system requiring that cases that correctly satisfy the parent rule should not satisfy the child rule. In adding the conclusion, the expert can select from a list of pre-existing conclusions organised into broad categories, but can also simply type in a new conclusion. In medical pathology result interpretation, the evaluation domain here, the conclusions added by the pathologist may provide advice to the referring clinician: on patient diagnosis, management, how treatment is progressing, whether the tests ordered were appropriate, what tests might still be necessary, or any combination of the above. It is quite clear to both the expert and the receiver of the advice what information is being provided in the free-text interpretation, but these interpretations are a long way from the well-defined classes of a formal ontology. A task analysis would assess this domain as a classification problem, but this does not imply well-defined classes.

Hence the problem is not only that disjuncts for a class (separate rule paths) may be scattered across the KB, but that the same class may be represented by different text strings, and such text strings may cover a combination of different classes. Some examples are given in Table 1. In this study we have used 5 different pathology knowledge bases provided by PKS, ranging in size from 25 to 320 rules, with associated sets of case data ranging in number from 453 to 2218 cases. The largest PKS knowledge base is over 7000 rules, but this is the subject of other related studies.

In this learning problem we are dealing with rules rather than raw cases; the relevant attributes have already been identified and extracted. Our aim is to discover the appropriate ontology given that the relevant attributes (and values) are already well identified. A second aspect of the problem is that in a real-world system attributes are multi-valued rather than boolean. In rules, conditions can then subsume each other, be disjoint, etc.: for example age > 10 subsumes age = 50, whereas age > 40 and age < 10 are clearly disjoint. Our method needs not only to combine information about classes from across the knowledge base, but also to address the way in which conditions based on multi-valued attributes interrelate. Figure 1 shows the rules for the class "Satisfactory lipid profile previous raised LDL noted". In the first rule there is a condition Max(LDL) > 3.4, and in the second rule there is a condition Max(LDL) is HIGH, where HIGH is a range between 2 real numbers.

    Satisfactory lipid profile previous raised LDL noted <--
        ((LDL <= 3.4) AND (Triglyceride is NORMAL) AND (Max(LDL) > 3.4))
        OR
        ((LDL is NORMAL) AND (Triglyceride is NORMAL) AND (Max(LDL) is HIGH))

Figure 1. An example of a class which is a disjunction of two rules.
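The interplay of conditions on multi-valued attributes can be made concrete with a short sketch. The code below is our own illustration, not part of any RDR system: it treats each condition as a closed numeric interval (so strict bounds such as age > 10 are approximated as inclusive), and the names Cond, subsumes and disjoint are ours.

```python
# A minimal sketch (ours, not the original system's code) of condition
# relations over a multi-valued attribute. Bounds are treated as closed
# intervals for simplicity.

class Cond:
    """A condition on a numeric attribute, e.g. age > 10 or age = 50."""
    def __init__(self, attr, lo=float("-inf"), hi=float("inf")):
        self.attr, self.lo, self.hi = attr, lo, hi

    def subsumes(self, other):
        # age > 10 subsumes age = 50: the second interval fits inside.
        return (self.attr == other.attr
                and self.lo <= other.lo and other.hi <= self.hi)

    def disjoint(self, other):
        # age > 40 and age < 10 can never hold for the same case.
        return (self.attr == other.attr
                and (other.hi < self.lo or self.hi < other.lo))

age_gt_10 = Cond("age", lo=10)          # age > 10
age_50    = Cond("age", lo=50, hi=50)   # age = 50
age_gt_40 = Cond("age", lo=40)          # age > 40
age_lt_10 = Cond("age", hi=10)          # age < 10
```

With these interval tests, the subsumption and disjointness of rule conditions used throughout the paper reduce to simple bound comparisons.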
2. Ontology learning overview

Firstly we discover class relations between rules. We consider three basic relations: subsumption/intersection, mutual exclusivity and similarity. Secondly we specify some compound relations which appear interesting, using these three basic relations. We then extract the instances of these compound relations, or patterns, and assemble them into a class model.

RDR exception structure provides a compact representation of knowledge [6-13]. Gaines [14] has also generalized RDR to Exception Directed Acyclic Graphs (EDAGs); he argues that EDAGs are more compact and readable than MCRDR because of the graph rather than tree structure. Initial RDR development was concerned with classification tasks, first single and later multiple classification. RDR has since been extended to configuration [15], heuristic search [16] and document retrieval [17], and a more general RDR system for construction tasks has been proposed [18].

    Rule 1: if true then ...
        except Rule 2: if sky=SUNNY then Go Swimming
            except Rule 3: if ultraViolet=VERY HIGH then Swimming at indoor swimming pool
            except Rule 4: if wave=LOW then Swimming in the beach
                except Rule 5: if ultraViolet=MEDIUM and wind<=30 then Swimming in the beach
                except Rule 6: if wind>40 km/h then Play Chess
        except Rule 7: if sky=CLOUDY then Swimming at indoor swimming pool
        except Rule 8: if wind>30 km/h then Play Chess
            except Rule 9: if ultraViolet=LOW then Go Swimming
            except Rule 10: if sky=RAINY then Play Chess

Figure 2. Example of an MCRDR tree.

Figure 2 shows a simple example of the exception structure used in MCRDR. All pathways are evaluated. Evaluation on any pathway stops when a leaf node is reached or no child rule is satisfied, and a conclusion is provided by the last satisfied rule in each refinement pathway. When an expert identifies that an erroneous conclusion has been given, they enter a new rule, which the system adds as a refinement.

The question arises of whether it would be better to start with a more well-developed ontology, as suggested by most KA researchers. However, in practice RDR systems allow experts to very rapidly and easily build large knowledge bases [5], and recent commercial RDR systems have confirmed these advantages outside the research environment (Pacific Knowledge Systems (PKS), personal communication). The aim of the present work then is to preserve the ease and speed of development provided by RDR systems, but to overcome their lack of an initial strong ontology by discovering the ontologies implicit in these incrementally developed systems. This may give us the best of both worlds.

We define that class-A subsumes class-B with subsumption value 1.0 if we always have class-A when we have class-B, but not the other way around. We decide that class-A subsumes class-B from the conditions used in the rules for class-A and class-B. This syntactical subsumption needs to be evaluated by the expert - whether class-A semantically subsumes class-B as well. If the subsumption value is less than 1.0 we use the term intersection rather than subsumption. If class-A subsumes class-B with a value of 0, it means we do not have any information about the subsumption/intersection relation for class-A and class-B.

The mutual exclusivity measure of class-A and class-B is 1.0 if class-A and class-B never occur together. In RDR we evaluate this measure by checking whether there is a condition in a rule for class-A mutually exclusive with any condition in a rule for class-B (for example age > 50 and age < 30). In RDR knowledge bases, parent rules are always mutually exclusive with their child rules.

Class-A is similar to class-B with measure 1.0 if both of the classes have exactly the same conditions. We use the term "same" rather than "similar" if the similarity is 1.0.

A class in MCRDR is the set of disjunct rule paths giving the same conclusion. A rule path consists of all conditions from all predecessor rules plus the conditions of the particular node's rule. For example (see Figure 2), rule path 6 for class Play Chess is: sky=SUNNY, wave=LOW, wind>40.

The central idea of the technique is to group all rules for each class and compute a quantitative measurement (from 0 to 1) for each relation (subsumption, mutual exclusivity, similarity) between every pair of classes. We use this quantitative measure as an informal confidence measure as to whether these relations exist. The algorithm will be discussed in detail below.
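The construction of rule paths from the exception structure can be sketched as a simple traversal. This is our own minimal rendering (the class Rule and function rule_paths are hypothetical names), abridged to the Go Swimming branch of Figure 2:

```python
# Our sketch of rule-path extraction from an MCRDR exception tree: each
# rule path is the rule's own conditions plus those of all its ancestors.

class Rule:
    def __init__(self, rid, conds, concl, children=()):
        self.rid, self.conds, self.concl = rid, conds, concl
        self.children = list(children)

def rule_paths(rule, inherited=()):
    """Yield (rule id, conclusion, conditions of the full rule path)."""
    path = inherited + tuple(rule.conds)
    yield rule.rid, rule.concl, path
    for child in rule.children:
        yield from rule_paths(child, path)

# Figure 2, abridged: rule 2 with its exception rule 4, which in turn
# has the exception rule 6.
rule6 = Rule(6, ["wind>40"], "Play Chess")
rule4 = Rule(4, ["wave=LOW"], "Swimming in the beach", [rule6])
rule2 = Rule(2, ["sky=SUNNY"], "Go Swimming", [rule4])

paths = {rid: (concl, conds) for rid, concl, conds in rule_paths(rule2)}
# rule path 6 accumulates sky=SUNNY, wave=LOW and wind>40, as in the text.
```

Grouping the yielded paths by conclusion then gives the disjunct rule paths that make up each class.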
Applied to the example in Figure 2, it gives: class Go Swimming subsumes class Swimming in the beach with degree of confidence 0.83; class Play Chess and class Go Swimming are mutually exclusive with degree of confidence 0.17; and class Go Swimming is similar to class Play Chess with degree of similarity 0.50. This quantitative measure enables us to group different examples of a class and provides information on whether, across these examples, a class tends to subsume another class (for example, class Go Swimming subsumed class Swimming in the beach with degree of confidence 0.83). Using this quantitative measure we can say that class Go Swimming tends to subsume, or almost subsumes, class Swimming in the beach, rather than simply saying that class Go Swimming does or does not subsume class Swimming in the beach.

These measures become interesting when applied to real examples such as: class [Euthyroid levels] subsumes class [Levels consistent with adequate replacement] with degree of confidence 0.866. (Medically, 'adequate' thyroid hormone replacement brings thyroid hormone levels to approximately normal levels.)

Boolean values are obviously inappropriate for the subsumption/intersection, mutual-exclusivity and similarity relations in real domains. We found that in the Iron knowledge base there were only 16 subsumption relations with degree of confidence 1.0, 8 mutually exclusive relations with degree of confidence 1.0 and no similarity relations with degree of confidence 1.0.

One of the advantages of learning from rules is that we can assume that irrelevant attributes have already been discarded. This is significant, as in our application domain there are hundreds of attributes. Gaines [19] argues that a rule in a knowledge base is worth many cases for learning. We adopt the same viewpoint and note that although there is research on combining KA and machine learning and on using background knowledge in machine learning, there seems to be little research so far on learning from a KBS rather than from cases [20], [14]. The immediate precursor of this work [20] applied formal concept analysis to ontology discovery in knowledge bases. This provided a useful way to explore concepts in a knowledge base, but because of the complexity of the conceptual lattice it was more useful to consider sub-sections of the lattice, selected by the user or by a simple nearest-neighbour algorithm [21]. The critical difference from the work here is that in formal concept analysis the difference between concepts is emphasised; here we attempt to combine all the concepts that represent a class and consider relations between classes rather than between concepts.

3. The class relations model

The class relations model shows the relations subsumption, mutual exclusivity and similarity between classes, and the degree of confidence that the particular relation exists [22]. We note that the measures we derive are strictly heuristic; other, perhaps more well-founded, measures may be possible, and the results here represent simply a first attempt at carrying out this type of analysis. The second point to note is that these relations have to deal with non-boolean as well as boolean data.

We refine the subsumption/intersection measure by considering not only the conditions in the rule path but also the ratio of the number of cases handled by parent and child rules in a rule path. This allows us to deal with the situation where a rule is a gross over-generalisation and a child rule has been added as a correction to deal with most of the cases the parent rule would fire on; we consider that there is little value to the subsumption relation in this circumstance.
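The case-ratio just described can be sketched in a few lines; the function name quality is ours, and the numbers reproduce the worked example from the text:

```python
# Our sketch of the rule quality measure Q: the share of cases a rule
# handles itself out of all cases reaching it (its own plus those taken
# over by its descendant rules).

def quality(own_cases, descendant_cases):
    return own_cases / (own_cases + descendant_cases)

# Worked example from the text: rulepath-2 keeps 2 cases while its
# descendants (rulepaths 3, 4 and 6) handle 2 + 3 + 3 = 8 between them.
q2 = quality(2, 2 + 3 + 3)   # low quality: rule 2 is a gross over-generalisation
q4 = quality(3, 3)
q3 = quality(2, 0)           # a leaf rule handles all of its own cases
```

A low Q discounts the relation measures computed from that rule path, as described below.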
For example (see Figure 2), rulepath-2 has two cases, rulepath-3 has two cases, rulepath-4 has 3 cases and rulepath-6 has 3 cases. We therefore calculate that the quality of rulepath-2 is 2/(2+2+3+3) = 0.2, the quality of rulepath-4 is 3/(3+3) = 0.5, the quality of rulepath-3 is 2/2 = 1.0 and the quality of rulepath-6 is 3/3 = 1.0.

We would not need a rule quality measure if we used a flat rule system rather than the refinements of a rule path. For example, a flat rule for rule_2 is:

    class Go Swimming: sky=SUNNY, not(ultraViolet=VERY HIGH), not(wave=LOW)

Flat rules for rule_3 and rule_6 are the same as their rule paths, since these rule paths extend to the leaves and so no negation of child conditions is required. Rule_1 has three children with one, one and two conditions, and therefore we get (1 x 1 x 2) flat rules. By converting this RDR knowledge base to flat rules we get an equivalent knowledge base, but this is generally not feasible with real-world knowledge bases. In the five knowledge bases considered here, some rules have many children with several conditions; there is one rule with 10 children where every child has 5 conditions, which converts to 5^10 flat rules.

Let X be a class in the MCRDR framework. {X0 ... Xm} is the set of rules which have class X as their conclusion, and {Xi0 ... Xin} is the set of conditions in the rule path for rule Xi, where n is the number of distinct conditions in the rule path and m is the number of rule paths for class X. In the MCRDR framework the class is given as a disjunction of rule paths [7]:

    class X = ∨ (i=0..m) [ ∧ (j=0..n) Xij ]

That is, Xij stands for an individual condition in a rule path for the class X. If X is a class and Y is also a class, we could define a similarity measure as follows:

    Sim(Xij, Yij) = 0 if Xij and Yij are different
    Sim(Xij, Yij) = 1 if Xij and Yij are the same

If α is the set of distinct attributes in rule path Xi, β is the set of distinct attributes in rule path Yi, and (Xij, Yij) are pairs of conditions on the same attribute (they may be different conditions, e.g. Age>50 and Age=60), then we can define:

    Similar(Xi, Yi) = [ Σ (j=0..n) Sim(Xij, Yij) ] / |α ∪ β| * Q(Xi) * Q(Yi)

where n = |α ∪ β| and Q(Xi) = number of cases for Xi / (number of cases for descendants of Xi + number of cases for Xi). Function Q measures the quality of a rule in RDR. If the quality of a rule is close to 100%, it means that nearly all cases reaching the rule are handled by the rule itself and its child rules are rare exceptions. On the other hand, if the quality of a rule is 10%, it means 90% of the cases that satisfy the rule are passed on to its children; that is, the rule is too general and can be regarded as not a particularly good rule, so it should not be given strong consideration in developing the relations in the system.

For example:

    Similar(rulepath-2, rulepath-6) = 1/3 * Q(rulepath-2) * Q(rulepath-6)
    Similar(rulepath-8, rulepath-9) = 1/2 * Q(rulepath-8) * Q(rulepath-9)
    Similar(rulepath-9, rulepath-10) = 2/3 * Q(rulepath-9) * Q(rulepath-10)

Function Subsume() measures a degree of confidence that the first rule path subsumes the second rule path. For example, subsume(rulepath-2, rulepath-4) = 2/2 * Q(rulepath-2) * Q(rulepath-4): the conditions in rulepath-2 are sky=SUNNY, the conditions in rulepath-4 are sky=SUNNY and wave=LOW, and |α ∪ β| = 2, that is α ∪ β = {sky, wave}. Since rulepath-2 does not have the attribute wave, rulepath-2 is more general than rulepath-4 with respect to wave; therefore there are 2 conditions in rulepath-2 which are the same as or more general than those of rulepath-4. Similarly, subsume(rulepath-2, rulepath-5) = 2/3 * Q(rulepath-2) * Q(rulepath-5).
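The pairwise measures can be sketched as follows. This is our simplified rendering: a condition is an (attribute, expression) pair, each rule path is assumed to carry at most one condition per attribute, and condition-level subsumption is passed in as a predicate (exact equality by default); all names are ours.

```python
# Our sketch of Similar() and Subsume() over rule paths. A condition is
# an (attribute, expression) pair such as ("sky", "=SUNNY").

def _attrs(path):
    return {attr for attr, _ in path}

def similar(px, py, qx=1.0, qy=1.0):
    # identical conditions, normalised by the attributes used in either path
    same = len(set(px) & set(py))
    return same / len(_attrs(px) | _attrs(py)) * qx * qy

def subsume(px, py, qx=1.0, qy=1.0, cond_subsumes=lambda a, b: a == b):
    # count conditions of px that are the same as or more general than py's;
    # an attribute that px does not constrain at all counts as fully general.
    x, y = dict(px), dict(py)
    attrs = _attrs(px) | _attrs(py)
    score = sum(1.0 for a in attrs
                if a not in x
                or (a in y and cond_subsumes(x[a], y[a])))
    return score / len(attrs) * qx * qy

rp2 = (("sky", "=SUNNY"),)
rp4 = (("sky", "=SUNNY"), ("wave", "=LOW"))
rp6 = (("sky", "=SUNNY"), ("wave", "=LOW"), ("wind", ">40"))
# similar(rp2, rp6) = 1/3 and subsume(rp2, rp4) = 2/2, as in the text
# (rule-path qualities are taken as 1.0 here).
```

An interval-based predicate such as the one sketched earlier can be supplied as cond_subsumes to handle inequalities like A>5 subsuming A>10.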
Function Similar() measures the similarity between two nodes, where each node contains a rule path.

[Figures 4, 5 and 6: each shows Class X as a disjunction of nodes 1, 2 and 3 and Class Y as a disjunction of nodes 4 and 5, with weighted edges v1, v2, ... drawn between nodes of X and nodes of Y.]

Figure 4 suggests how we can find a similarity measure between two classes. It shows Class X as the disjunction of nodes 1, 2 and 3 and Class Y as the disjunction of nodes 4 and 5. We propose that ClassSimilarity(X, Y) = (v1 + v2 + v3) / 3, where we choose the v such that all nodes are covered by at least one edge and the sum of the v (e.g. v1 + v2 + v3) is maximal. Note that here v stands for the Similar() measure; in the later, similar diagrams v stands for the Subsume() and MutualEx() measures. E.g. ClassSimilarity(Go Swimming, Play Chess) = ((1/3) + (1/2) + (2/3)) / 3 = 0.5. We assume here that the quality of all rule paths is 1.0 to simplify the example.

We can define a subsumption measure as follows, with Xij and Yij standing for individual conditions in rule paths for the relevant classes, as above:

    Sub(Xij, Yij) = 0 if Xij does not subsume Yij
    Sub(Xij, Yij) = 1 if Xij subsumes or is the same as Yij (for example A>5 subsumes A>10)

If α is the set of distinct attributes in rule path Xi and β is the set of distinct attributes in rule path Yi, then we can define:

    Subsume(Xi, Yi) = [ Σ (j=0..n) Sub(Xij, Yij) ] / |α ∪ β| * Q(Xi) * Q(Yi)

Figure 5 illustrates how we find a subsumption measure between two classes. It shows Class X as a disjunction of nodes 1, 2 and 3 and Class Y as a disjunction of nodes 4 and 5. We compute ClassSubsume(X, Y) = (v1 + v3) / 2, where we choose the v such that all nodes of class Y are covered by at least one edge and the sum of the v (e.g. v1 + v3) is maximal. E.g. ClassSubsume(Go Swimming, Swimming in the beach) = (1 + 0.667) / 2. We assume the quality of all rule paths is 1.0 to simplify this example.

We can define a mutual-exclusivity measure as follows, with Xij and Yij standing for individual conditions in rule paths for the relevant classes, as above:

    Mut(Xij, Yij) = 0 if Xij and Yij are not mutually exclusive
    Mut(Xij, Yij) = 1 if Xij and Yij are mutually exclusive (for example A>5 and A<2)

    MutualEx(Xi, Yi) = 1 if at least one Mut(Xij, Yij) = 1
    MutualEx(Xi, Yi) = 1 if rule Xi and rule Yi are parent and child
    MutualEx(Xi, Yi) = 0 otherwise

Function MutualEx() measures a degree of confidence that the first rule is mutually exclusive with the second. Since the quality of a rule does not affect mutual exclusivity as much as it affects similarity and subsumption/intersection, we do not apply the quality measure to mutual exclusivity. For example, MutualEx(rulepath-2, rulepath-10) = 1.0.

Figure 6 suggests how we can find a mutual-exclusivity measure between two classes. It shows Class X as a disjunction of nodes 1, 2 and 3 and Class Y as a disjunction of nodes 4 and 5. We compute ClassMutualEx(X, Y) = (v1 + v2 + v3 + v4 + v5 + v6) / 6: X and Y are mutually exclusive if and only if all nodes of X and Y are mutually exclusive with respect to each other (see Figure 6). Therefore ClassMutualEx(Go Swimming, Play Chess) = 1/6, since Go Swimming has 2 rule paths, Play Chess has 3 rule paths, and all MutualEx() values between those rule paths are 0.0 except MutualEx(rulepath-2, rulepath-10) = 1.0.
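The three class-level combinations share the same shape: lift a pairwise measure over rule paths to a pair of classes. The sketch below is our greedy simplification of the maximal covering described above - it matches each rule path of one class with its best partner in the other and averages the chosen values - with a toy pair measure; all names and data are ours.

```python
# Our sketch of lifting a pairwise rule-path measure to classes. Each
# class is a list of rule paths (tuples of condition strings). The full
# method chooses an edge covering with maximal total weight; here we
# greedily take the best partner for every rule path of class Y.

def class_measure(paths_x, paths_y, pair):
    best = [max(pair(x, y) for x in paths_x) for y in paths_y]
    return sum(best) / len(best)

# Toy pair measure: fraction of shared conditions.
def overlap(px, py):
    return len(set(px) & set(py)) / len(set(px) | set(py))

X = [("sky=SUNNY",), ("wind>30", "uv=LOW")]
Y = [("sky=SUNNY", "wave=LOW"), ("sky=RAINY",)]

v = class_measure(X, Y, overlap)
```

Substituting the Similar(), Subsume() or MutualEx() functions for the toy pair measure gives greedy approximations of ClassSimilarity, ClassSubsume and ClassMutualEx respectively.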
4. Experimental results

4.1 Class relations model

The results from the endocrinology knowledge base are shown in Table 1. The results shown in each block are the class pairs with the highest similarity, subsumption or mutual-exclusivity measures; only results with high values for these relations are shown.

Hormone knowledge base: similarity value > 0.66
    [5] interpretation [Satisfactory prolactin level.]
        [12] Male interpretation [Satisfactory prolactin level.]
    [14] interpretation [Consistent with premature ovarian failure.]
        [28] interpretation [Consistent with premature ovarian failure. Suggest repeat FSH and oestradiol in 2-3 months to confirm.]

Hormone knowledge base: subsumption value = 1
    [6] interpretation [Elevated prolactin persists. Suggest TSH.]
        [33] interpretation [Elevated prolactin persists. Primary hypothyroidism has been excluded. IV sampling through an in-dwelling cannula can help exclude stress-related elevations of prolactin. Pituitary imaging may be required.]
        [36] interpretation [Elevated prolactin persists. Await TSH.]
    [7] interpretation [Raised prolactin in females is commonly due to medication, stress or lactation. Suggest TSH to exclude hypothyroidism and repeat prolactin after 30 minutes rest.]
        [23] Male interpretation [Raised prolactin in men is commonly due to medications and occasionally stress or hypothyroidism. Suggest TSH and repeat prolactin after 30 minutes rest.]
        [30] interpretation [Raised prolactin in females is commonly due to medications, stress or lactation. Hypothyroidism has been excluded. Suggest review medications and repeat prolactin after 30 minutes rest.]

Hormone knowledge base: mutual-exclusivity value = 1
    [4] interpretation [Consistent with perimenopause.]
        [9] interpretation [No evidence of perimenopause, unless patient is on oestrogen therapy.]
    [5] interpretation [Satisfactory prolactin level.]
        [6] interpretation [Elevated prolactin persists. Suggest TSH.]

Table 1. Some examples of class relations.

4.2 Extracting patterns from the class relation graph

Since there are many classes (from 25 to 100), it is impossible to consider all possible pairs of relations between the classes. We therefore extract specific patterns which seem likely to be components of a meaningful taxonomy, for example:

    Subsume(A, B) == 1.0, Subsume(A, C) == 1.0, MutualExclusive(B, C) > 0.5

for all sets of three classes A, B, C from a knowledge base. We can then combine such elements; e.g. we may join Triangle(A, B, C) and Triangle(D, E, F) if A = D and MutualExclusive({B, C}, {E, F}) > 0.5. If MutualExclusive(B, C) == 1.0, we could say that {B, C} are exhaustive subclass partitions of A [23]. We applied this technique to the thyroid knowledge base and obtained the result in Figure 7.

[Figure 7: a tree of comment numbers - root 0; below it 4, 5, 6, 7, 17, 44 and 24; below those 8, 12, 33, 36, 23, 30, 49, 51, 37, 19, 22, 40 and 26; and below those 38 and 56.]

Figure 7. Partial taxonomy of the thyroid knowledge base.
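The triangle pattern extraction described in Section 4.2 can be sketched as a filter over precomputed relation tables. The code below is ours: the relation tables are toy dictionaries keyed by class-name pairs, and the class names are invented for illustration.

```python
# Our sketch of extracting the Subsume/Subsume/MutualExclusive triangle
# pattern from precomputed class-relation tables.

from itertools import combinations

def triangles(classes, subsume, mutex, threshold=0.5):
    """Find (A, B, C) with Subsume(A,B) = Subsume(A,C) = 1.0 and
    MutualExclusive(B, C) above the threshold."""
    found = []
    for a in classes:
        for b, c in combinations([x for x in classes if x != a], 2):
            if (subsume.get((a, b), 0.0) == 1.0
                    and subsume.get((a, c), 0.0) == 1.0
                    and max(mutex.get((b, c), 0.0),
                            mutex.get((c, b), 0.0)) > threshold):
                found.append((a, b, c))
    return found

# Toy relation tables for three invented comment classes.
classes = ["raised prolactin", "raised prolactin (male)",
           "raised prolactin (female)"]
subsume = {("raised prolactin", "raised prolactin (male)"): 1.0,
           ("raised prolactin", "raised prolactin (female)"): 1.0}
mutex = {("raised prolactin (male)", "raised prolactin (female)"): 1.0}
```

Joining triangles that share the same apex A, as described above, then assembles the extracted fragments into a candidate taxonomy.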
5. Discussion

It is beyond the scope of this paper (and the authors) to provide a detailed medical analysis of the ontologies produced by these techniques; however, it is worth noting some lay observations.

The mutually exclusive classes in Table 1 seem reasonable. However, we also have some cases where very similar conclusions are identified as mutually exclusive. This occurs when experts make up rules that include different values for the same attribute. For example, some mutually exclusive conclusions seem to give the same clinical advice but specifically refer to a patient being male or female. This may or may not be of clinical importance, but there is an obvious opportunity to introduce a further superclass.

The most interesting issues arise with the nature of subsumption. A superclass-subclass relation may arise where one rule specifies a value for an attribute and another does not. For example, a key factor in comments 6, 7, 23, 30, 33 and 36 is whether or not a TSH measurement should be ordered and whether or not primary hypothyroidism has been excluded. TSH results are important in the diagnosis of primary hypothyroidism. The generic comment suggesting a TSH measurement is given when there is no TSH result available; the comment also suggests that the clinical cause of the high prolactin level remains unknown. When there is a TSH result available, this provides some evidence to confirm or exclude one of the causes of the high prolactin. These relations appear to us as ontologically reasonable. However, the wording of the actual comments does not readily indicate such relationships. It would be interesting to know how the expert would react, and how comments might be worded, if this ontological information was available as rules were being developed.

A more general example of this pattern is the comment [0], "patient has ovulated", which is at the top level of the taxonomy in Figure 7. This subsumes a whole range of more specific comments related to other attributes. Again, it would be interesting to see if this taxonomic information influenced the expert's wording.

5.1 Further work

At this stage we have only conducted a preliminary examination of some of the relations in the knowledge bases. A detailed examination by domain experts will be required to establish the utility of this approach. This examination may suggest that other types of measure than those suggested here may be useful, and that patterns other than those used in Figure 7 may be of interest for browsing the relations. At this stage we make no claim about the particular heuristic measures we have used, except that, broadly, measures of this kind seem useful in discovering and exploring implicit ontologies.

We are also investigating the possibility of learning interesting patterns from a coloured graph of class relations. The frequency of isomorphic sub-graphs may be interesting. If the graph is large, then we could scale up the algorithm by applying data mining techniques (e.g. the Apriori algorithm [24]).

Finally, the present technique considers only the conditions in rule paths and the proportion of cases handled by rules in determining the relations; it does not consider any other information about the classes themselves. The refinement structure for RDR does not indicate any ontological refinement; however, it does indicate that an expert thought a conclusion was inappropriate and should be replaced by another. We are looking at the possible relations between conclusions that this would allow, which could then be related to the idea of relations presented here.

6. Conclusion

The work we have presented is an attempt to develop techniques to discover the ontologies implicit in knowledge bases. We believe it will be of increasing importance to carry out this particular kind of knowledge discovery as larger and larger knowledge bases come into use and we seek to exploit the knowledge in these knowledge bases in different ways. We do not make any particular claim for the techniques we have developed to date, except that they suggest that such ontology discovery is possible. The key idea in the techniques we have developed is that it seems reasonable to use heuristic quantitative measures to group classes and class relations. This then enables possible ontologies to be explored on a reasonable scale.

References

1. Compton, P. and Jansen, R. A philosophical basis for knowledge acquisition. Knowledge Acquisition, 1990. 2: p. 241-257.
2. Schreiber, G., Wielinga, B. and Breuker, J. KADS: A Principled Approach to Knowledge-Based System Development. 1993: Academic Press.
3. Grosso, W.E., et al. Knowledge Modeling at the Millennium (The Design and Evolution of Protégé-2000). In Twelfth Workshop on Knowledge Acquisition, Modeling and Management. 1999. Banff, Alberta, Canada.
4. Richards, D. and Compton, P. Knowledge Acquisition First, Modelling Later. In 10th European Knowledge Acquisition Workshop. 1997.
5. Edwards, G., et al. PEIRS: a pathologist maintained expert system for the interpretation of chemical pathology reports. Pathology, 1993. 25: p. 27-34.
6. Catlett, J. Ripple Down Rules as a Mediating Representation in Interactive Induction. In Proceedings of the Second Japanese Knowledge Acquisition for Knowledge-Based Systems Workshop. 1992. Kobe, Japan.
7. Gaines, B.R. and Compton, P.J. Induction of Ripple Down Rules. In Fifth Australian Conference on Artificial Intelligence. 1992. Hobart.
8. Kivinen, J., Mannila, H. and Ukkonen, E. Learning Rules with Local Exceptions. In European Conference on Computational Learning Theory. 1993.
9. Scheffer, T. Algebraic foundations and improved methods of induction of ripple-down rules. In 2nd Pacific Rim Knowledge Acquisition Workshop. 1996.
10. Siromoney, A. and Siromoney, R. Variations and Local Exceptions in Inductive Logic Programming. In Machine Intelligence - Applied Machine Intelligence, K. Furukawa, D. Michie and S. Muggleton, Editors. 1993. p. 213-234.
11. Compton, P., Preston, P. and Kang, B. The Use of Simulated Experts in Evaluating Knowledge Acquisition. In 9th AAAI-sponsored Banff Knowledge Acquisition for Knowledge-Based Systems Workshop. 1995. Canada.
12. Kang, B., Compton, P. and Preston, P. Simulated Expert Evaluation of Multiple Classification Ripple Down Rules. In 11th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop. 1998. Banff: SRDG Publications, University of Calgary.
13. Suryanto, H., Richards, D. and Compton, P. The automatic compression of Multiple Classification Ripple Down Rule Knowledge Based Systems: Preliminary Experiments. In Knowledge-Based Intelligent Information Engineering Systems. 1999. Adelaide, South Australia: IEEE.
14. Gaines, B.R. Transforming Rules and Trees into Comprehensible Knowledge Structures. 1995: AAAI/MIT Press.
15. Ramadan, Z., et al. From Multiple Classification RDR to Configuration RDR. In 11th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop. 1998. Canada.
16. Beydoun, G. and Hoffmann, A. NRDR for the Acquisition of Search Knowledge. In 10th Australian Conference on Artificial Intelligence. 1997.
17. Kang, B., et al. A help desk system with intelligent interface. Applied Artificial Intelligence, 1997. 11(7-8): p. 611-631.
18. Compton, P. and Richards, D. Extending Ripple Down Rules. In Australian Knowledge Acquisition Workshop 99. 1999. UNSW.
19. Gaines, B.R. An Ounce of Knowledge is Worth a Ton of Data: Quantitative Studies of the Trade-off between Expertise and Data based on Statistically Well-founded Empirical Induction. 1989. San Mateo, California: Morgan Kaufmann.
20. Richards, D. and Compton, P. Uncovering the conceptual models in RDR KBS. In International Conference on Conceptual Structures ICCS'97. 1997. Seattle: Springer Verlag.
21. Richards, D. The Reuse of Knowledge in Ripple Down Rule Knowledge Based Systems. 1998. University of New South Wales: Sydney. p. 335.
22. Suryanto, H. and Compton, P. Discovery of class relations in exception structured knowledge bases. In International Conference on Conceptual Structures. 2000.
23. Perez, A.G. Evaluation of Taxonomic Knowledge in Ontologies and Knowledge Bases. In KAW'99, Twelfth Workshop on Knowledge Acquisition, Modeling and Management. 1999. Banff, Alberta, Canada.
24. Agrawal, R., et al. Fast Discovery of Association Rules. In Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro and P. Smyth, Editors. 1996.