Learning classification taxonomies from a classification knowledge based system

Hendra Suryanto and Paul Compton

Artificial Intelligence Laboratory, School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia, email: {hendras, compton}@cse.unsw.edu.au

Abstract. Knowledge-based systems (KBS) are not necessarily based on well-defined ontologies. In particular, it is possible to build KBS for classification problems where there is little constraint on how classes are organised and a class is expressed by the expert as a free-text conclusion to a rule. This paper investigates how relations between such 'classes' may be discovered from existing knowledge bases, and then investigates how to construct a model of these classes (an ontology) based on user-selected patterns in the class relations. We have applied our approach to KBS built with Ripple Down Rules (RDR) [1]. RDR is a knowledge acquisition and knowledge maintenance methodology which allows KBS to be built very rapidly and simply, but does not require a strong ontology. Our experimental results are based on a large real-world medical RDR KBS. The motivation for our work is to allow an ontology in a KBS to 'emerge' during development, rather than requiring the ontology to be established prior to the development of the KBS.

1. Introduction

Most knowledge acquisition methodologies first build a model of domain knowledge before using this to build a particular problem solver, e.g. KADS and CommonKADS [2] and Protégé-2000 [3]. Although this approach facilitates re-use, it does not overcome the knowledge acquisition and maintenance bottleneck, and these problems are present both in the development of the ontology and of the consequent problem solver.

The RDR approach starts knowledge acquisition (KA) to build the problem solver immediately, without any modelling apart from a simple attribute-value data representation [4]. Even the attribute-value representation can be developed while KA is in progress. The focus of the approach is to make the addition of each incrementally added piece of knowledge as simple and as reliable as possible. Although this approach facilitates KA and maintenance [5], it does not facilitate re-use, because of the lack of an ontology.

Finally, the problem is compounded by the way in which the expert adds conclusions. When the Multiple Classification RDR (MCRDR) KB makes an error, the task of the expert is to specify the correct conclusion and identify the attributes and values that justify this conclusion. The expert is assisted in providing an appropriate rule by the system requiring that cases that correctly satisfy the parent rule should not satisfy the child rule. In adding the conclusion, the expert can select from a list of pre-existing conclusions organised into broad categories, but can also simply type in a new conclusion. In medical pathology result interpretation, the evaluation domain here, the conclusions added by the pathologist may provide advice to the referring clinician: on patient diagnosis, management, how treatment is progressing, whether the tests ordered were appropriate, what tests might still be necessary, or any combination of the above. It is quite clear to both the expert and the receiver of the advice what information is being provided in the free-text interpretation, but these interpretations are a long way from the well-defined classes of a formal ontology. A task analysis would assess this domain as a classification problem, but this does not imply well-defined classes.

Hence the problem is not only that disjuncts for a class (separate rule paths) may be scattered across the KB, but that the same class may be represented by different text strings, and such text strings may cover a combination of different classes. Some examples are given in Table 1. In this study we have used 5 different pathology knowledge bases provided by PKS, ranging in size from 25 to 320 rules, with associated sets of case data ranging in number from 453 to 2218 cases. The largest PKS knowledge base is over 7000 rules, but this is the subject of other related studies.

In this learning problem we are dealing with rules rather than raw cases; the relevant attributes have already been identified and extracted. Our aim is to discover the appropriate ontology given that the relevant attributes (and values) are already well identified. A second aspect of the problem is that in a real-world system attributes are multi-valued rather than boolean. In rules, conditions can then subsume each other, be disjoint, etc.: for example age > 10 subsumes age = 50, whereas age > 40 and age < 10 are clearly disjoint. Our method needs not only to combine information about classes from across the knowledge base, but also to address the way in which conditions based on multi-valued attributes interrelate. Figure 1 shows the rules for the class "Satisfactory lipid profile previous raised LDL noted". In the first rule there is a condition Max(LDL) > 3.4, and in the second rule there is a condition Max(LDL) is HIGH, where HIGH is a range between 2 real numbers.

    Satisfactory lipid profile previous raised LDL noted <--
        ((LDL <= 3.4) AND (Triglyceride is NORMAL) AND (Max(LDL) > 3.4))
        OR
        ((LDL is NORMAL) AND (Triglyceride is NORMAL) AND (Max(LDL) is HIGH))

Figure 1. An example of a class which is a disjunction of two rules.
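The interplay of conditions on multi-valued attributes can be made concrete with a short sketch. The code below is our own illustration, not part of any RDR system: it treats each condition as a closed numeric interval (so strict bounds such as age > 10 are approximated as inclusive), and the names Cond, subsumes and disjoint are ours.

```python
# A minimal sketch (ours, not the original system's code) of condition
# relations over a multi-valued attribute. Bounds are treated as closed
# intervals for simplicity.

class Cond:
    """A condition on a numeric attribute, e.g. age > 10 or age = 50."""
    def __init__(self, attr, lo=float("-inf"), hi=float("inf")):
        self.attr, self.lo, self.hi = attr, lo, hi

    def subsumes(self, other):
        # age > 10 subsumes age = 50: the second interval fits inside.
        return (self.attr == other.attr
                and self.lo <= other.lo and other.hi <= self.hi)

    def disjoint(self, other):
        # age > 40 and age < 10 can never hold for the same case.
        return (self.attr == other.attr
                and (other.hi < self.lo or self.hi < other.lo))

age_gt_10 = Cond("age", lo=10)          # age > 10
age_50    = Cond("age", lo=50, hi=50)   # age = 50
age_gt_40 = Cond("age", lo=40)          # age > 40
age_lt_10 = Cond("age", hi=10)          # age < 10
```

With these interval tests, the subsumption and disjointness of rule conditions used throughout the paper reduce to simple bound comparisons.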
2. Ontology learning overview

Firstly we discover class relations between rules. We consider three basic relations: subsumption/intersection, mutual exclusivity and similarity. Secondly we specify some compound relations which appear interesting, using these three basic relations. We then extract the instances of these compound relations, or patterns, and assemble them into a class model.

RDR exception structure provides a compact representation of knowledge [6-13]. Gaines [14] has also generalized RDR to Exception Directed Acyclic Graphs (EDAGs); he argues that EDAGs are more compact and readable than MCRDR because of the graph rather than tree structure. Initial RDR development was concerned with classification tasks, first single and later multiple classification. RDR has since been extended to configuration [15], heuristic search [16] and document retrieval [17], and a more general RDR system for construction tasks has been proposed [18].

    Rule 1: if true then ...
        except Rule 2: if sky=SUNNY then Go Swimming
            except Rule 3: if ultraViolet=VERY HIGH then Swimming at indoor swimming pool
            except Rule 4: if wave=LOW then Swimming in the beach
                except Rule 5: if ultraViolet=MEDIUM and wind<=30 then Swimming in the beach
                except Rule 6: if wind>40 km/h then Play Chess
        except Rule 7: if sky=CLOUDY then Swimming at indoor swimming pool
        except Rule 8: if wind>30 km/h then Play Chess
            except Rule 9: if ultraViolet=LOW then Go Swimming
            except Rule 10: if sky=RAINY then Play Chess

Figure 2. Example of an MCRDR tree.

Figure 2 shows a simple example of the exception structure used in MCRDR. All pathways are evaluated. Evaluation on any pathway stops when a leaf node is reached or no child rule is satisfied, and a conclusion is provided by the last satisfied rule in each refinement pathway. When an expert identifies that an erroneous conclusion has been given, they enter a new rule, which the system adds as a refinement.

The question arises of whether it would be better to start with a more well-developed ontology, as suggested by most KA researchers. However, in practice RDR systems allow experts to very rapidly and easily build large knowledge bases [5], and recent commercial RDR systems have confirmed these advantages outside the research environment (Pacific Knowledge Systems (PKS), personal communication). The aim of the present work then is to preserve the ease and speed of development provided by RDR systems, but to overcome their lack of an initial strong ontology by discovering the ontologies implicit in these incrementally developed systems. This may give us the best of both worlds.

We define that class-A subsumes class-B with subsumption value 1.0 if we always have class-A when we have class-B, but not the other way around. We decide that class-A subsumes class-B from the conditions used in the rules for class-A and class-B. This syntactical subsumption needs to be evaluated by the expert - whether class-A semantically subsumes class-B as well. If the subsumption value is less than 1.0 we use the term intersection rather than subsumption. If class-A subsumes class-B with a value of 0, it means we do not have any information about the subsumption/intersection relation for class-A and class-B.

The mutual exclusivity measure of class-A and class-B is 1.0 if class-A and class-B never occur together. In RDR we evaluate this measure by checking whether there is a condition in a rule for class-A mutually exclusive with any condition in a rule for class-B (for example age > 50 and age < 30). In RDR knowledge bases, parent rules are always mutually exclusive with their child rules.

Class-A is similar to class-B with measure 1.0 if both of the classes have exactly the same conditions. We use the term "same" rather than "similar" if the similarity is 1.0.

A class in MCRDR is the set of disjunct rule paths giving the same conclusion. A rule path consists of all conditions from all predecessor rules plus the conditions of the particular node's rule. For example (see Figure 2), rule path 6 for class Play Chess is: sky=SUNNY, wave=LOW, wind>40.

The central idea of the technique is to group all rules for each class and compute a quantitative measurement (from 0 to 1) for each relation (subsumption, mutual exclusivity, similarity) between every pair of classes. We use this quantitative measure as an informal confidence measure as to whether these relations exist. The algorithm will be discussed in detail below.
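The construction of rule paths from the exception structure can be sketched as a simple traversal. This is our own minimal rendering (the class Rule and function rule_paths are hypothetical names), abridged to the Go Swimming branch of Figure 2:

```python
# Our sketch of rule-path extraction from an MCRDR exception tree: each
# rule path is the rule's own conditions plus those of all its ancestors.

class Rule:
    def __init__(self, rid, conds, concl, children=()):
        self.rid, self.conds, self.concl = rid, conds, concl
        self.children = list(children)

def rule_paths(rule, inherited=()):
    """Yield (rule id, conclusion, conditions of the full rule path)."""
    path = inherited + tuple(rule.conds)
    yield rule.rid, rule.concl, path
    for child in rule.children:
        yield from rule_paths(child, path)

# Figure 2, abridged: rule 2 with its exception rule 4, which in turn
# has the exception rule 6.
rule6 = Rule(6, ["wind>40"], "Play Chess")
rule4 = Rule(4, ["wave=LOW"], "Swimming in the beach", [rule6])
rule2 = Rule(2, ["sky=SUNNY"], "Go Swimming", [rule4])

paths = {rid: (concl, conds) for rid, concl, conds in rule_paths(rule2)}
# rule path 6 accumulates sky=SUNNY, wave=LOW and wind>40, as in the text.
```

Grouping the yielded paths by conclusion then gives the disjunct rule paths that make up each class.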
Applied to the example in Figure 2, it gives: class Go Swimming subsumes class Swimming in the beach with degree of confidence 0.83; class Play Chess and class Go Swimming are mutually exclusive with degree of confidence 0.17; and class Go Swimming is similar to class Play Chess with degree of similarity 0.50. This quantitative measure enables us to group different examples of a class and provides information on whether, across these examples, a class tends to subsume another class (for example, class Go Swimming subsumed class Swimming in the beach with degree of confidence 0.83). Using this quantitative measure we can say that class Go Swimming tends to subsume, or almost subsumes, class Swimming in the beach, rather than simply saying that class Go Swimming does or does not subsume class Swimming in the beach.

These measures become interesting when applied to real examples such as: class [Euthyroid levels] subsumes class [Levels consistent with adequate replacement] with degree of confidence 0.866. (Medically, 'adequate' thyroid hormone replacement brings thyroid hormone levels to approximately normal levels.)

Boolean values are obviously inappropriate for the subsumption/intersection, mutual-exclusivity and similarity relations in real domains. We found that in the Iron knowledge base there were only 16 subsumption relations with degree of confidence 1.0, 8 mutually exclusive relations with degree of confidence 1.0 and no similarity relations with degree of confidence 1.0.

One of the advantages of learning from rules is that we can assume that irrelevant attributes have already been discarded. This is significant, as in our application domain there are hundreds of attributes. Gaines [19] argues that a rule in a knowledge base is worth many cases for learning. We adopt the same viewpoint and note that although there is research on combining KA and machine learning and on using background knowledge in machine learning, there seems to be little research so far on learning from a KBS rather than from cases [20], [14]. The immediate precursor of this work [20] applied formal concept analysis to ontology discovery in knowledge bases. This provided a useful way to explore concepts in a knowledge base, but because of the complexity of the conceptual lattice it was more useful to consider sub-sections of the lattice, selected by the user or by a simple nearest-neighbour algorithm [21]. The critical difference from the work here is that in formal concept analysis the difference between concepts is emphasised; here we attempt to combine all the concepts that represent a class and consider relations between classes rather than between concepts.

3. The class relations model

The class relations model shows the relations subsumption, mutual exclusivity and similarity between classes, and the degree of confidence that the particular relation exists [22]. We note that the measures we derive are strictly heuristic; other, perhaps more well-founded, measures may be possible, and the results here represent simply a first attempt at carrying out this type of analysis. The second point to note is that these relations have to deal with non-boolean as well as boolean data.

We refine the subsumption/intersection measure by considering not only the conditions in the rule path but also the ratio of the number of cases handled by parent and child rules in a rule path. This allows us to deal with the situation where a rule is a gross over-generalisation and a child rule has been added as a correction to deal with most of the cases the parent rule would fire on; we consider that there is little value to the subsumption relation in this circumstance.
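The case-ratio just described can be sketched in a few lines; the function name quality is ours, and the numbers reproduce the worked example from the text:

```python
# Our sketch of the rule quality measure Q: the share of cases a rule
# handles itself out of all cases reaching it (its own plus those taken
# over by its descendant rules).

def quality(own_cases, descendant_cases):
    return own_cases / (own_cases + descendant_cases)

# Worked example from the text: rulepath-2 keeps 2 cases while its
# descendants (rulepaths 3, 4 and 6) handle 2 + 3 + 3 = 8 between them.
q2 = quality(2, 2 + 3 + 3)   # low quality: rule 2 is a gross over-generalisation
q4 = quality(3, 3)
q3 = quality(2, 0)           # a leaf rule handles all of its own cases
```

A low Q discounts the relation measures computed from that rule path, as described below.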
For example (see Figure 2), rulepath-2 has two cases, rulepath-3 has two cases, rulepath-4 has 3 cases and rulepath-6 has 3 cases. We therefore calculate that the quality of rulepath-2 is 2/(2+2+3+3) = 0.2, the quality of rulepath-4 is 3/(3+3) = 0.5, the quality of rulepath-3 is 2/2 = 1.0 and the quality of rulepath-6 is 3/3 = 1.0.

We would not need a rule quality measure if we used a flat rule system rather than the refinements of a rule path. For example, a flat rule for rule_2 is:

    class Go Swimming: sky=SUNNY, not(ultraViolet=VERY HIGH), not(wave=LOW)

Flat rules for rule_3 and rule_6 are the same as their rule paths, since these rule paths extend to the leaves and so no negation of child conditions is required. Rule_1 has three children with one, one and two conditions, and therefore we get (1 x 1 x 2) flat rules. By converting this RDR knowledge base to flat rules we get an equivalent knowledge base, but this is generally not feasible with real-world knowledge bases. In the five knowledge bases considered here, some rules have many children with several conditions; there is one rule with 10 children where every child has 5 conditions, which converts to 5^10 flat rules.

Let X be a class in the MCRDR framework. {X0 ... Xm} is the set of rules which have class X as their conclusion, and {Xi0 ... Xin} is the set of conditions in the rule path for rule Xi, where n is the number of distinct conditions in the rule path and m is the number of rule paths for class X. In the MCRDR framework the class is given as a disjunction of rule paths [7]:

    class X = ∨ (i=0..m) [ ∧ (j=0..n) Xij ]

That is, Xij stands for an individual condition in a rule path for the class X. If X is a class and Y is also a class, we could define a similarity measure as follows:

    Sim(Xij, Yij) = 0 if Xij and Yij are different
    Sim(Xij, Yij) = 1 if Xij and Yij are the same

If α is the set of distinct attributes in rule path Xi, β is the set of distinct attributes in rule path Yi, and (Xij, Yij) are pairs of conditions on the same attribute (they may be different conditions, e.g. Age>50 and Age=60), then we can define:

    Similar(Xi, Yi) = [ Σ (j=0..n) Sim(Xij, Yij) ] / |α ∪ β| * Q(Xi) * Q(Yi)

where n = |α ∪ β| and Q(Xi) = number of cases for Xi / (number of cases for descendants of Xi + number of cases for Xi). Function Q measures the quality of a rule in RDR. If the quality of a rule is close to 100%, it means that nearly all cases reaching the rule are handled by the rule itself and its child rules are rare exceptions. On the other hand, if the quality of a rule is 10%, it means 90% of the cases that satisfy the rule are passed on to its children; that is, the rule is too general and can be regarded as not a particularly good rule, so it should not be given strong consideration in developing the relations in the system.

For example:

    Similar(rulepath-2, rulepath-6) = 1/3 * Q(rulepath-2) * Q(rulepath-6)
    Similar(rulepath-8, rulepath-9) = 1/2 * Q(rulepath-8) * Q(rulepath-9)
    Similar(rulepath-9, rulepath-10) = 2/3 * Q(rulepath-9) * Q(rulepath-10)

Function Subsume() measures a degree of confidence that the first rule path subsumes the second rule path. For example, subsume(rulepath-2, rulepath-4) = 2/2 * Q(rulepath-2) * Q(rulepath-4): the conditions in rulepath-2 are sky=SUNNY, the conditions in rulepath-4 are sky=SUNNY and wave=LOW, and |α ∪ β| = 2, that is α ∪ β = {sky, wave}. Since rulepath-2 does not have the attribute wave, rulepath-2 is more general than rulepath-4 with respect to wave; therefore there are 2 conditions in rulepath-2 which are the same as or more general than those of rulepath-4. Similarly, subsume(rulepath-2, rulepath-5) = 2/3 * Q(rulepath-2) * Q(rulepath-5).
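The pairwise measures can be sketched as follows. This is our simplified rendering: a condition is an (attribute, expression) pair, each rule path is assumed to carry at most one condition per attribute, and condition-level subsumption is passed in as a predicate (exact equality by default); all names are ours.

```python
# Our sketch of Similar() and Subsume() over rule paths. A condition is
# an (attribute, expression) pair such as ("sky", "=SUNNY").

def _attrs(path):
    return {attr for attr, _ in path}

def similar(px, py, qx=1.0, qy=1.0):
    # identical conditions, normalised by the attributes used in either path
    same = len(set(px) & set(py))
    return same / len(_attrs(px) | _attrs(py)) * qx * qy

def subsume(px, py, qx=1.0, qy=1.0, cond_subsumes=lambda a, b: a == b):
    # count conditions of px that are the same as or more general than py's;
    # an attribute that px does not constrain at all counts as fully general.
    x, y = dict(px), dict(py)
    attrs = _attrs(px) | _attrs(py)
    score = sum(1.0 for a in attrs
                if a not in x
                or (a in y and cond_subsumes(x[a], y[a])))
    return score / len(attrs) * qx * qy

rp2 = (("sky", "=SUNNY"),)
rp4 = (("sky", "=SUNNY"), ("wave", "=LOW"))
rp6 = (("sky", "=SUNNY"), ("wave", "=LOW"), ("wind", ">40"))
# similar(rp2, rp6) = 1/3 and subsume(rp2, rp4) = 2/2, as in the text
# (rule-path qualities are taken as 1.0 here).
```

An interval-based predicate such as the one sketched earlier can be supplied as cond_subsumes to handle inequalities like A>5 subsuming A>10.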
Function Similar() measures the similarity between two nodes, where each node contains a rule path.

[Figures 4, 5 and 6: each shows Class X as a disjunction of nodes 1, 2 and 3 and Class Y as a disjunction of nodes 4 and 5, with weighted edges v1, v2, ... drawn between nodes of X and nodes of Y.]

Figure 4 suggests how we can find a similarity measure between two classes. It shows Class X as the disjunction of nodes 1, 2 and 3 and Class Y as the disjunction of nodes 4 and 5. We propose that ClassSimilarity(X, Y) = (v1 + v2 + v3) / 3, where we choose the v such that all nodes are covered by at least one edge and the sum of the v (e.g. v1 + v2 + v3) is maximal. Note that here v stands for the Similar() measure; in the later, similar diagrams v stands for the Subsume() and MutualEx() measures. E.g. ClassSimilarity(Go Swimming, Play Chess) = ((1/3) + (1/2) + (2/3)) / 3 = 0.5. We assume here that the quality of all rule paths is 1.0 to simplify the example.

We can define a subsumption measure as follows, with Xij and Yij standing for individual conditions in rule paths for the relevant classes, as above:

    Sub(Xij, Yij) = 0 if Xij does not subsume Yij
    Sub(Xij, Yij) = 1 if Xij subsumes or is the same as Yij (for example A>5 subsumes A>10)

If α is the set of distinct attributes in rule path Xi and β is the set of distinct attributes in rule path Yi, then we can define:

    Subsume(Xi, Yi) = [ Σ (j=0..n) Sub(Xij, Yij) ] / |α ∪ β| * Q(Xi) * Q(Yi)

Figure 5 illustrates how we find a subsumption measure between two classes. It shows Class X as a disjunction of nodes 1, 2 and 3 and Class Y as a disjunction of nodes 4 and 5. We compute ClassSubsume(X, Y) = (v1 + v3) / 2, where we choose the v such that all nodes of class Y are covered by at least one edge and the sum of the v (e.g. v1 + v3) is maximal. E.g. ClassSubsume(Go Swimming, Swimming in the beach) = (1 + 0.667) / 2. We assume the quality of all rule paths is 1.0 to simplify this example.

We can define a mutual-exclusivity measure as follows, with Xij and Yij standing for individual conditions in rule paths for the relevant classes, as above:

    Mut(Xij, Yij) = 0 if Xij and Yij are not mutually exclusive
    Mut(Xij, Yij) = 1 if Xij and Yij are mutually exclusive (for example A>5 and A<2)

    MutualEx(Xi, Yi) = 1 if at least one Mut(Xij, Yij) = 1
    MutualEx(Xi, Yi) = 1 if rule Xi and rule Yi are parent and child
    MutualEx(Xi, Yi) = 0 otherwise

Function MutualEx() measures a degree of confidence that the first rule is mutually exclusive with the second. Since the quality of a rule does not affect mutual exclusivity as much as it affects similarity and subsumption/intersection, we do not apply the quality measure to mutual exclusivity. For example, MutualEx(rulepath-2, rulepath-10) = 1.0.

Figure 6 suggests how we can find a mutual-exclusivity measure between two classes. It shows Class X as a disjunction of nodes 1, 2 and 3 and Class Y as a disjunction of nodes 4 and 5. We compute ClassMutualEx(X, Y) = (v1 + v2 + v3 + v4 + v5 + v6) / 6: X and Y are mutually exclusive if and only if all nodes of X and Y are mutually exclusive with respect to each other (see Figure 6). Therefore ClassMutualEx(Go Swimming, Play Chess) = 1/6, since Go Swimming has 2 rule paths, Play Chess has 3 rule paths, and all MutualEx() values between those rule paths are 0.0 except MutualEx(rulepath-2, rulepath-10) = 1.0.
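The three class-level combinations share the same shape: lift a pairwise measure over rule paths to a pair of classes. The sketch below is our greedy simplification of the maximal covering described above - it matches each rule path of one class with its best partner in the other and averages the chosen values - with a toy pair measure; all names and data are ours.

```python
# Our sketch of lifting a pairwise rule-path measure to classes. Each
# class is a list of rule paths (tuples of condition strings). The full
# method chooses an edge covering with maximal total weight; here we
# greedily take the best partner for every rule path of class Y.

def class_measure(paths_x, paths_y, pair):
    best = [max(pair(x, y) for x in paths_x) for y in paths_y]
    return sum(best) / len(best)

# Toy pair measure: fraction of shared conditions.
def overlap(px, py):
    return len(set(px) & set(py)) / len(set(px) | set(py))

X = [("sky=SUNNY",), ("wind>30", "uv=LOW")]
Y = [("sky=SUNNY", "wave=LOW"), ("sky=RAINY",)]

v = class_measure(X, Y, overlap)
```

Substituting the Similar(), Subsume() or MutualEx() functions for the toy pair measure gives greedy approximations of ClassSimilarity, ClassSubsume and ClassMutualEx respectively.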
4. Experimental results

4.1 Class relations model

The results from the endocrinology knowledge base are shown in Table 1. The results shown in each block are the class pairs with the highest similarity, subsumption or mutual-exclusivity measures; only results with high values for these relations are shown.

Hormone knowledge base: similarity value > 0.66
    [5] interpretation [Satisfactory prolactin level.]
        [12] Male interpretation [Satisfactory prolactin level.]
    [14] interpretation [Consistent with premature ovarian failure.]
        [28] interpretation [Consistent with premature ovarian failure. Suggest repeat FSH and oestradiol in 2-3 months to confirm.]

Hormone knowledge base: subsumption value = 1
    [6] interpretation [Elevated prolactin persists. Suggest TSH.]
        [33] interpretation [Elevated prolactin persists. Primary hypothyroidism has been excluded. IV sampling through an in-dwelling cannula can help exclude stress-related elevations of prolactin. Pituitary imaging may be required.]
        [36] interpretation [Elevated prolactin persists. Await TSH.]
    [7] interpretation [Raised prolactin in females is commonly due to medication, stress or lactation. Suggest TSH to exclude hypothyroidism and repeat prolactin after 30 minutes rest.]
        [23] Male interpretation [Raised prolactin in men is commonly due to medications and occasionally stress or hypothyroidism. Suggest TSH and repeat prolactin after 30 minutes rest.]
        [30] interpretation [Raised prolactin in females is commonly due to medications, stress or lactation. Hypothyroidism has been excluded. Suggest review medications and repeat prolactin after 30 minutes rest.]

Hormone knowledge base: mutual-exclusivity value = 1
    [4] interpretation [Consistent with perimenopause.]
        [9] interpretation [No evidence of perimenopause, unless patient is on oestrogen therapy.]
    [5] interpretation [Satisfactory prolactin level.]
        [6] interpretation [Elevated prolactin persists. Suggest TSH.]

Table 1. Some examples of class relations.

4.2 Extracting patterns from the class relation graph

Since there are many classes (from 25 to 100), it is impossible to consider all possible pairs of relations between the classes. We therefore extract specific patterns which seem likely to be components of a meaningful taxonomy, for example:

    Subsume(A, B) == 1.0, Subsume(A, C) == 1.0, MutualExclusive(B, C) > 0.5

for all sets of three classes A, B, C from a knowledge base. We can then combine such elements; e.g. we may join Triangle(A, B, C) and Triangle(D, E, F) if A = D and MutualExclusive({B, C}, {E, F}) > 0.5. If MutualExclusive(B, C) == 1.0, we could say that {B, C} are exhaustive subclass partitions of A [23]. We applied this technique to the thyroid knowledge base and obtained the result in Figure 7.

[Figure 7: a tree of comment numbers - root 0; below it 4, 5, 6, 7, 17, 44 and 24; below those 8, 12, 33, 36, 23, 30, 49, 51, 37, 19, 22, 40 and 26; and below those 38 and 56.]

Figure 7. Partial taxonomy of the thyroid knowledge base.
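The triangle pattern extraction described in Section 4.2 can be sketched as a filter over precomputed relation tables. The code below is ours: the relation tables are toy dictionaries keyed by class-name pairs, and the class names are invented for illustration.

```python
# Our sketch of extracting the Subsume/Subsume/MutualExclusive triangle
# pattern from precomputed class-relation tables.

from itertools import combinations

def triangles(classes, subsume, mutex, threshold=0.5):
    """Find (A, B, C) with Subsume(A,B) = Subsume(A,C) = 1.0 and
    MutualExclusive(B, C) above the threshold."""
    found = []
    for a in classes:
        for b, c in combinations([x for x in classes if x != a], 2):
            if (subsume.get((a, b), 0.0) == 1.0
                    and subsume.get((a, c), 0.0) == 1.0
                    and max(mutex.get((b, c), 0.0),
                            mutex.get((c, b), 0.0)) > threshold):
                found.append((a, b, c))
    return found

# Toy relation tables for three invented comment classes.
classes = ["raised prolactin", "raised prolactin (male)",
           "raised prolactin (female)"]
subsume = {("raised prolactin", "raised prolactin (male)"): 1.0,
           ("raised prolactin", "raised prolactin (female)"): 1.0}
mutex = {("raised prolactin (male)", "raised prolactin (female)"): 1.0}
```

Joining triangles that share the same apex A, as described above, then assembles the extracted fragments into a candidate taxonomy.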
5. Discussion

It is beyond the scope of this paper (and the authors) to provide a detailed medical analysis of the ontologies produced by these techniques; however, it is worth noting some lay observations.

The mutually exclusive classes in Table 1 seem reasonable. However, we also have some cases where very similar conclusions are identified as mutually exclusive. This occurs when experts make up rules that include different values for the same attribute. For example, some mutually exclusive conclusions seem to give the same clinical advice but specifically refer to a patient being male or female. This may or may not be of clinical importance, but there is an obvious opportunity to introduce a further superclass.

The most interesting issues arise with the nature of subsumption. A superclass-subclass relation may arise where one rule specifies a value for an attribute and another does not. For example, a key factor in comments 6, 7, 23, 30, 33 and 36 is whether or not a TSH measurement should be ordered and whether or not primary hypothyroidism has been excluded. TSH results are important in the diagnosis of primary hypothyroidism. The generic comment suggesting a TSH measurement is given when there is no TSH result available; the comment also suggests that the clinical cause of the high prolactin level remains unknown. When there is a TSH result available, this provides some evidence to confirm or exclude one of the causes of the high prolactin. These relations appear to us as ontologically reasonable. However, the wording of the actual comments does not readily indicate such relationships. It would be interesting to know how the expert would react, and how comments might be worded, if this ontological information was available as rules were being developed.

A more general example of this pattern is the comment [0], "patient has ovulated", which is at the top level of the taxonomy in Figure 7. This subsumes a whole range of more specific comments related to other attributes. Again, it would be interesting to see if this taxonomic information influenced the expert's wording.

5.1 Further work

At this stage we have only conducted a preliminary examination of some of the relations in the knowledge bases. A detailed examination by domain experts will be required to establish the utility of this approach. This examination may suggest that other types of measure than those suggested here may be useful, and that patterns other than those used in Figure 7 may be of interest for browsing the relations. At this stage we make no claim about the particular heuristic measures we have used, except that, broadly, measures of this kind seem useful in discovering and exploring implicit ontologies.

We are also investigating the possibility of learning interesting patterns from a coloured graph of class relations. The frequency of isomorphic sub-graphs may be interesting. If the graph is large, then we could scale up the algorithm by applying data mining techniques (e.g. the Apriori algorithm [24]).

Finally, the present technique considers only the conditions in rule paths and the proportion of cases handled by rules in determining the relations; it does not consider any other information about the classes themselves. The refinement structure for RDR does not indicate any ontological refinement; however, it does indicate that an expert thought a conclusion was inappropriate and should be replaced by another. We are looking at the possible relations between conclusions that this would allow, which could then be related to the idea of relations presented here.

6. Conclusion

The work we have presented is an attempt to develop techniques to discover the ontologies implicit in knowledge bases. We believe it will be of increasing importance to carry out this particular kind of knowledge discovery as larger and larger knowledge bases come into use and we seek to exploit the knowledge in these knowledge bases in different ways. We do not make any particular claim for the techniques we have developed to date, except that they suggest that such ontology discovery is possible. The key idea in the techniques we have developed is that it seems reasonable to use heuristic quantitative measures to group classes and class relations. This then enables possible ontologies to be explored on a reasonable scale.

References

1. Compton, P. and Jansen, R. A philosophical basis for knowledge acquisition. Knowledge Acquisition, 1990. 2: p. 241-257.
2. Schreiber, G., Wielinga, B. and Breuker, J. KADS: A Principled Approach to Knowledge-Based System Development. 1993: Academic Press.
3. Grosso, W.E., et al. Knowledge Modeling at the Millennium (The Design and Evolution of Protégé-2000). In Twelfth Workshop on Knowledge Acquisition, Modeling and Management. 1999. Banff, Alberta, Canada.
4. Richards, D. and Compton, P. Knowledge Acquisition First, Modelling Later. In 10th European Knowledge Acquisition Workshop. 1997.
5. Edwards, G., et al. PEIRS: a pathologist maintained expert system for the interpretation of chemical pathology reports. Pathology, 1993. 25: p. 27-34.
6. Catlett, J. Ripple Down Rules as a Mediating Representation in Interactive Induction. In Proceedings of the Second Japanese Knowledge Acquisition for Knowledge-Based Systems Workshop. 1992. Kobe, Japan.
7. Gaines, B.R. and Compton, P.J. Induction of Ripple Down Rules. In Fifth Australian Conference on Artificial Intelligence. 1992. Hobart.
8. Kivinen, J., Mannila, H. and Ukkonen, E. Learning Rules with Local Exceptions. In European Conference on Computational Learning Theory. 1993.
9. Scheffer, T. Algebraic foundations and improved methods of induction of ripple-down rules. In 2nd Pacific Rim Knowledge Acquisition Workshop. 1996.
10. Siromoney, A. and Siromoney, R. Variations and Local Exceptions in Inductive Logic Programming. In Machine Intelligence - Applied Machine Intelligence, K. Furukawa, D. Michie and S. Muggleton, Editors. 1993. p. 213-234.
11. Compton, P., Preston, P. and Kang, B. The Use of Simulated Experts in Evaluating Knowledge Acquisition. In 9th AAAI-sponsored Banff Knowledge Acquisition for Knowledge-Based Systems Workshop. 1995. Canada.
12. Kang, B., Compton, P. and Preston, P. Simulated Expert Evaluation of Multiple Classification Ripple Down Rules. In 11th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop. 1998. Banff: SRDG Publications, University of Calgary.
13. Suryanto, H., Richards, D. and Compton, P. The automatic compression of Multiple Classification Ripple Down Rule Knowledge Based Systems: Preliminary Experiments. In Knowledge-Based Intelligent Information Engineering Systems. 1999. Adelaide, South Australia: IEEE.
14. Gaines, B.R. Transforming Rules and Trees into Comprehensible Knowledge Structures. 1995: AAAI/MIT Press.
15. Ramadan, Z., et al. From Multiple Classification RDR to Configuration RDR. In 11th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop. 1998. Canada.
16. Beydoun, G. and Hoffmann, A. NRDR for the Acquisition of Search Knowledge. In 10th Australian Conference on Artificial Intelligence. 1997.
17. Kang, B., et al. A help desk system with intelligent interface. Applied Artificial Intelligence, 1997. 11(7-8): p. 611-631.
18. Compton, P. and Richards, D. Extending Ripple Down Rules. In Australian Knowledge Acquisition Workshop 99. 1999. UNSW.
19. Gaines, B.R. An Ounce of Knowledge is Worth a Ton of Data: Quantitative Studies of the Trade-off between Expertise and Data based on Statistically Well-founded Empirical Induction. 1989. San Mateo, California: Morgan Kaufmann.
20. Richards, D. and Compton, P. Uncovering the conceptual models in RDR KBS. In International Conference on Conceptual Structures ICCS'97. 1997. Seattle: Springer Verlag.
21. Richards, D. The Reuse of Knowledge in Ripple Down Rule Knowledge Based Systems. 1998. University of New South Wales: Sydney. p. 335.
22. Suryanto, H. and Compton, P. Discovery of class relations in exception structured knowledge bases. In International Conference on Conceptual Structures. 2000.
23. Perez, A.G. Evaluation of Taxonomic Knowledge in Ontologies and Knowledge Bases. In KAW'99, Twelfth Workshop on Knowledge Acquisition, Modeling and Management. 1999. Banff, Alberta, Canada.
24. Agrawal, R., et al. Fast Discovery of Association Rules. In Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro and P. Smyth, Editors. 1996.