The Sigma Ontology Development Environment Adam Pease Teknowledge 1800 Embarcadero Rd Palo Alto CA 94303 650 424-0500 650 493-2645 apease@teknowledge.com keywords: natural language, ontology Abstract Abstract: We discuss the development of an environment for formal knowledge engineering. The Sigma system is an Browsing and Editing advance over previously developed systems in that it integrates a number of modern ontology development tools, The Sigma system contains several different sorts of which has motivated a number of research issues. Primary browsers for viewing formal knowledge bases. The most components include an ontology browsing and editing basic component is a term browser, which presents all the environment, a first order logic inference system and a statements in which a particular term appears. The natural language to logic translator. Although largely statements are sorted by argument position and then the independent of any particular ontology, it supports a number appearance of the term in rules and non-rule statements are of publicly available formal ontologies. then shown. All the statements are hyperlinked to the terms that appear in them. Two types of tree browser are provided. One provides Introduction an automatic graph layout. Another shows a textual hierarchy. The user can chose the term to start with, the Human knowledge is complex, and human language is number of “levels” that should be presented from the term, very expressive. Computer systems that process natural and the binary relation to chose as the predicate that links language have been limited to a shallow level of understanding by the very complexity of the information the different nodes in the graph. For example, if the user that they are expected to handle. Conversely, systems that asks for a graph of the term IntentionalProcess and for manage information in databases have been left to reason the subclass relation, the system will go “up” the graph to either in very narrow and pre-determined ways, or to display the term Process, and down the graph to display perform processing tasks that must be interpreted by the terms Keeping, Guiding, Maintaining etc. The user humans. can ask for more levels up or down the graph. Note that It is desirable to employ computer languages for any relation can be chosen so for example a presentation of representing knowledge that are as expressive as possible, partonomies, or attribute hierarchies is also supported by and as close to the expressiveness of human language. choosing the relations part and subAttribute Expressiveness does have a computational burden. In the respectively. opinion of the author, the best compromise available for The editing system is currently rather rudimentary, many knowledge representation efforts is first order logic. consisting of the ability to type formulae and have some While there are other choices, such as description logic and simple syntax and other error checking performed. We higher order logic, they represent departures from a middle plan to offer a frame-based editing system similar to, or ground. First order logic is undecidable, but can be made incorporating open-source components from, the Protégé tractable for certain kinds of reasoning, which will be detailed later. Description logic (Baader, et al, 2003) has (Gennari et al, 2002) system. some very attractive formal properties, but is significantly We have developed some more sophisticated tools for less expressive than first order logic, and therefore limits collaborative knowledge engineering. The System for the knowledge engineer’s ability to express the semantic Collaborative Open Ontology Production (SCOOP) (Pease content of statements found in normal human discourse. & Li, 2003) supports an automated workflow process for Higher order logic (Carreno et al, 2002) on the other hand development of ontologies and resolution of conflicts is much harder to reason with. among ontologies. The SCOOP system employs a theorem The Sigma system was designed to integrate several prover to detect inconsistencies among ontologies. different kinds of tools for working with knowledge in first SCOOP addresses several types of inconsistencies. The order logic. first we term vertical inconsistency. This refers to when a set of ontologies are loaded and one ontology has a singular. Although different tense and number can be contradiction with another ontology on which it depends. accepted, the system paraphrases such sentences into a Inconsistency of this type is never allowed, because from a present tense singular form. Ambiguous word choices contradictory knowledge base, any proposition can be default to the most popular sense of the word, with an concluded to be true. SCOOP enforces a workflow process escape mechanism for the user to select a different sense. that requires a resolution to such a situation. Anaphoric references are handled with a simple mechanical The second kind of inconsistency we term horizontal algorithm. Some kinds of quantification (every, some, all) can be handled. inconsistency. This refers to a situation in which The current system is capable of taking simple English knowledge products created by different knowledge sentences and translating them into KIF, using terms from engineers, and which do not have a mutual dependency, are our SUMO upper ontology (described below). We have mutually contradictory. SCOOP alerts developers to such a mapped all 100,000 WordNet (Miller, et al, 1993) word situation, but does not require that it be eliminated. It is senses (synsets) to SUMO, one at a time, by hand over the possible for two knowledge bases to represent different period of a year (Niles & Pease, 2003). This endows our perspectives, which are contradictory, but locally valid. natural language translation system with a very large We employ the KIF (Genesereth, 1991) language and vocabulary. have a translator to and from DAML (Hendler & Our approach is presented more fully in (Murray, Pease McGuinness, 2000). A translator to and from Protégé & Sams, 2003). (Gennari et al, 2002) is partially completed at this time. Language Generation Inference Each statement can also be presented as a natural One basic issue is that of allowing variables in the language paraphrase similar to (Sevcenko, 2003). Note predicate position. While the general case of this results in that we also use Sevcenko’s language templates for English expressions being beyond first order logic, there is a and Czech. Each term in an ontology can be given a slightly more restricted case, which is first order. A statement with a variable in the predicate position is only natural language presentation form, indexed by the human higher order if a quantifer ranges of all possible relations, language. For example, the SUMO term rather than the finite set of relations in a particular DiseaseOrSyndrome can be presented as “ disease or knowledge base. Since prepending another relation in front syndrome” and an Italian presentation would be shown as of every statement makes everything first order for such a "malattia o sindrome". Combined with the English restricted practical case, this therefore must mean that the understanding mechanisms described below, this gives us a statement wasn’t really higher-order in the first place. very rudimentary language translation capability. There is also an issue with prohibiting statements as Although presentation of terms is straightforward, arguments to relations. When arguments to relations are presentation of statements is more complicated, and statements, then such a statement is higher order. We could roughly patterned after C-language printf statements. For either have a quote operator or define a Statement class and example the relation “ part” , which states that one object is require all predicates have their argument types defined a part of another, has the corresponding language (which would allow us to know in advance whether an generation template “ %1 is %n a part of %2” . The first argument is a Statement, and therefore needs to be argument to the logical relation is substituted for %1 etc. implicitly quoted). A statement that has this issue is Note that this substitution is recursive, so that complex (believes John (likes Bob Mary)) statements with nested formulae can be translated Currently, such statements are quoted, and therefore effectively. The %n signifies that if the statement is become opaque first-order objects rather than statements. negated, that the negation operator for the appropriate Another issue is row variables, for use in variable arity human language should be inserted in that position. relations. One proposed version of KIF (Hayes & Menzel, We currently have language templates for English, 2001) has what are alternately called row, or sequence Czech, German and Italian. Some work has been done on variables. They are denoted by the ’@’ symbol in KIF Hindi, Telugu, Tagalog and Russian. statements. They are analogous to the lisp @REST variable. This is not first order if the number of arguments Natural Language Understanding it can "swallow up" is infinite. However, if row variables have a definite number of arguments, it can be treated like a We take a restricted language approach to natural macro, and becomes first order. For example, language understanding. Deep understanding of unrestricted human languages is too hard with present (=> technology. Our approach is to create a restricted version (and (subrelation ?REL1 ?REL2) of English, which is grammatical but more limited than true (holds ?REL1 @ROW)) natural language. Statements must be present tense (holds ?REL2 @ROW)) Of course, this is only easy in the specific case where would become two rules have opposite conclusions and the antecedent of one subsumes the other. It’s possible that in easy cases we (=> might be able to have Sigma even generate the (and (exception... clause that the macro will use. (subrelation ?REL1 ?REL2) (holds ?REL1 ?ARG1)) The algorithm for the macro simply would be, given the (holds ?REL2 ?ARG1)) (exception... statement, to negate the antecedent of the and add it as a conjunction to the (=> . Like other macros, this has (and (subrelation ?REL1 ?REL2) implications for proof presentation that are also unresolved. (holds ?REL1 ?ARG1 ?ARG2)) (holds ?REL2 ?ARG1 ?ARG2)) Proof Presentation Sigma includes several options for proof presentation. etc. Despite the fact that most textbooks present proofs as linear Also note that this "macro" style expansion has the structures, proofs are trees. The same sub-proof may be problem that unlike the true semantics of row variables, used several times in reaching different intermediate that it is not infinite. If the macro processor only expands conclusions. While this is a byproduct of automated to five variables, there is a problem if the knowledge reasoning, it is rarely useful to the human user. Therefore, engineer uses a relation with six. Because of that, Sigma’s Sigma eliminates redundant paths of reasoning in order to syntax checker must prohibit relations with more arguments create a linear proof. than the row variable preprocessor expands to. Sigma also has options for how it presents multiple proofs. Often, the same proof, with the same proof steps, Default reasoning can be reached via a different search order. This means that although two tree-structured proofs may be different, Given the example KB: they may work out to be identical when redundant paths are removed and a linear proof structure generated. Sigma (=> supports an option for hiding such duplicate proofs. (instance ?X Bird) On additional option is to suppress proofs that may have (canFly ?X)) different steps but which lead to the same answer. (=> Sometime it is useful to a knowledge engineer to see these (instance ?X Penguin) alternate proofs, and sometime not. (not (canFly ?X))) Reasoning with Equality (subclass Penguin Bird) One problem in a first order theorem prover with procedural attachments for arithmetic is that the equality We would like the second rule to override the first. The operator masks normal unification and inference. Currently general case of default reasoning is very difficult Sigma converts the SUMO term ’equal’ to ’=’ when sending (Schlechta, 1997). However, we believe that there is an assertions and queries to the theorem prover and then easy way to address the most common case and get the converts back for proof formatting. We are experimenting utility we need. Given a relation with changing the name to ’eval’ and mapping that string to the theorem prover’s '='. (exception ) Our current idea is to have three predicates, one, which handles evaluation, one, which handles inference, and we can implement a macro that modifies the first formula another, which combines the two. above to exclude the exception specified and generate the KB equalAsserted eval (=> (and equal (instance ?X Bird) (not (instance ?X Penguin)) (canFly ?X)) The following examples of assertions and queries should make clear how this will work. (=> (instance ?X Penguin) assert: (not (equal (CardinalityFn Continent) 7) (canFly ?X))) macro expands to: (subclass Penguin Bird) (eval (CardinalityFn Continent) 7) (equalAssigned (CardinalityFn Continent) 7) documentation allows us to alert the user when such query: statements are conflicting or not present. (eval (CardinalityFn Continent) ?X) We employ the Suggested Upper Merged Ontology result: (SUMO) (Niles & Pease, 2001) as the system’s standard (CardinalityFn Continent) ontology. SUMO has been released to the public for free th since its first version in December of 2000. Now in its 47 query: version, the SUMO has been reviewed by hundreds of (equalAssigned (CardinalityFn Continent) ?X) people and subject to formal validation with a theorem prover, to ensure that it is free of contradictions. The result: SUMO contains roughly 1000 terms and 4000 statements, 7 of which 750 are rules. As mentioned above, it has been mapped by hand to the WordNet lexicon, which has acted query: as an additional check on completeness and coverage. (equal (CardinalityFn Continent) ?X) A number of domain ontologies have been created that macro expands to: extend SUMO and can be used in the Sigma system. They (eval (CardinalityFn Continent) ?X) include (equalAssigned (CardinalityFn Continent) ?X) • A Quality of Service ontology, covering computer result: systems and networks (CardinalityFn Continent) • An ecommerce services ontology 7 • An ontology of biological viruses • A financial ontology query: • An ontology of terrain features (eval (AdditionFn 2 3) ?X) • Ontologies of weapons of mass destruction and result: terrorism 5 • An ontology of government and governmental organizations query: • Taxonomies of the periodic table of the elements (equalAssigned (AdditionFn 2 3) ?X) and industry types result: [no] Related Work query: There have been a number of ontology editing (equal (AdditionFn 2 3) ?X) environments created including Ontolingua (Farquhar et al, macro expands to: 1996), Ontosaurus (Swartout et al, 1996), Protégé (Gennari (eval (AdditionFn 2 3) ?X) et al, 2002), and Cyc (Lenat, 1995). Ontosaurus and Cyc (equalAssigned (AdditionFn 2 3) ?X) both have inference components. Cyc is the only system to result: contain a standard ontology. Cyc also contains a natural 5 language component although little or no public information about that work is available. The formal semantics of the macro is The GKB editor (Paley et al, 1997) is an example of a graphical ontology editor. SNARK (Stickel et al, 1994), (=> and Otter (Kalman, 2001) are examples of theorem provers. (equal ?X ?Y) Many theorem provers have been tested on the (eval ?X ?Y)) CADE/TPTP (Sutcliffe et al, 2002) competitions. (=> However, limited efforts have been made to apply formal (equal ?X ?Y) first order theorem proving to expert system and common (equalAssigned ?X ?Y)) sense reasoning on large knowledge bases. The Sigma system brings all these types of components together, and in the process, addresses a number of research issues that Ontology have not been evident in use of these separate components. The Sigma system has been designed to be as independent of a particular ontology as possible, but there are many features that are available when a standard Acknowledgements ontology is used. Current features are primarily to do with The author would like to thank Randy Schulz for his efforts error checking. Knowing a standard method for defining on early versions of this system. argument type restrictions, class-subclass relations and References Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. “ Introduction to WordNet: An On-line Lexical Baader, F., Calvanese, D., McGuinness, Nardi, D., and Database.” 1993. Patel-Schneider, P., (eds). (2003) The Description Logic Handbook Theory, Implementation and Applications, 574 Murray, W., Pease, A., and Sams, M. (2003). Applying pp. ISBN: 0521781760 Formal Methods and Representations in a Natural Language Tutor to Teach Tactical Reasoning. To appear. Carreno, V., Munoz, C., and Tahar, S., (eds.) (2002). Theorem Proving in Higher Order Logics. Proceedings of Niles, I & Pease A., (2001) “ Towards A Standard Upper the 15th International Conference, TPHOLs 2002, Ontology.” In Proceedings of Formal Ontology in Hampton, VA, USA, August 20-23, 2002. Springer- Information Systems (FOIS 2001), October 17-19, Verlag. Ogunquit, Maine, USA, pp 2-9. See also http://ontology.teknowledge.com Farquhar, A., Fikes, R., and Rice, J., (1996). The Ontolingua Server: a Tool for Collaborative Ontology Niles, I., & Pease, A., (2003). Mapping WordNet to the Construction. Proceedings of Tenth Knowledge SUMO Ontology. To appear. See also Acquisition for Knowledge-Based Systems Workshop. http://ontology.teknowledge.com:8080/rsigma/nilesWordN Available at et.pdf http://ksi.cpsc.ucalgary.ca/KAW/KAW96/farquhar/farquha r.html#RTFToC15 Paley, S.M., Lowrance, J.D., and Karp, P.D. (1997) "A Generic Knowledge-Base Browser and Editor," In Gennari, J., M. A. Musen, R. W. Fergerson, W. E. Grosso, Proceedings of the 1997 National Conference on Artificial M. Crubézy, H. Eriksson, N. F. Noy, S. W. Tu (2002). The Intelligence, Providence, RI. Evolution of Protégé: An Environment for Knowledge- Based Systems Development. Stanford SMI technical Pease, A., and Li, J., (2003). Agent-Mediated Knowledge report SMI-2002-0943 .http://smi-web.stanford.edu/pubs Engineering Collaboration, in Proceedings of the AAAI /SMI_Abstracts/SMI-2002-0943.html Spring Symposium on Agent Mediated Knowledge Management. Stanford, CA March 24-26. To appear. Genesereth, M., (1991). ``Knowledge Interchange Format' ' , In Proceedings of the Second International Conference on Pease, A., Niles, I., Li, J., (2002), The Suggested Upper the Principles of Knowledge Representation and Reasoning, Merged Ontology: A Large Ontology for the Semantic Web Allen, J., Fikes, R., Sandewall, E. (eds), Morgan Kaufman and its Applications, in Working Notes of the AAAI-2002 Publishers, pp 238-249. Workshop on Ontologies and the Semantic Web. http://projects.teknowledge.com/AAAI-2002/Pease.ps Hayes, P., and Menzel, C., (2001). A Semantics for Knowledge Interchange Format, in Working Notes of the Schlechta, K., (1997). Nonmonotonic Logics: Basic IJCAI-2001 Workshop on the IEEE Standard Upper Concepts, Results, and Techniques (Lecture Notes in Ontology. Artificial Intelligence) by Karl 243 pp. Springer Verlag pub. ISBN: 3540624821 Hendler, J., and McGuinness, D., (2000), The DARPA Agent Markup Language, IEEE Intelligent Systems, 15 (6) Stickel, M., R. Waldinger, M. Lowry, T. Pressburger, and I. (November), 67-73. Available: Underwood. (1994) Deductive composition of astronomical http://www.ksl.stanford.edu/people/dlm/papers/ieee- software from subroutine libraries. Proceedings of the daml01-abstract.html Twelfth International Conference on Automated Deduction (CADE-12), Nancy, France, June 1994, 341-355. See also Kalman, J. K. (2001) Automated Reasoning with OTTER. http://www.ai.sri.com/~stickel/snark.html. Rinton Press. Sutcliffe G., Suttner C.B., Pelletier F.J. (2002), The IJCAR Lenat, D. (1995). "Cyc: A Large-Scale Investment in ATP System Competition, Journal of Automated Reasoning Knowledge Infrastructure." Communications of the ACM 28(3), pp.307-320. 38, no. 11 (November). Swartout, B., Patil, R., Knight, K. and Russ, T. (1996). MacGregor, R., (1991). The Evolving Technology of Ontosaurus: a tool for browsing and editing ontologies, Classification-Based Knowledge Representation Systems. Gaines, B.R. and Musen, M.A., Ed. Proceedings of Tenth In Principles of Semantic Networks: Explorations in the Knowledge Acquisition Workshop. pp.69-1-69-12 Representation of Knowledge, ed J. Sowa, 385-400. San (http://ksi.cpsc.ucalgary.ca/KAW/KAW96/swartout/ontosa Francisco, Calif., Morgan Kaufmann urus_demo.html)