The Sigma Ontology Development Environment
                                                         Adam Pease
                                                         Teknowledge
                                                    1800 Embarcadero Rd
                                                     Palo Alto CA 94303
                                                         650 424-0500
                                                         650 493-2645
                                                  apease@teknowledge.com
                                              keywords: natural language, ontology


                          Abstract
  Abstract: We discuss the development of an environment
  for formal knowledge engineering. The Sigma system is an                           Browsing and Editing
  advance over previously developed systems in that it
  integrates a number of modern ontology development tools,
                                                                       The Sigma system contains several different sorts of
  which has motivated a number of research issues. Primary          browsers for viewing formal knowledge bases. The most
  components include an ontology browsing and editing               basic component is a term browser, which presents all the
  environment, a first order logic inference system and a           statements in which a particular term appears. The
  natural language to logic translator. Although largely            statements are sorted by argument position and then the
  independent of any particular ontology, it supports a number      appearance of the term in rules and non-rule statements are
  of publicly available formal ontologies.
                                                                    then shown. All the statements are hyperlinked to the terms
                                                                    that appear in them.
                                                                       Two types of tree browser are provided. One provides
                     Introduction                                   an automatic graph layout. Another shows a textual
                                                                    hierarchy. The user can chose the term to start with, the
   Human knowledge is complex, and human language is
                                                                    number of “levels” that should be presented from the term,
very expressive. Computer systems that process natural
                                                                    and the binary relation to chose as the predicate that links
language have been limited to a shallow level of
understanding by the very complexity of the information             the different nodes in the graph. For example, if the user
that they are expected to handle. Conversely, systems that          asks for a graph of the term IntentionalProcess and for
manage information in databases have been left to reason            the subclass relation, the system will go “up” the graph to
either in very narrow and pre-determined ways, or to                display the term Process, and down the graph to display
perform processing tasks that must be interpreted by                the terms Keeping, Guiding, Maintaining etc. The user
humans.                                                             can ask for more levels up or down the graph. Note that
   It is desirable to employ computer languages for                 any relation can be chosen so for example a presentation of
representing knowledge that are as expressive as possible,          partonomies, or attribute hierarchies is also supported by
and as close to the expressiveness of human language.               choosing the relations part and subAttribute
Expressiveness does have a computational burden. In the             respectively.
opinion of the author, the best compromise available for               The editing system is currently rather rudimentary,
many knowledge representation efforts is first order logic.         consisting of the ability to type formulae and have some
While there are other choices, such as description logic and        simple syntax and other error checking performed. We
higher order logic, they represent departures from a middle
                                                                    plan to offer a frame-based editing system similar to, or
ground. First order logic is undecidable, but can be made
                                                                    incorporating open-source components from, the Protégé
tractable for certain kinds of reasoning, which will be
detailed later. Description logic (Baader, et al, 2003) has         (Gennari et al, 2002) system.
some very attractive formal properties, but is significantly           We have developed some more sophisticated tools for
less expressive than first order logic, and therefore limits        collaborative knowledge engineering. The System for
the knowledge engineer’s ability to express the semantic            Collaborative Open Ontology Production (SCOOP) (Pease
content of statements found in normal human discourse.              & Li, 2003) supports an automated workflow process for
Higher order logic (Carreno et al, 2002) on the other hand          development of ontologies and resolution of conflicts
is much harder to reason with.                                      among ontologies. The SCOOP system employs a theorem
   The Sigma system was designed to integrate several               prover to detect inconsistencies among ontologies.
different kinds of tools for working with knowledge in first           SCOOP addresses several types of inconsistencies. The
order logic.                                                        first we term vertical inconsistency. This refers to when a
set of ontologies are loaded and one ontology has a              singular. Although different tense and number can be
contradiction with another ontology on which it depends.         accepted, the system paraphrases such sentences into a
Inconsistency of this type is never allowed, because from a      present tense singular form. Ambiguous word choices
contradictory knowledge base, any proposition can be             default to the most popular sense of the word, with an
concluded to be true. SCOOP enforces a workflow process          escape mechanism for the user to select a different sense.
that requires a resolution to such a situation.                  Anaphoric references are handled with a simple mechanical
   The second kind of inconsistency we term horizontal           algorithm. Some kinds of quantification (every, some, all)
                                                                 can be handled.
inconsistency.     This refers to a situation in which
                                                                    The current system is capable of taking simple English
knowledge products created by different knowledge                sentences and translating them into KIF, using terms from
engineers, and which do not have a mutual dependency, are        our SUMO upper ontology (described below). We have
mutually contradictory. SCOOP alerts developers to such a        mapped all 100,000 WordNet (Miller, et al, 1993) word
situation, but does not require that it be eliminated. It is     senses (synsets) to SUMO, one at a time, by hand over the
possible for two knowledge bases to represent different          period of a year (Niles & Pease, 2003). This endows our
perspectives, which are contradictory, but locally valid.        natural language translation system with a very large
   We employ the KIF (Genesereth, 1991) language and             vocabulary.
have a translator to and from DAML (Hendler &                       Our approach is presented more fully in (Murray, Pease
McGuinness, 2000). A translator to and from Protégé              & Sams, 2003).
(Gennari et al, 2002) is partially completed at this time.

Language Generation                                                                      Inference
   Each statement can also be presented as a natural                One basic issue is that of allowing variables in the
language paraphrase similar to (Sevcenko, 2003). Note            predicate position. While the general case of this results in
that we also use Sevcenko’s language templates for English       expressions being beyond first order logic, there is a
and Czech. Each term in an ontology can be given a               slightly more restricted case, which is first order. A
                                                                 statement with a variable in the predicate position is only
natural language presentation form, indexed by the human
                                                                 higher order if a quantifer ranges of all possible relations,
language.        For     example,      the    SUMO       term
                                                                 rather than the finite set of relations in a particular
DiseaseOrSyndrome can be presented as “ disease or               knowledge base. Since prepending another relation in front
syndrome” and an Italian presentation would be shown as          of every statement makes everything first order for such a
"malattia o sindrome".         Combined with the English         restricted practical case, this therefore must mean that the
understanding mechanisms described below, this gives us a        statement wasn’t really higher-order in the first place.
very rudimentary language translation capability.                   There is also an issue with prohibiting statements as
   Although presentation of terms is straightforward,            arguments to relations. When arguments to relations are
presentation of statements is more complicated, and              statements, then such a statement is higher order. We could
roughly patterned after C-language printf statements. For        either have a quote operator or define a Statement class and
example the relation “ part” , which states that one object is   require all predicates have their argument types defined
a part of another, has the corresponding language                (which would allow us to know in advance whether an
generation template “ %1 is %n a part of %2” . The first         argument is a Statement, and therefore needs to be
argument to the logical relation is substituted for %1 etc.      implicitly quoted). A statement that has this issue is
Note that this substitution is recursive, so that complex        (believes John (likes Bob Mary))
statements with nested formulae can be translated
                                                                    Currently, such statements are quoted, and therefore
effectively. The %n signifies that if the statement is           become opaque first-order objects rather than statements.
negated, that the negation operator for the appropriate             Another issue is row variables, for use in variable arity
human language should be inserted in that position.              relations. One proposed version of KIF (Hayes & Menzel,
   We currently have language templates for English,             2001) has what are alternately called row, or sequence
Czech, German and Italian. Some work has been done on            variables. They are denoted by the ’@’ symbol in KIF
Hindi, Telugu, Tagalog and Russian.                              statements. They are analogous to the lisp @REST
                                                                 variable. This is not first order if the number of arguments
Natural Language Understanding                                   it can "swallow up" is infinite. However, if row variables
                                                                 have a definite number of arguments, it can be treated like a
   We take a restricted language approach to natural
                                                                 macro, and becomes first order. For example,
language understanding.        Deep understanding of
unrestricted human languages is too hard with present            (=>
technology. Our approach is to create a restricted version          (and
                                                                       (subrelation ?REL1 ?REL2)
of English, which is grammatical but more limited than true            (holds ?REL1 @ROW))
natural language. Statements must be present tense                  (holds ?REL2 @ROW))
                                                                 Of course, this is only easy in the specific case where
would become                                                   two rules have opposite conclusions and the antecedent of
                                                               one subsumes the other. It’s possible that in easy cases we
(=>                                                            might be able to have Sigma even generate the
   (and
                                                               (exception... clause that the macro will use.
      (subrelation ?REL1 ?REL2)
      (holds ?REL1 ?ARG1))                                       The algorithm for the macro simply would be, given the
   (holds ?REL2 ?ARG1))                                        (exception... statement, to negate the antecedent of the
                                                               <formula-specific> and add it as a conjunction to the
(=>                                                            <formula2-general>.       Like other macros, this has
   (and
      (subrelation ?REL1 ?REL2)
                                                               implications for proof presentation that are also unresolved.
      (holds ?REL1 ?ARG1 ?ARG2))
   (holds ?REL2 ?ARG1 ?ARG2))                                  Proof Presentation
                                                                  Sigma includes several options for proof presentation.
etc.                                                           Despite the fact that most textbooks present proofs as linear
   Also note that this "macro" style expansion has the         structures, proofs are trees. The same sub-proof may be
problem that unlike the true semantics of row variables,       used several times in reaching different intermediate
that it is not infinite. If the macro processor only expands   conclusions. While this is a byproduct of automated
to five variables, there is a problem if the knowledge         reasoning, it is rarely useful to the human user. Therefore,
engineer uses a relation with six. Because of that, Sigma’s    Sigma eliminates redundant paths of reasoning in order to
syntax checker must prohibit relations with more arguments     create a linear proof.
than the row variable preprocessor expands to.                    Sigma also has options for how it presents multiple
                                                               proofs. Often, the same proof, with the same proof steps,
Default reasoning                                              can be reached via a different search order. This means
                                                               that although two tree-structured proofs may be different,
Given the example KB:                                          they may work out to be identical when redundant paths are
                                                               removed and a linear proof structure generated. Sigma
(=>                                                            supports an option for hiding such duplicate proofs.
  (instance ?X Bird)                                              On additional option is to suppress proofs that may have
  (canFly ?X))
                                                               different steps but which lead to the same answer.
(=>                                                            Sometime it is useful to a knowledge engineer to see these
  (instance ?X Penguin)                                        alternate proofs, and sometime not.
  (not
    (canFly ?X)))
                                                               Reasoning with Equality
(subclass Penguin Bird)                                           One problem in a first order theorem prover with
                                                               procedural attachments for arithmetic is that the equality
We would like the second rule to override the first. The       operator masks normal unification and inference. Currently
general case of default reasoning is very difficult            Sigma converts the SUMO term ’equal’ to ’=’ when sending
(Schlechta, 1997). However, we believe that there is an        assertions and queries to the theorem prover and then
easy way to address the most common case and get the           converts back for proof formatting. We are experimenting
utility we need. Given a relation                              with changing the name to ’eval’ and mapping that string to
                                                               the theorem prover’s '='.
(exception <formula-specific> <formula2-general>)
                                                                  Our current idea is to have three predicates, one, which
                                                               handles evaluation, one, which handles inference, and
we can implement a macro that modifies the first formula       another, which combines the two.
above to exclude the exception specified and generate the
KB                                                             equalAsserted eval
(=>
  (and                                                                   equal
    (instance ?X Bird)
    (not (instance ?X Penguin))
  (canFly ?X))                                                 The following examples of assertions and queries should
                                                               make clear how this will work.
(=>
  (instance ?X Penguin)                                        assert:
  (not
                                                               (equal (CardinalityFn Continent) 7)
    (canFly ?X)))
                                                               macro expands to:
(subclass Penguin Bird)                                        (eval (CardinalityFn Continent) 7)
                                                               (equalAssigned (CardinalityFn Continent) 7)
                                                              documentation allows us to alert the user when such
query:                                                        statements are conflicting or not present.
(eval (CardinalityFn Continent) ?X)                              We employ the Suggested Upper Merged Ontology
result:                                                       (SUMO) (Niles & Pease, 2001) as the system’s standard
(CardinalityFn Continent)                                     ontology. SUMO has been released to the public for free
                                                                                                                        th
                                                              since its first version in December of 2000. Now in its 47
query:                                                        version, the SUMO has been reviewed by hundreds of
(equalAssigned (CardinalityFn Continent) ?X)                  people and subject to formal validation with a theorem
                                                              prover, to ensure that it is free of contradictions. The
result:
                                                              SUMO contains roughly 1000 terms and 4000 statements,
7
                                                              of which 750 are rules. As mentioned above, it has been
                                                              mapped by hand to the WordNet lexicon, which has acted
query:
                                                              as an additional check on completeness and coverage.
(equal (CardinalityFn Continent) ?X)
                                                                 A number of domain ontologies have been created that
macro expands to:                                             extend SUMO and can be used in the Sigma system. They
(eval (CardinalityFn Continent) ?X)                           include
(equalAssigned (CardinalityFn Continent) ?X)
                                                                   • A Quality of Service ontology, covering computer
result:                                                                 systems and networks
(CardinalityFn Continent)                                          • An ecommerce services ontology
7                                                                  • An ontology of biological viruses
                                                                   • A financial ontology
query:                                                             • An ontology of terrain features
(eval (AdditionFn 2 3) ?X)                                         • Ontologies of weapons of mass destruction and
result:                                                                 terrorism
5                                                                  • An ontology of government and governmental
                                                                        organizations
query:                                                             • Taxonomies of the periodic table of the elements
(equalAssigned (AdditionFn 2 3) ?X)                                     and industry types
result:
[no]
                                                                                  Related Work
query:                                                           There have been a number of ontology editing
(equal (AdditionFn 2 3) ?X)                                   environments created including Ontolingua (Farquhar et al,
macro expands to:                                             1996), Ontosaurus (Swartout et al, 1996), Protégé (Gennari
(eval (AdditionFn 2 3) ?X)                                    et al, 2002), and Cyc (Lenat, 1995). Ontosaurus and Cyc
(equalAssigned (AdditionFn 2 3) ?X)                           both have inference components. Cyc is the only system to
result:                                                       contain a standard ontology. Cyc also contains a natural
5                                                             language component although little or no public
                                                              information about that work is available.
The formal semantics of the macro is                             The GKB editor (Paley et al, 1997) is an example of a
                                                              graphical ontology editor. SNARK (Stickel et al, 1994),
(=>                                                           and Otter (Kalman, 2001) are examples of theorem provers.
  (equal ?X ?Y)                                               Many theorem provers have been tested on the
  (eval ?X ?Y))
                                                              CADE/TPTP (Sutcliffe et al, 2002) competitions.
(=>                                                           However, limited efforts have been made to apply formal
  (equal ?X ?Y)                                               first order theorem proving to expert system and common
  (equalAssigned ?X ?Y))                                      sense reasoning on large knowledge bases. The Sigma
                                                              system brings all these types of components together, and
                                                              in the process, addresses a number of research issues that
                       Ontology                               have not been evident in use of these separate components.
   The Sigma system has been designed to be as
independent of a particular ontology as possible, but there
are many features that are available when a standard
                                                                               Acknowledgements
ontology is used. Current features are primarily to do with   The author would like to thank Randy Schulz for his efforts
error checking. Knowing a standard method for defining        on early versions of this system.
argument type restrictions, class-subclass relations and
                      References                             Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and
                                                             Miller, K. “ Introduction to WordNet: An On-line Lexical
Baader, F., Calvanese, D., McGuinness, Nardi, D., and        Database.” 1993.
Patel-Schneider, P., (eds). (2003) The Description Logic
Handbook Theory, Implementation and Applications, 574        Murray, W., Pease, A., and Sams, M. (2003). Applying
pp. ISBN: 0521781760                                         Formal Methods and Representations in a Natural
                                                             Language Tutor to Teach Tactical Reasoning. To appear.
Carreno, V., Munoz, C., and Tahar, S., (eds.) (2002).
Theorem Proving in Higher Order Logics. Proceedings of       Niles, I & Pease A., (2001) “ Towards A Standard Upper
the 15th International Conference, TPHOLs 2002,              Ontology.”       In Proceedings of Formal Ontology in
Hampton, VA, USA, August 20-23, 2002. Springer-              Information Systems (FOIS 2001), October 17-19,
Verlag.                                                      Ogunquit,     Maine,    USA,    pp   2-9.   See   also
                                                             http://ontology.teknowledge.com
Farquhar, A., Fikes, R., and Rice, J., (1996). The
Ontolingua Server: a Tool for Collaborative Ontology         Niles, I., & Pease, A., (2003). Mapping WordNet to the
Construction.        Proceedings of Tenth Knowledge          SUMO Ontology.             To appear.         See also
Acquisition for Knowledge-Based Systems Workshop.            http://ontology.teknowledge.com:8080/rsigma/nilesWordN
Available                                            at      et.pdf
http://ksi.cpsc.ucalgary.ca/KAW/KAW96/farquhar/farquha
r.html#RTFToC15                                              Paley, S.M., Lowrance, J.D., and Karp, P.D. (1997) "A
                                                             Generic Knowledge-Base Browser and Editor," In
Gennari, J., M. A. Musen, R. W. Fergerson, W. E. Grosso,     Proceedings of the 1997 National Conference on Artificial
M. Crubézy, H. Eriksson, N. F. Noy, S. W. Tu (2002). The     Intelligence, Providence, RI.
Evolution of Protégé: An Environment for Knowledge-
Based Systems Development. Stanford SMI technical            Pease, A., and Li, J., (2003). Agent-Mediated Knowledge
report SMI-2002-0943 .http://smi-web.stanford.edu/pubs       Engineering Collaboration, in Proceedings of the AAAI
/SMI_Abstracts/SMI-2002-0943.html                            Spring Symposium on Agent Mediated Knowledge
                                                             Management. Stanford, CA March 24-26. To appear.
Genesereth, M., (1991). ``Knowledge Interchange Format' '
                                                        ,
In Proceedings of the Second International Conference on     Pease, A., Niles, I., Li, J., (2002), The Suggested Upper
the Principles of Knowledge Representation and Reasoning,    Merged Ontology: A Large Ontology for the Semantic Web
Allen, J., Fikes, R., Sandewall, E. (eds), Morgan Kaufman    and its Applications, in Working Notes of the AAAI-2002
Publishers, pp 238-249.                                      Workshop on Ontologies and the Semantic Web.
                                                             http://projects.teknowledge.com/AAAI-2002/Pease.ps
Hayes, P., and Menzel, C., (2001). A Semantics for
Knowledge Interchange Format, in Working Notes of the        Schlechta, K., (1997). Nonmonotonic Logics: Basic
IJCAI-2001 Workshop on the IEEE Standard Upper               Concepts, Results, and Techniques (Lecture Notes in
Ontology.                                                    Artificial Intelligence) by Karl 243 pp. Springer Verlag
                                                             pub. ISBN: 3540624821
Hendler, J., and McGuinness, D., (2000), The DARPA
Agent Markup Language, IEEE Intelligent Systems, 15 (6)      Stickel, M., R. Waldinger, M. Lowry, T. Pressburger, and I.
(November),                67-73.               Available:   Underwood. (1994) Deductive composition of astronomical
http://www.ksl.stanford.edu/people/dlm/papers/ieee-          software from subroutine libraries. Proceedings of the
daml01-abstract.html                                         Twelfth International Conference on Automated Deduction
                                                             (CADE-12), Nancy, France, June 1994, 341-355. See also
Kalman, J. K. (2001) Automated Reasoning with OTTER.         http://www.ai.sri.com/~stickel/snark.html.
Rinton Press.
                                                             Sutcliffe G., Suttner C.B., Pelletier F.J. (2002), The IJCAR
Lenat, D. (1995). "Cyc: A Large-Scale Investment in          ATP System Competition, Journal of Automated Reasoning
Knowledge Infrastructure." Communications of the ACM         28(3), pp.307-320.
38, no. 11 (November).
                                                             Swartout, B., Patil, R., Knight, K. and Russ, T. (1996).
MacGregor, R., (1991). The Evolving Technology of            Ontosaurus: a tool for browsing and editing ontologies,
Classification-Based Knowledge Representation Systems.       Gaines, B.R. and Musen, M.A., Ed. Proceedings of Tenth
In Principles of Semantic Networks: Explorations in the      Knowledge        Acquisition   Workshop.  pp.69-1-69-12
Representation of Knowledge, ed J. Sowa, 385-400. San        (http://ksi.cpsc.ucalgary.ca/KAW/KAW96/swartout/ontosa
Francisco, Calif., Morgan Kaufmann                           urus_demo.html)