=Paper=
{{Paper
|id=Vol-71/paper-11
|storemode=property
|title=The Sigma Ontology Development Environment
|pdfUrl=https://ceur-ws.org/Vol-71/Pease.pdf
|volume=Vol-71
}}
==The Sigma Ontology Development Environment==
The Sigma Ontology Development Environment
Adam Pease
Teknowledge
1800 Embarcadero Rd
Palo Alto CA 94303
650 424-0500
650 493-2645
apease@teknowledge.com
keywords: natural language, ontology
Abstract
Abstract: We discuss the development of an environment
for formal knowledge engineering. The Sigma system is an Browsing and Editing
advance over previously developed systems in that it
integrates a number of modern ontology development tools,
The Sigma system contains several different sorts of
which has motivated a number of research issues. Primary browsers for viewing formal knowledge bases. The most
components include an ontology browsing and editing basic component is a term browser, which presents all the
environment, a first order logic inference system and a statements in which a particular term appears. The
natural language to logic translator. Although largely statements are sorted by argument position and then the
independent of any particular ontology, it supports a number appearance of the term in rules and non-rule statements are
of publicly available formal ontologies.
then shown. All the statements are hyperlinked to the terms
that appear in them.
Two types of tree browser are provided. One provides
Introduction an automatic graph layout. Another shows a textual
hierarchy. The user can chose the term to start with, the
Human knowledge is complex, and human language is
number of “levels” that should be presented from the term,
very expressive. Computer systems that process natural
and the binary relation to chose as the predicate that links
language have been limited to a shallow level of
understanding by the very complexity of the information the different nodes in the graph. For example, if the user
that they are expected to handle. Conversely, systems that asks for a graph of the term IntentionalProcess and for
manage information in databases have been left to reason the subclass relation, the system will go “up” the graph to
either in very narrow and pre-determined ways, or to display the term Process, and down the graph to display
perform processing tasks that must be interpreted by the terms Keeping, Guiding, Maintaining etc. The user
humans. can ask for more levels up or down the graph. Note that
It is desirable to employ computer languages for any relation can be chosen so for example a presentation of
representing knowledge that are as expressive as possible, partonomies, or attribute hierarchies is also supported by
and as close to the expressiveness of human language. choosing the relations part and subAttribute
Expressiveness does have a computational burden. In the respectively.
opinion of the author, the best compromise available for The editing system is currently rather rudimentary,
many knowledge representation efforts is first order logic. consisting of the ability to type formulae and have some
While there are other choices, such as description logic and simple syntax and other error checking performed. We
higher order logic, they represent departures from a middle
plan to offer a frame-based editing system similar to, or
ground. First order logic is undecidable, but can be made
incorporating open-source components from, the Protégé
tractable for certain kinds of reasoning, which will be
detailed later. Description logic (Baader, et al, 2003) has (Gennari et al, 2002) system.
some very attractive formal properties, but is significantly We have developed some more sophisticated tools for
less expressive than first order logic, and therefore limits collaborative knowledge engineering. The System for
the knowledge engineer’s ability to express the semantic Collaborative Open Ontology Production (SCOOP) (Pease
content of statements found in normal human discourse. & Li, 2003) supports an automated workflow process for
Higher order logic (Carreno et al, 2002) on the other hand development of ontologies and resolution of conflicts
is much harder to reason with. among ontologies. The SCOOP system employs a theorem
The Sigma system was designed to integrate several prover to detect inconsistencies among ontologies.
different kinds of tools for working with knowledge in first SCOOP addresses several types of inconsistencies. The
order logic. first we term vertical inconsistency. This refers to when a
set of ontologies are loaded and one ontology has a singular. Although different tense and number can be
contradiction with another ontology on which it depends. accepted, the system paraphrases such sentences into a
Inconsistency of this type is never allowed, because from a present tense singular form. Ambiguous word choices
contradictory knowledge base, any proposition can be default to the most popular sense of the word, with an
concluded to be true. SCOOP enforces a workflow process escape mechanism for the user to select a different sense.
that requires a resolution to such a situation. Anaphoric references are handled with a simple mechanical
The second kind of inconsistency we term horizontal algorithm. Some kinds of quantification (every, some, all)
can be handled.
inconsistency. This refers to a situation in which
The current system is capable of taking simple English
knowledge products created by different knowledge sentences and translating them into KIF, using terms from
engineers, and which do not have a mutual dependency, are our SUMO upper ontology (described below). We have
mutually contradictory. SCOOP alerts developers to such a mapped all 100,000 WordNet (Miller, et al, 1993) word
situation, but does not require that it be eliminated. It is senses (synsets) to SUMO, one at a time, by hand over the
possible for two knowledge bases to represent different period of a year (Niles & Pease, 2003). This endows our
perspectives, which are contradictory, but locally valid. natural language translation system with a very large
We employ the KIF (Genesereth, 1991) language and vocabulary.
have a translator to and from DAML (Hendler & Our approach is presented more fully in (Murray, Pease
McGuinness, 2000). A translator to and from Protégé & Sams, 2003).
(Gennari et al, 2002) is partially completed at this time.
Language Generation Inference
Each statement can also be presented as a natural One basic issue is that of allowing variables in the
language paraphrase similar to (Sevcenko, 2003). Note predicate position. While the general case of this results in
that we also use Sevcenko’s language templates for English expressions being beyond first order logic, there is a
and Czech. Each term in an ontology can be given a slightly more restricted case, which is first order. A
statement with a variable in the predicate position is only
natural language presentation form, indexed by the human
higher order if a quantifer ranges of all possible relations,
language. For example, the SUMO term
rather than the finite set of relations in a particular
DiseaseOrSyndrome can be presented as “ disease or knowledge base. Since prepending another relation in front
syndrome” and an Italian presentation would be shown as of every statement makes everything first order for such a
"malattia o sindrome". Combined with the English restricted practical case, this therefore must mean that the
understanding mechanisms described below, this gives us a statement wasn’t really higher-order in the first place.
very rudimentary language translation capability. There is also an issue with prohibiting statements as
Although presentation of terms is straightforward, arguments to relations. When arguments to relations are
presentation of statements is more complicated, and statements, then such a statement is higher order. We could
roughly patterned after C-language printf statements. For either have a quote operator or define a Statement class and
example the relation “ part” , which states that one object is require all predicates have their argument types defined
a part of another, has the corresponding language (which would allow us to know in advance whether an
generation template “ %1 is %n a part of %2” . The first argument is a Statement, and therefore needs to be
argument to the logical relation is substituted for %1 etc. implicitly quoted). A statement that has this issue is
Note that this substitution is recursive, so that complex (believes John (likes Bob Mary))
statements with nested formulae can be translated
Currently, such statements are quoted, and therefore
effectively. The %n signifies that if the statement is become opaque first-order objects rather than statements.
negated, that the negation operator for the appropriate Another issue is row variables, for use in variable arity
human language should be inserted in that position. relations. One proposed version of KIF (Hayes & Menzel,
We currently have language templates for English, 2001) has what are alternately called row, or sequence
Czech, German and Italian. Some work has been done on variables. They are denoted by the ’@’ symbol in KIF
Hindi, Telugu, Tagalog and Russian. statements. They are analogous to the lisp @REST
variable. This is not first order if the number of arguments
Natural Language Understanding it can "swallow up" is infinite. However, if row variables
have a definite number of arguments, it can be treated like a
We take a restricted language approach to natural
macro, and becomes first order. For example,
language understanding. Deep understanding of
unrestricted human languages is too hard with present (=>
technology. Our approach is to create a restricted version (and
(subrelation ?REL1 ?REL2)
of English, which is grammatical but more limited than true (holds ?REL1 @ROW))
natural language. Statements must be present tense (holds ?REL2 @ROW))
Of course, this is only easy in the specific case where
would become two rules have opposite conclusions and the antecedent of
one subsumes the other. It’s possible that in easy cases we
(=> might be able to have Sigma even generate the
(and
(exception... clause that the macro will use.
(subrelation ?REL1 ?REL2)
(holds ?REL1 ?ARG1)) The algorithm for the macro simply would be, given the
(holds ?REL2 ?ARG1)) (exception... statement, to negate the antecedent of the
and add it as a conjunction to the
(=> . Like other macros, this has
(and
(subrelation ?REL1 ?REL2)
implications for proof presentation that are also unresolved.
(holds ?REL1 ?ARG1 ?ARG2))
(holds ?REL2 ?ARG1 ?ARG2)) Proof Presentation
Sigma includes several options for proof presentation.
etc. Despite the fact that most textbooks present proofs as linear
Also note that this "macro" style expansion has the structures, proofs are trees. The same sub-proof may be
problem that unlike the true semantics of row variables, used several times in reaching different intermediate
that it is not infinite. If the macro processor only expands conclusions. While this is a byproduct of automated
to five variables, there is a problem if the knowledge reasoning, it is rarely useful to the human user. Therefore,
engineer uses a relation with six. Because of that, Sigma’s Sigma eliminates redundant paths of reasoning in order to
syntax checker must prohibit relations with more arguments create a linear proof.
than the row variable preprocessor expands to. Sigma also has options for how it presents multiple
proofs. Often, the same proof, with the same proof steps,
Default reasoning can be reached via a different search order. This means
that although two tree-structured proofs may be different,
Given the example KB: they may work out to be identical when redundant paths are
removed and a linear proof structure generated. Sigma
(=> supports an option for hiding such duplicate proofs.
(instance ?X Bird) On additional option is to suppress proofs that may have
(canFly ?X))
different steps but which lead to the same answer.
(=> Sometime it is useful to a knowledge engineer to see these
(instance ?X Penguin) alternate proofs, and sometime not.
(not
(canFly ?X)))
Reasoning with Equality
(subclass Penguin Bird) One problem in a first order theorem prover with
procedural attachments for arithmetic is that the equality
We would like the second rule to override the first. The operator masks normal unification and inference. Currently
general case of default reasoning is very difficult Sigma converts the SUMO term ’equal’ to ’=’ when sending
(Schlechta, 1997). However, we believe that there is an assertions and queries to the theorem prover and then
easy way to address the most common case and get the converts back for proof formatting. We are experimenting
utility we need. Given a relation with changing the name to ’eval’ and mapping that string to
the theorem prover’s '='.
(exception )
Our current idea is to have three predicates, one, which
handles evaluation, one, which handles inference, and
we can implement a macro that modifies the first formula another, which combines the two.
above to exclude the exception specified and generate the
KB equalAsserted eval
(=>
(and equal
(instance ?X Bird)
(not (instance ?X Penguin))
(canFly ?X)) The following examples of assertions and queries should
make clear how this will work.
(=>
(instance ?X Penguin) assert:
(not
(equal (CardinalityFn Continent) 7)
(canFly ?X)))
macro expands to:
(subclass Penguin Bird) (eval (CardinalityFn Continent) 7)
(equalAssigned (CardinalityFn Continent) 7)
documentation allows us to alert the user when such
query: statements are conflicting or not present.
(eval (CardinalityFn Continent) ?X) We employ the Suggested Upper Merged Ontology
result: (SUMO) (Niles & Pease, 2001) as the system’s standard
(CardinalityFn Continent) ontology. SUMO has been released to the public for free
th
since its first version in December of 2000. Now in its 47
query: version, the SUMO has been reviewed by hundreds of
(equalAssigned (CardinalityFn Continent) ?X) people and subject to formal validation with a theorem
prover, to ensure that it is free of contradictions. The
result:
SUMO contains roughly 1000 terms and 4000 statements,
7
of which 750 are rules. As mentioned above, it has been
mapped by hand to the WordNet lexicon, which has acted
query:
as an additional check on completeness and coverage.
(equal (CardinalityFn Continent) ?X)
A number of domain ontologies have been created that
macro expands to: extend SUMO and can be used in the Sigma system. They
(eval (CardinalityFn Continent) ?X) include
(equalAssigned (CardinalityFn Continent) ?X)
• A Quality of Service ontology, covering computer
result: systems and networks
(CardinalityFn Continent) • An ecommerce services ontology
7 • An ontology of biological viruses
• A financial ontology
query: • An ontology of terrain features
(eval (AdditionFn 2 3) ?X) • Ontologies of weapons of mass destruction and
result: terrorism
5 • An ontology of government and governmental
organizations
query: • Taxonomies of the periodic table of the elements
(equalAssigned (AdditionFn 2 3) ?X) and industry types
result:
[no]
Related Work
query: There have been a number of ontology editing
(equal (AdditionFn 2 3) ?X) environments created including Ontolingua (Farquhar et al,
macro expands to: 1996), Ontosaurus (Swartout et al, 1996), Protégé (Gennari
(eval (AdditionFn 2 3) ?X) et al, 2002), and Cyc (Lenat, 1995). Ontosaurus and Cyc
(equalAssigned (AdditionFn 2 3) ?X) both have inference components. Cyc is the only system to
result: contain a standard ontology. Cyc also contains a natural
5 language component although little or no public
information about that work is available.
The formal semantics of the macro is The GKB editor (Paley et al, 1997) is an example of a
graphical ontology editor. SNARK (Stickel et al, 1994),
(=> and Otter (Kalman, 2001) are examples of theorem provers.
(equal ?X ?Y) Many theorem provers have been tested on the
(eval ?X ?Y))
CADE/TPTP (Sutcliffe et al, 2002) competitions.
(=> However, limited efforts have been made to apply formal
(equal ?X ?Y) first order theorem proving to expert system and common
(equalAssigned ?X ?Y)) sense reasoning on large knowledge bases. The Sigma
system brings all these types of components together, and
in the process, addresses a number of research issues that
Ontology have not been evident in use of these separate components.
The Sigma system has been designed to be as
independent of a particular ontology as possible, but there
are many features that are available when a standard
Acknowledgements
ontology is used. Current features are primarily to do with The author would like to thank Randy Schulz for his efforts
error checking. Knowing a standard method for defining on early versions of this system.
argument type restrictions, class-subclass relations and
References Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and
Miller, K. “ Introduction to WordNet: An On-line Lexical
Baader, F., Calvanese, D., McGuinness, Nardi, D., and Database.” 1993.
Patel-Schneider, P., (eds). (2003) The Description Logic
Handbook Theory, Implementation and Applications, 574 Murray, W., Pease, A., and Sams, M. (2003). Applying
pp. ISBN: 0521781760 Formal Methods and Representations in a Natural
Language Tutor to Teach Tactical Reasoning. To appear.
Carreno, V., Munoz, C., and Tahar, S., (eds.) (2002).
Theorem Proving in Higher Order Logics. Proceedings of Niles, I & Pease A., (2001) “ Towards A Standard Upper
the 15th International Conference, TPHOLs 2002, Ontology.” In Proceedings of Formal Ontology in
Hampton, VA, USA, August 20-23, 2002. Springer- Information Systems (FOIS 2001), October 17-19,
Verlag. Ogunquit, Maine, USA, pp 2-9. See also
http://ontology.teknowledge.com
Farquhar, A., Fikes, R., and Rice, J., (1996). The
Ontolingua Server: a Tool for Collaborative Ontology Niles, I., & Pease, A., (2003). Mapping WordNet to the
Construction. Proceedings of Tenth Knowledge SUMO Ontology. To appear. See also
Acquisition for Knowledge-Based Systems Workshop. http://ontology.teknowledge.com:8080/rsigma/nilesWordN
Available at et.pdf
http://ksi.cpsc.ucalgary.ca/KAW/KAW96/farquhar/farquha
r.html#RTFToC15 Paley, S.M., Lowrance, J.D., and Karp, P.D. (1997) "A
Generic Knowledge-Base Browser and Editor," In
Gennari, J., M. A. Musen, R. W. Fergerson, W. E. Grosso, Proceedings of the 1997 National Conference on Artificial
M. Crubézy, H. Eriksson, N. F. Noy, S. W. Tu (2002). The Intelligence, Providence, RI.
Evolution of Protégé: An Environment for Knowledge-
Based Systems Development. Stanford SMI technical Pease, A., and Li, J., (2003). Agent-Mediated Knowledge
report SMI-2002-0943 .http://smi-web.stanford.edu/pubs Engineering Collaboration, in Proceedings of the AAAI
/SMI_Abstracts/SMI-2002-0943.html Spring Symposium on Agent Mediated Knowledge
Management. Stanford, CA March 24-26. To appear.
Genesereth, M., (1991). ``Knowledge Interchange Format' '
,
In Proceedings of the Second International Conference on Pease, A., Niles, I., Li, J., (2002), The Suggested Upper
the Principles of Knowledge Representation and Reasoning, Merged Ontology: A Large Ontology for the Semantic Web
Allen, J., Fikes, R., Sandewall, E. (eds), Morgan Kaufman and its Applications, in Working Notes of the AAAI-2002
Publishers, pp 238-249. Workshop on Ontologies and the Semantic Web.
http://projects.teknowledge.com/AAAI-2002/Pease.ps
Hayes, P., and Menzel, C., (2001). A Semantics for
Knowledge Interchange Format, in Working Notes of the Schlechta, K., (1997). Nonmonotonic Logics: Basic
IJCAI-2001 Workshop on the IEEE Standard Upper Concepts, Results, and Techniques (Lecture Notes in
Ontology. Artificial Intelligence) by Karl 243 pp. Springer Verlag
pub. ISBN: 3540624821
Hendler, J., and McGuinness, D., (2000), The DARPA
Agent Markup Language, IEEE Intelligent Systems, 15 (6) Stickel, M., R. Waldinger, M. Lowry, T. Pressburger, and I.
(November), 67-73. Available: Underwood. (1994) Deductive composition of astronomical
http://www.ksl.stanford.edu/people/dlm/papers/ieee- software from subroutine libraries. Proceedings of the
daml01-abstract.html Twelfth International Conference on Automated Deduction
(CADE-12), Nancy, France, June 1994, 341-355. See also
Kalman, J. K. (2001) Automated Reasoning with OTTER. http://www.ai.sri.com/~stickel/snark.html.
Rinton Press.
Sutcliffe G., Suttner C.B., Pelletier F.J. (2002), The IJCAR
Lenat, D. (1995). "Cyc: A Large-Scale Investment in ATP System Competition, Journal of Automated Reasoning
Knowledge Infrastructure." Communications of the ACM 28(3), pp.307-320.
38, no. 11 (November).
Swartout, B., Patil, R., Knight, K. and Russ, T. (1996).
MacGregor, R., (1991). The Evolving Technology of Ontosaurus: a tool for browsing and editing ontologies,
Classification-Based Knowledge Representation Systems. Gaines, B.R. and Musen, M.A., Ed. Proceedings of Tenth
In Principles of Semantic Networks: Explorations in the Knowledge Acquisition Workshop. pp.69-1-69-12
Representation of Knowledge, ed J. Sowa, 385-400. San (http://ksi.cpsc.ucalgary.ca/KAW/KAW96/swartout/ontosa
Francisco, Calif., Morgan Kaufmann urus_demo.html)