=Paper=
{{Paper
|id=Vol-440/paper-11
|storemode=property
|title=Toward an Open-Source Foundation Ontology Representing the Longman’s Defining Vocabulary: The COSMO Ontology OWL Version
|pdfUrl=https://ceur-ws.org/Vol-440/paper11.pdf
|volume=Vol-440
|dblpUrl=https://dblp.org/rec/conf/oic/Cassidy08
}}
==Toward an Open-Source Foundation Ontology Representing the Longman’s Defining Vocabulary: The COSMO Ontology OWL Version==
Patrick Cassidy
MICRA, Inc., Plainfield, NJ
cassidy@micra.com
Abstract - The COSMO foundation ontology is being developed to test the hypothesis that there is a relatively small number (under 10,000) of primitive ontology elements that are sufficient to serve as the building blocks for any number of more specialized ontology elements representing concepts and terms used in any computer application. Finding evidence for this hypothesis would suggest that a promising tactic to achieve Semantic Interoperability among computer applications is to focus effort on limiting the common foundation ontology to the ontology that contains those primitive elements. This will constrain the size of the ontology on which agreement is required to the minimum that will support accurately relating domain and application ontologies to each other. The rationale, methodology and current status of this project are reported here.

Index Terms – Foundation ontology, conceptual primitives, COSMO, semantic interoperability, common ontology, ontology mapping, Longman, defining vocabulary.

I. INTRODUCTION

Information communicated and analyzed by the intelligence community is highly diverse, including technical, social and psychological concepts. Meeting the challenge of using automatic techniques for integrating such information will require adoption of an ontology that is capable of unambiguously representing the full range of knowledge that people communicate. There is as yet no consensus on how to structure that ontology. This paper describes one approach to overcome the lack of agreement caused by multiple fundamentally different approaches to foundation ontology development. The proposed approach depends on three factors: (1) To develop a foundation ontology that is effective as a standard of meaning for communication among many applications, it is not necessary to achieve universal agreement among ontology developers about the structure of the foundation ontology; it is only necessary to build a sufficiently large user group that third-party vendors will have incentive to develop utilities making the ontology easier to use, and applications that demonstrate the usefulness of the ontology for practical purposes. (2) By allowing multiple logically compatible views for representing the same entities, and providing translation utilities between them, many of the differing preferences for representing entities can be accommodated in the same ontology. (3) The number of different ontology groups that will accept the ontology can be maximized by keeping the foundation ontology as small as possible without compromising its ability to support logical representation of terms and concepts in any application domain. In the COSMO approach, that could be achieved by discovering the smallest inventory of fundamental ontology elements, representing the minimal essential primitive concepts that are needed to build representations of any more complex concept.

II. BACKGROUND TO THE COSMO APPROACH

A. The Notion of Conceptual Primitives

The approach proposed here relies on the observation that communication among agents (human or automated) depends on the agents sharing some common set of internally understood concepts, labeled by an agreed set of symbols such as words in human languages, or element names in databases. Wherever a particular community uses concepts not already among the known concepts of other communities, information sharing requires the first community to use a common set of defining concepts to construct definitions of the unknown concepts understandable to the other communities. In this manner communicating agents can accurately transfer information on topics familiar or initially unfamiliar to other agents. Information transfer using human languages is facilitated by the existence of a relatively small vocabulary of basic words, representing those commonly understood concepts, that can be used to create linguistic definitions of any specialized concept. Research in Linguistics has explored by experimental techniques the number and identity of the common primitive concepts that are used in linguistic communication among people speaking different languages. Some of that work, summarized by Goddard [1], has suggested that as few as 60 semantic primitives are adequate to construct definitions of a very large number of concepts. A less systematic but more comprehensive demonstration of the power of primitive concepts to suffice for construction of definitions of many words is found in some English-language dictionaries such as
the Longman [2] that use a Defining Vocabulary of basic words with which to define all of the entries in the dictionary. The Longman Defining Vocabulary (hereafter LDV) contains 2148 words, but an investigation [3], [4], [5] has shown that even fewer words are needed to define (recursively) all of the Longman entries. For cases where a proposed definition of a new word uses words not already in the defining vocabulary, the Defining Vocabulary tactic requires that the unrecognized word itself be defined by use of the basic Defining Vocabulary. The investigation indicates that, for the Longman, words recursively defined in such a manner “ground out” in a basic vocabulary of 1433 words representing 3200 word senses.

The success of the linguistic defining vocabulary for dictionaries suggests that a similar tactic could be effective for automated information transfer among computer systems. For automated systems, the “Defining Vocabulary” would take the form of a foundation ontology having an inventory of basic concept representations that is sufficient to create representations of any new concept, by combinations of the basic elements. Communities using such a “Conceptual Defining Vocabulary (CDV)” (i.e. a common foundation ontology) would be able to pursue their own interests using any local terminology or ontology that suits their purposes, and still communicate their information accurately in a form suitable for automated inferencing, by translating the local information into the terminology of the common foundation ontology. Limiting the core foundation ontology to the elements needed for a CDV will minimize the effort required to perform the translations, while ensuring that accurate translations are possible. The question remains whether the linguistic Defining Vocabulary examples can be adapted to the more precise requirements of representing terms and concepts in a logical format, suitable for automated reasoning.

The essential principle of such a tactic for Semantic Interoperability is that, when the separately developed ontologies of two different systems both use the same CDV to specify the structures of their ontology elements, then accurate information sharing can be achieved, even if the two systems each have some separately defined ontology elements not in the other, by sharing the specifications of the ontology elements of each that are not in the other. Since the ontology elements of each system are built from the same primitive elements of the CDV, they will be properly and accurately interpretable in both systems. The combination of the ontologies of the two systems in effect creates a single merged ontology common to both systems. In that situation, the same input data in both systems will produce the same inferences. Different data in the two systems will create some different inferences, but those will not be logically inconsistent if the data is not inconsistent. For a proper automated merger of the two ontologies, it will be necessary to have utilities that can automatically recognize identical elements created in the two separate local ontologies, and detect inconsistencies if they exist. But this tactic for interoperability avoids the impossible task of automatically interpreting information in an external ontology that is based on fundamentally different (usually undocumented) assumptions about how to represent the same intended meanings of terms and concepts.

B. The Current Absence of a Conceptual Standard

To function as a conceptual standard that will enable semantic interoperability, i.e. permit computers to reason accurately and automatically with transferred information, the syntactic format for a common standard must have at least the expressivity of First-Order Logic (FOL), so as to permit logical inference using rules expressing domain knowledge. Several foundation ontologies, such as OpenCyc [6], SUMO [7], DOLCE [8], and BFO [9], have been developed that have this technical capability. Other knowledge classifications such as NIEM [10] and the DoD Core Taxonomy [11] have less expressiveness. None of these projects has adopted the tactic of creating a CDV, and none has been recognized as a default standard for application builders concerned with specific topics and indifferent to the nuances of representation at the abstract levels. The reasons for lack of wide adoption vary. The complexity of each of the existing foundation ontologies presents a steep learning curve, which requires a strong motivation to impel potential users to spend the required time. In the case of Cyc, much of the content (such as the over 1000 specialized reasoning modules) is still proprietary and cannot be part of an open-source project that could include desired components from many non-Cycorp sources. Development of an effective open-source natural-language interface to the ontology is also desirable, to make learning and use convenient. None of the existing foundation ontologies has such an interface. Without publicly available examples showing the benefits of using a complex ontology, a specialized application developer without a need to interoperate
outside the local community is strongly tempted to develop a specialized ontology that is not linked to a foundation ontology. As a result, specialized ontologies with no linkages to any of the major foundation ontologies have proliferated.

The above considerations suggest the following desiderata for a foundation ontology that can be adopted and used by a large enough community to serve as a de facto standard of meaning:
• the core set of concept representations required to use the ontology effectively should be as small as possible, but sufficient to support specification of any specialized concept meaning;
• the ontology should be fully public and developed by an open procedure, so as to permit alternative logically compatible views of entities; it should be maintained by an open process and allow additions as needed to represent new topics;
• there should be a powerful intuitive natural language interface, capable of (1) determining whether representations of specific concepts are already present in the core foundation ontology or in some public extension, and (2) if not, listing the elements in the ontology closest in meaning;
• the ontology format should have the expressiveness of at least FOL;
• there should be several open-source substantive applications demonstrating the usefulness of the ontology;
• extensions to the core, with logical specifications of concepts based on combinations of the core concept representations, should be maintained and freely available, in the manner of Java library packages, to minimize the need for creating new definitions.

In order to have a de facto standard of meaning, it is not necessary to have universal agreement to use only one foundation ontology; it is only necessary that some foundation ontology have a user community large enough for third-party vendors to have incentive to develop utilities that make the standard easier to use, and to develop applications that demonstrate its utility. It should also have a sufficiently wide community of users that research groups will have an incentive to use it as the standard of meaning through which they can transfer information from diverse separate applications, each using different forms of intelligent information processing.

III. THE COSMO PROJECT

A. Origin

The COSMO ontology [12] is currently being developed to serve as a fully public foundation ontology that contains representations of all of the 2100 words in the LDV, with the intention of serving as a broadly acceptable CDV. COSMO (COmmon Semantic MOdel) was initiated in 2005 [13] as a project of the Ontology and Taxonomy Coordinating Working Group [14], a working group of the Federal Semantic Interoperability Community of Practice. The origin of COSMO is discussed in more detail in [15]. In early 2008 the project adopted the current goal of representing the LDV. Developing the ontology as a CDV promises to furnish a foundation ontology that has all of the elements (types, relations) needed to build representations of any concept of interest in any application, yet is small enough to be usable without an extended learning period. The goal in effect is to identify the smallest foundation ontology that is sufficient to serve as the basis for broad semantic interoperability. Such a foundation ontology will contain representations of the essential units of meaning that can be combined to represent any specialized term or concept of interest in applications.

B. Project phasing

COSMO is proceeding in several phases. The first phase, expected to be complete within 3 months, is to create a representation of all of the words in the LDV, in an OWL format [16]. The expressiveness of at least pseudo-second-order logic (a FOL in which variables can represent relations or assertions) is required for some applications such as Natural Language understanding. The plan is therefore to maintain an OWL version, but convert it automatically to a Common-Logic (CL) compliant language such as KIF or IKL. This will require representing rules, functions, and higher-arity relations in the OWL format.

When the COSMO ontology has the full set of LDV words represented, it will be tested for its ability to serve as a CDV, by creating representations of several sets of specialized concepts and discovering how many new fundamental concept representations need to be added to the foundation ontology. It is estimated that this first version will contain over 7500 types (OWL classes), over 700 relations, and over 1000 restrictions that constrain the meanings of the elements.
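
The test described above amounts to counting, for each batch of newly represented specialized concepts, how many of the elements their definitions rely on are missing from the foundation ontology. The following Python sketch illustrates that count under strong simplifying assumptions: each concept is modeled only as the set of element names its definition uses, and the function name `new_primitives_per_block`, the toy concept definitions, and the block size are hypothetical illustrations, not part of COSMO.

```python
from typing import Dict, Iterable, List, Set


def new_primitives_per_block(
    foundation: Iterable[str],
    concepts: Dict[str, Set[str]],
    block_size: int,
) -> List[int]:
    """Count elements that must be added to the foundation for each block.

    `concepts` maps a concept name to the set of element names used in its
    definition. Concepts are processed in insertion order, `block_size` at a
    time. An element counts as a new primitive when it is neither in the
    foundation nor a concept that has already been defined.
    """
    known = set(foundation)
    names = list(concepts)
    counts = []
    for start in range(0, len(names), block_size):
        added = 0
        for name in names[start:start + block_size]:
            missing = concepts[name] - known  # elements not yet available
            added += len(missing)
            known |= missing   # missing elements are added as new primitives
            known.add(name)    # the defined concept can now be reused
        counts.append(added)
    return counts


# Hypothetical toy data: a three-element foundation and six new concepts.
foundation = {"young", "female", "parent"}
concepts = {
    "puppy": {"young", "dog"},
    "kitten": {"young", "cat"},
    "mother": {"female", "parent"},
    "kennel": {"dog", "shelter"},
    "grandmother": {"mother", "parent"},
    "cattery": {"cat", "shelter"},
}

print(new_primitives_per_block(foundation, concepts, block_size=2))  # [2, 1, 0]
```

In this toy run the number of additions falls from block to block. A real test would work with logical specifications rather than sets of names, and with blocks of hundreds of terms.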

The COSMO itself is not expected to be adopted without change as a common foundation ontology. The main purpose of this project is to demonstrate the feasibility of a Conceptual Defining Vocabulary as an effective basis for semantic interoperability. A CDV that is widely accepted is likely to arise only from a collaborative effort by a broad consortium of ontology builders and users, as well as developers of other knowledge representation constructs such as the NIEM. More than one CDV may eventually find wide use, but the number of such ontologies is likely to be smaller than the number of operating systems, because the number and complexity of primitive data structures required for a CDV are greater than those manipulated by operating systems.

C. Criterion for Success

The criterion for determining whether the COSMO can serve as a starting CDV will be based on the number of new primitive ontology elements that must be added to the COSMO in order to represent groups of new terms or concepts from additional specialized topics. It is expected that some additional primitive elements (types, relations) will need to be added to the COSMO as knowledge in diverse fields is represented.

To function as an effective CDV, what is required is that the number of such new primitives added to the ontology will decrease asymptotically as each successive block (e.g. of 500) of new terms is represented using the foundation ontology. Such statistical evidence that there is some limit to the number of new terms that must be added will help answer two questions: whether there is any limit to the number of basic elements required for the CDV, and if so, approximately what that number is.

D. Allowance for Multiple Viewpoints

Essential to its role in enabling semantic interoperability is that COSMO must be inclusive of all logically compatible views, so as to permit translations among all of the representations used in applications. This means that wherever different ontologists prefer different means of representing a concept, both alternatives are included, with a translation rule (e.g. “bridging axioms”) that automatically converts from one view to the other. An example would be the concept of “mother”, which is represented in some ontologies only as a relation (‘isTheMotherOf’), and in others as the type (class) ‘Mother’. The COSMO OWL version can include both representations, but the automatic conversion of such alternative views will often require that rules be used, and will be possible only in the more expressive common-logic format. Using an ontology representing multiple views could lead to inference that is less efficient than with a more restrictive representation. However, it is expected that multiple alternative representations will be needed only for interoperability among applications, and individual local applications will not use the full ontology, but will select out only those elements required for the local application. In this way, full semantic interoperability can be achieved among applications, without sacrifice of efficiency.

REFERENCES

[1] Cliff Goddard, Bad Arguments Against Semantic Primitives, Theoretical Linguistics, Vol. 24 (1998), No. 2-3: 129-156. (Available online at: http://www.une.edu.au/bcss/linguistics/nsm/pdfs/bad-arguments5.pdf)
[2] Longman Dictionary of Contemporary English, Longman Group, Essex, England (New Edition, 1987).
[3] Guo, Cheng-ming (1989) Constructing a machine-tractable dictionary from "Longman Dictionary of Contemporary English" (Ph. D. Thesis), New Mexico State University.
[4] Guo, Cheng-ming (editor) Machine Tractable Dictionaries: Design and Construction, Ablex Publishing Co., Norwood NJ (1995).
[5] Yorick Wilks, Brian Slator, and Louise Guthrie, Electric Words: Dictionaries, Computers, and Meanings, MIT Press, Cambridge Mass (1996).
[6] OpenCyc: http://opencyc.org/
[7] http://www.ontologyportal.org/
[8] See: http://www.loa-cnr.it/DOLCE.html
[9] Pierre Grenon, BFO in a Nutshell: A Bi-categorial Axiomatization of BFO and Comparison with DOLCE, IFOMIS report 06/2003 (2003). Available at: http://www.ifomis.uni-saarland.de/Research/IFOMISReports/IFOMIS%20Report%2006_2003.pdf. See also: http://www.ifomis.uni-saarland.de/bfo/
[10] See: http://www.niem.gov/
[11] DoD Core Taxonomy: http://www.dtic.mil/dtic/annualconf/conf05-Dickert.ppt
[12] http://micra.com/COSMO/COSMO.owl
[13] http://semanticommunity.wik.is/Federal_Semantic_Interoperability_Community_of_Practice/Work_Group_Status/Ontology_and_Taxonomy_Coordination/COSMO_Common_Semantic_Model
[14] http://semanticommunity.wik.is/Federal_Semantic_Interoperability_Community_of_Practice/Work_Group_Status/Ontology_and_Taxonomy_Coordination
[15] http://micra.com/COSMO/COSMOoverview.doc
[16] The OWL Web Ontology Language Reference: http://www.w3.org/TR/owl-ref/