=Paper= {{Paper |id=Vol-440/paper-11 |storemode=property |title=Toward an Open-Source Foundation Ontology Representing the Longman’s Defining Vocabulary: The COSMO Ontology OWL Version |pdfUrl=https://ceur-ws.org/Vol-440/paper11.pdf |volume=Vol-440 |dblpUrl=https://dblp.org/rec/conf/oic/Cassidy08 }} ==Toward an Open-Source Foundation Ontology Representing the Longman’s Defining Vocabulary: The COSMO Ontology OWL Version== https://ceur-ws.org/Vol-440/paper11.pdf
      Toward an Open-Source Foundation Ontology Representing the Longman’s
             Defining Vocabulary: The COSMO Ontology OWL Version

                                                       Patrick Cassidy
                                                   MICRA, Inc., Plainfield, NJ
                                                      cassidy@micra.com

Abstract - The COSMO foundation ontology is being developed to test the
hypothesis that there are a relatively small number (under 10,000) of
primitive ontology elements that are sufficient to serve as the building
blocks for any number of more specialized ontology elements representing
concepts and terms used in any computer application. Finding evidence for
this hypothesis would suggest that a promising tactic to achieve Semantic
Interoperability among computer applications is to focus effort on a common
foundation ontology limited to those primitive elements. This will constrain
the size of the ontology on which agreement is required to the minimum that
will support accurately relating domain and application ontologies to each
other. The rationale, methodology and current status of this project are
reported here.

Index Terms - Foundation ontology, conceptual primitives, COSMO, semantic
interoperability, common ontology, ontology mapping, Longman, defining
vocabulary.

I. INTRODUCTION

Information communicated and analyzed by the intelligence community is
highly diverse, including technical, social and psychological concepts.
Meeting the challenge of using automatic techniques for integrating such
information will require adoption of an ontology that is capable of
unambiguously representing the full range of knowledge that people
communicate. There is as yet no consensus on how to structure that ontology.
This paper describes one approach to overcome the lack of agreement caused
by multiple fundamentally different approaches to foundation ontology
development. The proposed approach depends on three factors: (1) To develop
a foundation ontology that is effective as a standard of meaning for
communication among many applications, it is not necessary to achieve
universal agreement among ontology developers about the structure of the
foundation ontology; it is only necessary to build a user group large enough
that third-party vendors will have incentive to develop utilities making the
ontology easier to use, and applications that demonstrate the usefulness of
the ontology for practical purposes. (2) By allowing multiple logically
compatible views for representing the same entities, and providing
translation utilities between them, many of the differing preferences for
representing entities can be accommodated in the same ontology. (3) The
number of different ontology groups that will accept the ontology can be
maximized by keeping the foundation ontology as small as possible without
compromising its ability to support logical representation of terms and
concepts in any application domain. In the COSMO approach, that could be
achieved by discovering the smallest inventory of fundamental ontology
elements, representing the minimal essential primitive concepts that are
needed to build representations of any more complex concept.

II. BACKGROUND TO THE COSMO APPROACH

A. The Notion of Conceptual Primitives

The approach proposed here relies on the observation that communication
among agents (human or automated) depends on the agents sharing some common
set of internally understood concepts, labeled by an agreed set of symbols
such as words in human languages, or element names in databases. Wherever a
particular community uses concepts not already among the known concepts of
other communities, information sharing requires the first community to use a
common set of defining concepts to construct definitions of the unknown
concepts understandable to the other communities. In this manner
communicating agents can accurately transfer information on topics familiar
or initially unfamiliar to other agents. Information transfer using human
languages is facilitated by the existence of a relatively small vocabulary
of basic words, representing those commonly understood concepts, that can be
used to create linguistic definitions of any specialized concept. Research
in Linguistics has explored by experimental techniques the number and
identity of the common primitive concepts that are used in linguistic
communication among people speaking different languages. Some of that work,
summarized by Goddard [1], has suggested that as few as 60 semantic
primitives are adequate to construct definitions of a very large number of
concepts. A less systematic but more comprehensive demonstration of the
power of primitive concepts to suffice for construction of definitions of
many words is found in some English-language dictionaries such as
the Longman [2] that use a Defining Vocabulary of basic words with which to
define all of the entries in the dictionary. The Longman Defining Vocabulary
(hereafter LDV) contains 2148 words, but an investigation [3], [4], [5] has
shown that even fewer words are needed to define (recursively) all of the
Longman entries. For cases where a proposed definition of a new word uses
words not already in the defining vocabulary, the Defining Vocabulary tactic
requires that the unrecognized word itself be defined by use of the basic
Defining Vocabulary. It appears that, for the Longman, words recursively
defined in such a manner “ground out” using a basic vocabulary of 1433 words
representing 3200 word senses.

The success of the linguistic defining vocabulary for dictionaries suggests
that a similar tactic could be effective for automated information transfer
among computer systems. For automated systems, the “Defining Vocabulary”
would take the form of a foundation ontology having an inventory of basic
concept representations that is sufficient to create representations of any
new concept, by combinations of the basic elements. Communities using such a
“Conceptual Defining Vocabulary (CDV)” (i.e. a common foundation ontology)
would be able to pursue their own interests using any local terminology or
ontology that suits their purposes, and still communicate their information
accurately in a form suitable for automated inferencing, by translating the
local information into the terminology of the common foundation ontology.
Limiting the core foundation ontology to the elements needed for a CDV will
minimize the effort required to perform the translations, while ensuring
that accurate translations are possible. The question remains whether the
linguistic Defining Vocabulary examples can be adapted to the more precise
requirements of representing terms and concepts in a logical format,
suitable for automated reasoning.

The essential principle of such a tactic for Semantic Interoperability is
that, when the separately developed ontologies of two different systems both
use the same CDV to specify the structures of their ontology elements, then
accurate information sharing can be achieved, even if the two systems each
have some separately defined ontology elements not in the other, by sharing
the specifications of the ontology elements of each that are not in the
other. Since the ontology elements of each system are built from the same
primitive elements of the CDV, they will be properly and accurately
interpretable in both systems. The combination of the ontologies of the two
systems in effect creates a single merged ontology common to both systems.
In that situation, the same input data in both systems will produce the same
inferences. Different data in the two systems will create some different
inferences, but those will not be logically inconsistent if the data is not
inconsistent. For a proper automated merger of the two ontologies, it will
be necessary to have utilities that can automatically recognize identical
elements created in the two separate local ontologies, and to detect
inconsistencies if they exist. But this tactic for interoperability avoids
the impossible task of automatically interpreting information in an external
ontology that is based on fundamentally different (usually undocumented)
assumptions about how to represent the same intended meanings of terms and
concepts.

B. The Current Absence of a Conceptual Standard

To function as a conceptual standard that will enable semantic
interoperability, i.e. permit computers to reason accurately and
automatically with transferred information, the syntactic format for a
common standard must have at least the expressivity of First-Order Logic
(FOL), so as to permit logical inference using rules expressing domain
knowledge. Several foundation ontologies, such as OpenCyc [6], SUMO [7],
DOLCE [8], and BFO [9], have been developed that have this technical
capability. Other knowledge classifications such as NIEM [10] and the DoD
Core Taxonomy [11] have less expressiveness. None of these projects has
adopted the tactic of creating a CDV, and none has been recognized as a
default standard for application builders concerned with specific topics and
indifferent to the nuances of representation at the abstract levels. The
reasons for lack of wide adoption vary. The complexity of each of the
existing foundation ontologies presents a steep learning curve, which
requires a strong motivation to impel potential users to spend the required
time. In the case of Cyc, much of the content (such as the more than 1000
specialized reasoning modules) is still proprietary and cannot be part of an
open-source project that could include desired components from many
non-Cycorp sources. Development of an effective open-source natural-language
interface to the ontology is also desirable, to make learning and use
convenient. None of the existing foundation ontologies has such an
interface. Without publicly available examples showing the benefits of using
a complex ontology, a specialized application developer without a need to
interoperate
outside the local community is strongly tempted to develop a specialized
ontology that is not linked to a foundation ontology. As a result,
specialized ontologies with no linkages to any of the major foundation
ontologies have proliferated.

The above considerations suggest the following desiderata for a foundation
ontology that can be adopted and used by a large enough community to serve
as a de facto standard of meaning:
    • the core set of concept representations required to use the ontology
      effectively should be as small as possible, but sufficient to support
      specification of any specialized concept meaning;
    • the ontology should be fully public and developed by an open
      procedure, so as to permit alternative logically compatible views of
      entities; it should be maintained by an open process and allow
      additions as needed to represent new topics;
    • there should be a powerful intuitive natural language interface,
      capable of determining whether (1) representations of specific
      concepts are already present in the core foundation ontology or in
      some public extension, or (2) if not, to list the elements in the
      ontology closest in meaning;
    • the ontology format should have the expressiveness of at least FOL;
    • there should be several open-source substantive applications
      demonstrating the usefulness of the ontology;
    • extensions to the core, with logical specifications of concepts based
      on combinations of the core concept representations, should be
      maintained and freely available, in the manner of Java library
      packages, to minimize the need for creating new definitions.

In order to have a de facto standard of meaning, it is not necessary to have
universal agreement to use only one foundation ontology; it is only
necessary that some foundation ontology have a user community large enough
for third-party vendors to have incentive to develop utilities that make the
standard easier to use, and to develop applications that demonstrate its
utility. It should also have a sufficiently wide community of users that
research groups will have an incentive to use it as the standard of meaning
through which they can transfer information from diverse separate
applications, each using different forms of intelligent information
processing.

III. THE COSMO PROJECT

A. Origin

The COSMO ontology [12] is currently being developed to serve as a fully
public foundation ontology that contains representations of all of the 2100
words in the LDV, with the intention of serving as a broadly acceptable CDV.
COSMO (COmmon Semantic MOdel) was initiated in 2005 [13] as a project of the
Ontology and Taxonomy Coordinating Working Group [14], a working group of
the Federal Semantic Interoperability Community of Practice. The origin of
COSMO is discussed in more detail in [15]. In early 2008 the project adopted
the current goal of representing the LDV. Developing the ontology as a CDV
promises to furnish a foundation ontology that has all of the elements
(types, relations) needed to build representations of any concept of
interest in any application, yet be small enough to be usable without an
extended learning period. The goal in effect is to identify the smallest
foundation ontology that is sufficient to serve as the basis for broad
semantic interoperability. Such a foundation ontology will contain
representations of the essential units of meaning that can be combined to
represent any specialized term or concept of interest in applications.

B. Project phasing

COSMO is proceeding in several phases. The first phase, expected to be
complete within 3 months, is to create a representation of all of the words
in the LDV, in an OWL format [16]. The expressiveness of at least
pseudo-second-order logic (a FOL in which variables can represent relations
or assertions) is required for some applications such as Natural Language
understanding. The plan is therefore to maintain an OWL version, but convert
it automatically to a Common-Logic (CL) compliant language such as KIF or
IKL. This will require representing rules, functions, and higher-arity
relations in the OWL format.

When the COSMO ontology has the full set of LDV words represented, it will
be tested for its ability to serve as a CDV, by creating representations of
several sets of specialized concepts and discovering how many new
fundamental concept representations need to be added to the foundation
ontology. It is estimated that this first version will contain over 7500
types (OWL classes), over 700 relations, and over 1000 restrictions that
constrain the meanings of the elements.
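
As an illustration only (this sketch is not part of the COSMO project), the
test described above can be pictured as counting, for each successive set of
specialized concepts, how many elements their definitions need that the
foundation ontology does not yet contain. The function name, the block size
and the toy concept definitions below are assumptions made for the example;
real definitions would be logical specifications rather than plain sets of
element names.

```python
from typing import Dict, List, Set


def new_primitives_per_block(
    definitions: Dict[str, Set[str]],
    foundation: Set[str],
    block_size: int,
) -> List[int]:
    """Count the elements each successive block of terms adds to the foundation.

    `definitions` maps a specialized term to the (hypothetical) set of
    foundation elements its definition uses; terms are processed in
    insertion order, `block_size` terms at a time.
    """
    known = set(foundation)  # copy, so the caller's set is not modified
    terms = list(definitions)
    counts = []
    for start in range(0, len(terms), block_size):
        added: Set[str] = set()
        for term in terms[start:start + block_size]:
            # Elements the definition needs that are not yet available.
            added |= definitions[term] - known
        known |= added
        counts.append(len(added))
    return counts


# Toy data: all names are invented for illustration.
foundation = {"Person", "Female", "Organization", "Event", "hasParent"}
definitions = {
    "Employer": {"Person", "Organization", "employs"},
    "Meeting": {"Event", "Person", "hasParticipant"},
    "Employee": {"Person", "Organization", "employs"},
    "Lecture": {"Event", "Person", "hasTopic"},
    "Mother": {"Person", "Female", "hasParent"},
    "Conference": {"Event", "Organization", "hasParticipant", "hasTopic"},
}

counts = new_primitives_per_block(definitions, foundation, block_size=2)
print(counts)  # [2, 1, 0]
```

In this toy run the count of newly required elements falls from block to
block, which is the kind of trend that would indicate the foundation
ontology is approaching sufficiency; a flat or rising count would indicate
the opposite.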
The COSMO itself is not expected to be adopted without change as a common
foundation ontology. The main purpose of this project is to demonstrate the
feasibility of a Conceptual Defining Vocabulary as an effective basis for
semantic interoperability. A CDV that is widely accepted is likely to arise
only from a collaborative effort by a broad consortium of ontology builders
and users, as well as developers of other knowledge representation
constructs such as the NIEM. More than one CDV may eventually find wide use,
but the number of such ontologies is likely to be smaller than the number of
operating systems, because the number and complexity of primitive data
structures required for a CDV are greater than those manipulated by
operating systems.

C. Criterion for Success

The criterion for determining whether the COSMO can serve as a starting CDV
will be based on the number of new primitive ontology elements that must be
added to the COSMO in order to represent groups of new terms or concepts
from additional specialized topics. It is expected that some additional
primitive elements (types, relations) will need to be added to the COSMO as
knowledge in diverse fields is represented. To function as an effective CDV,
what is required is that the number of such new primitives added to the
ontology will decrease asymptotically as each successive block (e.g. of 500)
of new terms is represented using the foundation ontology. Such statistical
evidence that there is some limit to the number of new primitive elements
that must be added will help answer two questions: whether there is any
limit to the number of basic elements required for the CDV, and if so,
approximately what that number is.

D. Allowance for Multiple Viewpoints

Essential to its role in enabling semantic interoperability is that COSMO
must be inclusive of all logically compatible views, so as to permit
translations among all of the representations used in applications. This
means that wherever different ontologists prefer different means of
representing a concept, both alternatives are included, with translation
rules (e.g. “bridging axioms”) that automatically convert from one view to
the other. An example would be the concept of “mother”, which is represented
in some ontologies only as a relation (‘isTheMotherOf’), and in others as
the type (class) ‘Mother’. The COSMO OWL version can include both
representations, but the automatic conversion of such alternative views will
often require that rules be used, and will be possible only in the more
expressive common-logic format. Using an ontology representing multiple
views could lead to inference that is less efficient than with a more
restrictive representation. However, it is expected that multiple
alternative representations will be needed only for interoperability among
applications, and individual local applications will not use the full
ontology, but will select out only those elements required for the local
application. In this way, full semantic interoperability can be achieved
among applications, without sacrifice of efficiency.

REFERENCES

[1] Cliff Goddard, Bad Arguments Against Semantic Primitives, Theoretical
     Linguistics, Vol. 24 (1998), No. 2-3: 129-156. (Available online at:
     http://www.une.edu.au/bcss/linguistics/nsm/pdfs/bad-arguments5.pdf)
[2] Longman Dictionary of Contemporary English, Longman Group, Essex,
     England (New Edition, 1987).
[3] Guo, Cheng-ming (1989) Constructing a machine-tractable dictionary from
     "Longman Dictionary of Contemporary English" (Ph. D. Thesis), New
     Mexico State University.
[4] Guo, Cheng-ming (editor) Machine Tractable Dictionaries: Design and
     Construction, Ablex Publishing Co., Norwood NJ (1995).
[5] Yorick Wilks, Brian Slator, and Louise Guthrie, Electric Words:
     Dictionaries, Computers, and Meanings, MIT Press, Cambridge Mass
     (1996).
[6] OpenCyc: http://opencyc.org/
[7] http://www.ontologyportal.org/
[8] See: http://www.loa-cnr.it/DOLCE.html
[9] Pierre Grenon, BFO in a Nutshell: A Bi-categorial Axiomatization of BFO
     and Comparison with DOLCE, IFOMIS report 06/2003 (2003). Available at:
     http://www.ifomis.uni-saarland.de/Research/IFOMISReports/IFOMIS%20Report%2006_2003.pdf
     See also: http://www.ifomis.uni-saarland.de/bfo/
[10] See: http://www.niem.gov/
[11] DoD Core Taxonomy:
     http://www.dtic.mil/dtic/annualconf/conf05-Dickert.ppt
[12] http://micra.com/COSMO/COSMO.owl
[13] http://semanticommunity.wik.is/Federal_Semantic_Interoperability_Community_of_Practice/Work_Group_Status/Ontology_and_Taxonomy_Coordination/COSMO_Common_Semantic_Model
[14] http://semanticommunity.wik.is/Federal_Semantic_Interoperability_Community_of_Practice/Work_Group_Status/Ontology_and_Taxonomy_Coordination
[15] http://micra.com/COSMO/COSMOoverview.doc
[16] The OWL Web Ontology Language Reference:
     http://www.w3.org/TR/owl-ref/