Toward an Open-Source Foundation Ontology Representing the Longman’s Defining Vocabulary: The COSMO Ontology OWL Version

Patrick Cassidy
MICRA, Inc., Plainfield, NJ
cassidy@micra.com

Abstract - The COSMO foundation ontology is being developed to test the hypothesis that there is a relatively small number (under 10,000) of primitive ontology elements. These elements would be sufficient to serve as the building blocks for any number of more specialized ontology elements representing the concepts and terms used in any computer application. Evidence for this hypothesis would suggest a promising tactic for achieving Semantic Interoperability among computer applications: focus effort on the common foundation ontology that contains those primitive elements. This would constrain the ontology on which agreement is required to the minimum size that still supports accurately relating domain and application ontologies to each other. The rationale, methodology and current status of this project are reported here.

Index Terms – Foundation ontology, conceptual primitives, COSMO, semantic interoperability, common ontology, ontology mapping, Longman, defining vocabulary.

I. INTRODUCTION

Information communicated and analyzed by the intelligence community is highly diverse, including technical, social and psychological concepts. Integrating such information with automatic techniques will require adopting an ontology that can unambiguously represent the full range of knowledge that people communicate. There is as yet no consensus on how to structure that ontology. This paper describes one way to overcome the lack of agreement caused by multiple, fundamentally different approaches to foundation ontology development. The proposed approach depends on three factors:
(1) A foundation ontology can be effective as a standard of meaning for communication among many applications without universal agreement among ontology developers about its structure. It is only necessary to build a user group large enough to give third-party vendors an incentive to develop two things: utilities that make the ontology easier to use, and applications that demonstrate its usefulness for practical purposes.
(2) Many of the differing preferences for representing entities can be accommodated in the same ontology by allowing multiple logically compatible views of the same entities and providing translation utilities between them.
(3) The number of different ontology groups that will accept the ontology can be maximized by keeping the foundation ontology as small as possible without compromising its ability to support logical representation of terms and concepts in any application domain. In the COSMO approach, this is to be achieved by discovering the smallest inventory of fundamental ontology elements, representing the minimal essential primitive concepts needed to build representations of any more complex concept.

II. BACKGROUND TO THE COSMO APPROACH

A. The Notion of Conceptual Primitives

The approach proposed here relies on the observation that communication among agents (human or automated) depends on the agents sharing a common set of internally understood concepts. These concepts are labeled by an agreed set of symbols, such as words in human languages or element names in databases. Wherever a particular community uses concepts that other communities do not already know, the first community must share information differently. It has to use a common set of defining concepts to construct definitions of the unknown concepts that the other communities can understand. In this manner, communicating agents can accurately transfer information on topics that are either familiar or initially unfamiliar to other agents. Human languages facilitate such transfer through a relatively small vocabulary of basic words. These words represent the commonly understood concepts and can be used to create linguistic definitions of any specialized concept.

Research in linguistics has used experimental techniques to explore the number and identity of the common primitive concepts used in linguistic communication among people speaking different languages. Some of that work, summarized by Goddard [1], suggests that as few as 60 semantic primitives are adequate to construct definitions of a very large number of concepts. Some English-language dictionaries, such as the Longman [2], give a less systematic but more comprehensive demonstration that primitive concepts suffice for defining many words. These dictionaries use a Defining Vocabulary of basic words to define all of their entries. The Longman Defining Vocabulary (hereafter LDV) contains 2148 words, but an investigation [3], [4], [5] has shown that even fewer words are needed to define (recursively) all of the Longman entries. Sometimes a proposed definition of a new word uses words that are not already in the defining vocabulary. In that case, the Defining Vocabulary tactic requires that the unrecognized word itself be defined using the basic Defining Vocabulary. For the Longman, words recursively defined in this manner appear to “ground out” in a basic vocabulary of 1433 words representing 3200 word senses.

The success of the linguistic defining vocabulary for dictionaries suggests that a similar tactic could be effective for automated information transfer among computer systems. For automated systems, the “Defining Vocabulary” would take the form of a foundation ontology whose inventory of basic concept representations is sufficient to represent any new concept as a combination of the basic elements. We call such a common foundation ontology a “Conceptual Defining Vocabulary” (CDV). Communities using a CDV could pursue their own interests with any local terminology or ontology that suits their purposes. They could still communicate their information accurately, in a form suitable for automated inferencing, by translating the local information into the terminology of the common foundation ontology. Limiting the core foundation ontology to the elements needed for a CDV will minimize the effort required to perform the translations, while ensuring that accurate translations are possible. The question remains whether the linguistic Defining Vocabulary examples can be adapted to the more precise requirements of representing terms and concepts in a logical format suitable for automated reasoning.

The essential principle of this tactic for Semantic Interoperability is as follows. Suppose the separately developed ontologies of two different systems both use the same CDV to specify the structures of their ontology elements. Each system may also have some separately defined ontology elements that the other lacks. Accurate information sharing can still be achieved if each system shares the specifications of those elements with the other. Because the ontology elements of each system are built from the same primitive elements of the CDV, they will be properly and accurately interpretable in both systems. In effect, combining the ontologies of the two systems creates a single merged ontology common to both. The same input data will then produce the same inferences in both systems. Different data in the two systems will produce some different inferences, but these will not be logically inconsistent unless the data themselves are inconsistent. A proper automated merger of the two ontologies will need utilities that can automatically recognize identical elements created in the two separate local ontologies and detect any inconsistencies. This tactic for interoperability, however, avoids the impossible task of automatically interpreting information in an external ontology. Such an ontology rests on fundamentally different (usually undocumented) assumptions about how to represent the same intended meanings of terms and concepts.

B. The Current Absence of a Conceptual Standard

A conceptual standard that enables semantic interoperability must permit computers to reason accurately and automatically with transferred information. To do so, the syntactic format for a common standard must have at least the expressivity of First-Order Logic (FOL), so that logical inference can use rules expressing domain knowledge. Several foundation ontologies with this technical capability have been developed, such as OpenCyc [6], SUMO [7], DOLCE [8], and BFO [9]. Other knowledge classifications, such as NIEM [10] and the DoD Core Taxonomy [11], are less expressive. None of these projects has adopted the tactic of creating a CDV. None has been recognized as a default standard by application builders who focus on specific topics and are indifferent to the nuances of representation at the abstract levels. The reasons for the lack of wide adoption vary. Each of the existing foundation ontologies is complex and presents a steep learning curve, so potential users need strong motivation to spend the required time. In the case of Cyc, much of the content (such as its more than 1000 specialized reasoning modules) is still proprietary. That content cannot be part of an open-source project that could include desired components from many non-Cycorp sources. An effective open-source natural-language interface to the ontology would also be desirable, to make learning and use convenient, but none of the existing foundation ontologies has one. Publicly available examples showing the benefits of using a complex ontology are lacking. Without them, a specialized application developer with no need to interoperate outside the local community is strongly tempted to develop a specialized ontology that is not linked to a foundation ontology. As a result, specialized ontologies with no links to any of the major foundation ontologies have proliferated.

The above considerations suggest the following desiderata for a foundation ontology that a community large enough to serve as a de facto standard of meaning can adopt and use:
• the core set of concept representations required to use the ontology effectively should be as small as possible, but sufficient to support the specification of any specialized concept meaning;
• the ontology should be fully public and developed by an open procedure, so as to permit alternative logically compatible views of entities; it should be maintained by an open process and allow additions as needed to represent new topics;
• there should be a powerful, intuitive natural language interface capable of determining (1) whether representations of specific concepts are already present in the core foundation ontology or in some public extension, or (2) if not, which elements in the ontology are closest in meaning;
• the ontology format should have the expressiveness of at least FOL;
• there should be several substantive open-source applications demonstrating the usefulness of the ontology;
• extensions to the core should be maintained and made freely available, in the manner of Java library packages, to minimize the need to create new definitions. These extensions would provide logical specifications of concepts as combinations of the core concept representations.

A de facto standard of meaning does not require universal agreement to use only one foundation ontology. It only requires that some foundation ontology have a user community large enough to give third-party vendors an incentive to develop utilities that make the standard easier to use, and applications that demonstrate its utility. The community of users should also be wide enough to give research groups an incentive to use the ontology as their standard of meaning. Through it, they can transfer information from diverse separate applications, each using different forms of intelligent information processing.

III. THE COSMO PROJECT

A. Origin

The COSMO ontology [12] is being developed as a fully public foundation ontology that contains representations of all of the roughly 2100 words in the LDV, with the intention of serving as a broadly acceptable CDV. COSMO (COmmon Semantic MOdel) was initiated in 2005 [13] as a project of the Ontology and Taxonomy Coordinating Working Group [14], a working group of the Federal Semantic Interoperability Community of Practice. The origin of COSMO is discussed in more detail in [15]. In early 2008 the project adopted its current goal of representing the LDV. Developing the ontology as a CDV promises a foundation ontology that has all of the elements (types, relations) needed to build representations of any concept of interest in any application. Such an ontology would still be small enough to use without an extended learning period. In effect, the goal is to identify the smallest foundation ontology sufficient to serve as the basis for broad semantic interoperability. Such a foundation ontology will contain representations of the essential units of meaning, which can be combined to represent any specialized term or concept of interest in applications.

B. Project phasing

COSMO is proceeding in several phases. The first phase, expected to be complete within 3 months, is to create a representation of all of the words in the LDV in an OWL format [16]. Some applications, such as natural language understanding, require the expressiveness of at least pseudo-second-order logic (a FOL in which variables can represent relations or assertions). The plan is therefore to maintain an OWL version but convert it automatically to a Common Logic (CL) compliant language such as KIF or IKL. This will require representing rules, functions, and higher-arity relations in the OWL format.

When the COSMO ontology has the full set of LDV words represented, it will be tested for its ability to serve as a CDV. The test will consist of creating representations of several sets of specialized concepts and determining how many new fundamental concept representations must be added to the foundation ontology. This first version is estimated to contain over 7500 types (OWL classes), over 700 relations, and over 1000 restrictions that constrain the meanings of the elements.

COSMO itself is not expected to be adopted without change as a common foundation ontology. The main purpose of this project is to demonstrate the feasibility of a Conceptual Defining Vocabulary as an effective basis for semantic interoperability. A widely accepted CDV is likely to arise only from a collaborative effort by a broad consortium of ontology builders and users, as well as developers of other knowledge representation constructs such as NIEM. More than one CDV may eventually find wide use. The number of such ontologies is nevertheless likely to be smaller than the number of operating systems, because a CDV requires more numerous and more complex primitive data structures than operating systems manipulate.

C. Criterion for Success

Whether COSMO can serve as a starting CDV will be judged by the number of new primitive ontology elements that must be added to it to represent groups of new terms or concepts from additional specialized topics. Some additional primitive elements (types, relations) are expected to be needed as knowledge in diverse fields is represented. To function as an effective CDV, however, the ontology must need fewer such new primitives with each successive block (e.g. of 500) of new terms represented, and that number must decrease asymptotically. Such statistical evidence of a limit on the number of new primitive elements that must be added will help answer two questions: whether there is any limit to the number of basic elements required for the CDV and, if so, approximately what that number is.

D. Allowance for Multiple Viewpoints

To enable semantic interoperability, COSMO must include all logically compatible views, so as to permit translations among all of the representations used in applications. Wherever different ontologists prefer different means of representing a concept, both alternatives are included. A translation rule (e.g. “bridging axioms”) automatically converts from one view to the other. An example is the concept of “mother”. Some ontologies represent it only as a relation (‘isTheMotherOf’), and others represent it as the type (class) ‘Mother’. The COSMO OWL version can include both representations. Automatic conversion between such alternative views, however, will often require rules and will be possible only in the more expressive Common Logic format. An ontology representing multiple views could also lead to less efficient inference than a more restrictive representation. Multiple alternative representations are expected to be needed only for interoperability among applications, however. Individual local applications will not use the full ontology but will select only the elements they require. In this way, full semantic interoperability can be achieved among applications without sacrificing efficiency.

REFERENCES

[1] Cliff Goddard, Bad Arguments Against Semantic Primitives, Theoretical Linguistics, Vol. 24 (1998), No. 2-3: 129-156. Available online at: http://www.une.edu.au/bcss/linguistics/nsm/pdfs/bad-arguments5.pdf
[2] Longman Dictionary of Contemporary English, Longman Group, Essex, England (New Edition, 1987).
[3] Guo, Cheng-ming (1989). Constructing a machine-tractable dictionary from "Longman Dictionary of Contemporary English" (Ph.D. Thesis), New Mexico State University.
[4] Guo, Cheng-ming (editor), Machine Tractable Dictionaries: Design and Construction, Ablex Publishing Co., Norwood NJ (1995).
[5] Yorick Wilks, Brian Slator, and Louise Guthrie, Electric Words: Dictionaries, Computers, and Meanings, MIT Press, Cambridge Mass (1996).
[6] OpenCyc: http://opencyc.org/
[7] SUMO: http://www.ontologyportal.org/
[8] DOLCE: http://www.loa-cnr.it/DOLCE.html
[9] Pierre Grenon, BFO in a Nutshell: A Bi-categorial Axiomatization of BFO and Comparison with DOLCE, IFOMIS report 06/2003 (2003). Available at: http://www.ifomis.uni-saarland.de/Research/IFOMISReports/IFOMIS%20Report%2006_2003.pdf. See also: http://www.ifomis.uni-saarland.de/bfo/
[10] NIEM: http://www.niem.gov/
[11] DoD Core Taxonomy: http://www.dtic.mil/dtic/annualconf/conf05-Dickert.ppt
[12] COSMO ontology (OWL): http://micra.com/COSMO/COSMO.owl
[13] COSMO Common Semantic Model: http://semanticommunity.wik.is/Federal_Semantic_Interoperability_Community_of_Practice/Work_Group_Status/Ontology_and_Taxonomy_Coordination/COSMO_Common_Semantic_Model
[14] Ontology and Taxonomy Coordination: http://semanticommunity.wik.is/Federal_Semantic_Interoperability_Community_of_Practice/Work_Group_Status/Ontology_and_Taxonomy_Coordination
[15] COSMO overview: http://micra.com/COSMO/COSMOoverview.doc
[16] The OWL Web Ontology Language Reference: http://www.w3.org/TR/owl-ref/