=Paper=
{{Paper
|id=Vol-1097/STIDS2013_T02
|storemode=property
|title=A Reference Architecture for Probabilistic Ontology Development
|pdfUrl=https://ceur-ws.org/Vol-1097/STIDS2013_T02_HaberlinEtAl.pdf
|volume=Vol-1097
|dblpUrl=https://dblp.org/rec/conf/stids/HaberlinCL13
}}
==A Reference Architecture for Probabilistic Ontology Development==
Richard J. Haberlin, Jr.
EMSolutions, Inc.
Arlington, Virginia
rjhaberlin@comcast.net
Paulo C. G. da Costa, Kathryn B. Laskey
Systems Engineering and Operations Research
George Mason University
Fairfax, Virginia
{pcosta, klaskey}@gmu.edu
Abstract - The use of ontologies is on the rise, as they facilitate interoperability and provide support for automation. Today, ontologies are popular for research in areas such as the Semantic Web, knowledge engineering, artificial intelligence and knowledge management. However, many real world problems in these disciplines are burdened by incomplete information and other sources of uncertainty which traditional ontologies cannot represent. Therefore, a means to incorporate uncertainty is a necessity. Probabilistic ontologies extend current ontology formalisms to provide support for representing and reasoning with uncertainty. Representation of uncertainty in real-world problems requires probabilistic ontologies, which integrate the inferential reasoning power of probabilistic representations with the first-order expressivity of ontologies. This paper introduces a systematic approach to probabilistic ontology development through a reference architecture which captures the evolution of a traditional ontology into a probabilistic ontology implementation for real-world problems. The Reference Architecture for Probabilistic Ontology Development catalogues and defines the processes and artifacts necessary for the development, implementation and evaluation of explicit, logical and defensible probabilistic ontologies developed for knowledge-sharing and reuse in a given domain.
Keywords: probabilistic ontology, knowledge engineering, reference architecture
I. INTRODUCTION
The Reference Architecture for Probabilistic Ontology Development (RAPOD) presents a compilation of components required for probabilistic ontology development and therefore facilitates design, implementation, and support processes without rigid adherence to a particular set of tools. The Department of Defense (DOD) defines a Reference Architecture as:
“… an authoritative source of information about a specific subject area that guides and constrains the instantiations of multiple architectures and solutions [1].”
Common throughout the literature on reference architectures is the idea of serving as a blueprint for architects to develop specific solution architectures within a defined domain [1] [2]. As the blueprint, it serves as a template for software development, defining integral components and their relationships, thereby reducing development time and project risk. Further, it standardizes language among participants, provides consistency of development within the domain, provides a reference for evaluation, and establishes specifications and patterns [1].
A. Background
Development of the RAPOD provides synergy of effort within the Semantic Technology (ST) community by identifying concepts, processes, languages, theories and tools for designing and maintaining probabilistic ontologies. Presently, ontological engineering facilitates the development of explicit, logical and defensible ontologies for knowledge-sharing and reuse. A similar pragmatics in the form of the Probabilistic Ontology Development Methodology has been produced for probabilistic ontologies and is described in [3]. The RAPOD facilitates synergy of effort between multiple disciplines including probabilists, logicians, decision analysts and computer scientists. It describes each of the components required for a functional probabilistic ontology and their interrelationships, and defines the criteria to be satisfied by any set of selected tools and methods using a Unified Process-inspired methodology.
B. Scope
The RAPOD spans the knowledge, processes, models, and tools necessary for engineering probabilistic ontologies at a high level of abstraction. Through decomposition or aggregation of existing methodologies, it provides universal techniques and a generalized framework for the fundamental components needed to construct probabilistic ontologies from conceptualization to operation through multiple tasks, including:
• Model conceptualization and framing
• Ontology development through elicitation and ontological learning
• Probability incorporation through iterative decomposition
There are many participants involved in realizing an operational probabilistic ontology. The Stakeholder Decision Maker (DM), Subject-Matter Expert (SME) and Probabilistic
STIDS 2013 Proceedings Page 10
Ontology Developer coordinate to instantiate a collection of concepts and tools for development and implementation from existing and proposed ontological and probabilistic ontological engineering methodologies, providing a single collection of knowledge to solve a domain-specific problem. Their solution is defined as a domain-specific architecture that may be reused for comparable problems in similar domain contexts.
C. Model Implementation and Viewpoint
The concept behind the RAPOD is to establish intellectual control of the probabilistic ontology (PO) model, stimulate reuse, and provide a basis for development through instantiation of a particular set of tools the developer will utilize to design and implement complex probabilistic ontologies for a particular domain [4]. Intellectual control establishes common semantics and allows consistent integration of new system components by anticipating their inclusion from design. Reuse is a prime tenet of ontological engineering and is enabled through identification of common components and relationships. Further, a well-defined and properly architected PO may be reused entirely through spiral modification to incorporate additional knowledge or relationships. Most importantly, the architecture serves as a blueprint for the PO Developer and a clear mechanism between him and the Stakeholder Decision Maker. The architecture allows individuals, teams, and organizations to communicate objectives, requirements, constraints, components and relationships with a common vocabulary and understanding of the objective. Ontological engineering, and probabilistic ontological development, may be completed by several different methodologies depending on the context and domain of the problem. Therefore, the RAPOD provides ready access to tools, techniques, and procedures that have proven successful in the past. The RAPOD also exposes synergies in algorithms, heuristics and model use between ontological and probabilistic ontological engineering. Through careful selection of tools with common parameters, the final model is more intuitive. The viewpoint of this reference architecture is that of the Probabilistic Ontology Developer in support of a Stakeholder Decision Maker desiring decision support for a defined area of interest.
II. REFERENCE ARCHITECTURE FOR PROBABILISTIC ONTOLOGY DEVELOPMENT
The Reference Architecture for Probabilistic Ontology Development facilitates PO development and reuse by providing a template from which multiple PO solutions to similar problems may be constructed. The output of the RAPOD is a domain and problem-type specific architecture that may be used to develop POs for similar problems. Reusable architectures provide a shortcut to future development by identifying inputs, methodologies, and support artifacts that have previously produced successful solutions within the domain.
In each of its three layers, the RAPOD identifies processes and artifacts necessary for the construction of a probabilistic ontology without specification to particular tools. Working with the stakeholders, the PO Developer selects individual component solutions that suit the problem-type and domain. Specification of a set of tools for each component instantiates an architecture that is used to develop the PO. Figure 1 provides an overview of the RAPOD, discussed in detail below.
Figure 1
The Reference Architecture for Probabilistic Ontology Development shown in Figure 1 illustrates the scope of the reference architecture from abstract to concrete. At the top of the illustration is the most abstract conceptualization defined as a problem or objective by the Stakeholder Decision Maker that requires implementation of a probabilistic ontology. For example, a military commander may be charged with creating a decision support system that assists in the determination of an opposing force given limited sensor information. A Naval application example is given in [3]. The base of the illustration represents the operational implementation of the probabilistic ontology to provide inferential reasoning support. Between lies the probabilistic ontology architecture, which translates the conceptualization into a blueprint for development. The probabilistic ontology architecture is comprised of three interacting layers, which group and characterize similar functionality: the Input Layer, Methodology Layer, and Support Layer. These and their relationships are described in the following subsections.
A. Input Layer
The Input Layer defines external influences on the probabilistic ontology and is referenced by components of the Methodology Layer. It contains those components expected to provide detail on the purpose of the PO and its bounding constraints in the form of system requirements. Population of the Input Layer occurs primarily during the early stages of the development process during which the Stakeholder Decision Maker and PO Developer work closely to identify the objective of the model, expectations of its performance, and resource restrictions. Parameters specified in the Input Layer will constrain the operational implementation.
1) Objectives
The objectives hierarchy contains a representation of performance, cost and schedule attributes that determine the value of the system, with an over-arching Objective Statement that captures its primary intent [5]. Objectives state the overall intent of the project in short, clear, descriptive phrases. They
are defined by the Stakeholder DM to bound the scope of the final product and set expectations. These are often described in the following form [6]:
To Action + Object + Qualifying phrase
For a probabilistic ontology model, applicable categories of objectives may include: performance, reliability, compatibility, adaptability, and flexibility. Further descriptions of these and other categories may be found in Armstrong [6]. Choosing the correct objectives ensures that the desired problem is solved and that the PO Developer and Decision Maker have clearly communicated. The entire project is best focused through a Top-level Objective Statement.
2) Requirements
Requirements define the system to be implemented in terms of its behaviors, applications, constraints, properties, and attributes. The systems engineering literature on requirements elicitation and development is rich, but there is consensus that no single methodology exists for requirements engineering [7] [8]. In general, requirements elicitation approaches may be categorized as structured or unstructured [8] using a combination of strategies depending on the scope of the system under development and the participation commitment of the Stakeholder Decision Maker.
Requirements are elicited from the Stakeholder Decision Maker and SMEs through an iterative process that generally includes objective setting, background knowledge acquisition, knowledge organization, and requirements collection as introduced by Kotonya and Sommerville [7]. Grady categorizes three strategies for requirements analysis: structured analysis, cloning, and freestyle [8]. Using one or more of these strategies and concentrating on the four tasks above will lead to identification of appropriate requirements to satisfy valid model development. There is inefficiency and risk involved in the unstructured methods as there is nothing to prevent duplicative work, incompleteness, conflicts and misdirection.
3) Metrics
Metrics are used to describe parameters, Measures of Performance (MOP) and Measures of Effectiveness (MOE) that characterize the criteria against which the fielded system is to be evaluated. Green defines a hierarchy of effectiveness measures that follows the system of systems concept [9]. The following definitions are adapted from those offered by Green to accommodate the PO development process:
Measures of Effectiveness. A measure of system performance within its intended environment (e.g. overall system effectiveness).
Measures of Performance. A measure of one attribute of system behavior derived from its parameters (e.g. probability of correct identification).
Parameters. Properties or characteristics whose values determine system behavior (e.g. error rate).
Armstrong [6] opines that useful metrics take quantifiable form with both a clear definition of the measure and its associated units. They must also be mission-oriented, discriminatory, sensitive, and inclusive [9]. In all cases, appropriate metrics depend on the system under development and its ultimate purpose (objectives).
B. Methodology Layer
The Methodology Layer contains the heart of the probabilistic ontology development process including the Probabilistic Ontology Development Methodology that allows creation of a specific probabilistic ontology implementation to support the requirements of a Stakeholder Decision Maker. The Methodology Layer references information gathered in the Input Layer and is assembled using components and tools from the Support Layer. Its individual components are introduced below.
1) Probabilistic Ontology Development Methodology
The Probabilistic Ontology Development Methodology (PODM) provides specific activities and tasks that evolve Stakeholder Decision Maker requirements into an ontology that is probabilistically integrated, a probabilistic ontology. The activities of the Probabilistic Ontology Development Methodology are shown in the activity diagram below (Figure 2) and further detailed in [3]. These activities fit well within both Waterfall and Spiral Development Life Cycle processes; in Spiral Development, iteration is explicitly anticipated. Completion of the PODM activities and tasks establishes a framed solution to a specific inferential reasoning problem grounded in an inclusive ontology representing its entities and incorporating probability to represent uncertainty.
Figure 2
2) Ontological Engineering
In Gomez-Perez et al., ontological engineering is defined as the activities that concern the ontology development process, life cycle, construction methodologies and tools [10]. While traditional ontological engineering methods ensure that ontologies are explicit, logical and defensible, these methods provide insufficient support for the complexity of probabilistic ontology development, as discussed above. A systematic approach to PO development is needed that addresses the evolution of requirements into an ontology that is probabilistically integrated. The underlying ontology may be engineered by many methods; but ultimately each
methodology provides a structured means to produce ontologies from conceptualization to implementation. Some principal design criteria must always be considered: clarity, coherence, extendibility, minimal encoding bias, and minimal ontological commitment [11].
3) Ontology Reuse
There are two types of ontology reuse: re-engineering and merging. Ontology re-engineering involves transforming the conceptual model of an implemented ontology into another conceptual model [10]. On the other hand, ontology merging uses information captured about one or more domains of interest in the creation of a new ontology. Therefore, model reuse is the process by which available knowledge and conceptual models are used as input to generate new models, in this case ontologies and probabilistic ontologies. Ontology development is a complex and labor-intensive task. The potential for reuse is an identified strength of ontologies and allows expansion of existing knowledge bases by capitalizing on previous research and development [10][11][12][13][14]. The literature liberally addresses the concept of ontology reuse, but there is little guidance offered for selection of methods for merging and/or integration. Integration of similar tasks and the addition of tasks emphasizing utility of existing ontologies expand the basic process of ontological engineering to make use of ever-expanding online ontology resources. Before beginning construction of a new ontology, it is useful to research existing ontologies in related domains to be reused and/or extended for the current problem. The ST community is actively expanding free access to the growing body of ontological knowledge, as discussed below.
4) Heuristics and Algorithms
Generally, a heuristic is an experience-based technique for problem solving, learning, and discovery and an algorithm is a stepwise procedure for calculation of a problem solution. Heuristics and algorithms are used to express relationships between classes within ontologies and probabilistic ontologies in order to constrain the models. For example, the heuristic “A weapon is cued by a single sensor” gives a plain-language description of a relationship in which each weapon is assigned a single sensor, but sensors may be assigned multiple weapons. This plain-language description captures the machine-readable cardinality statement of ∞…1 in a format understandable by the entire development group, including the Stakeholder Decision Maker and SMEs. Heuristics and algorithms are captured as part of the PODM as described in [3].
5) Learning
Currently, ontology development is a labor-intensive, manual process. However, the need for greater automation features has been recognized and is a focus of the ST community. The PODM has integration points primed for future expansion in the areas of Ontological Learning and Probabilistic Learning. These two functions assist the modeler in ontology creation and elicitation of probabilities for the probabilistic relationships used for inferential reasoning.
a) Ontological Learning
Ontological learning is the process of extracting relevant classes, properties and relationships from a given data set, in this case to reduce effort in development of an ontology which will be developed into a probabilistic ontology. Buitelaar et al. identified innovative aspects of ontology learning that set it apart from traditional knowledge acquisition [15]:
• It is inherently multidisciplinary due to its strong connection with the Semantic Web, which has attracted researchers from a very broad variety of disciplines: knowledge representation, logic, philosophy, databases, machine learning, natural language processing, image processing, etc.
• It is primarily concerned with knowledge acquisition from and for Web content and is moving away from small and homogeneous data collections.
• It is rapidly adopting the rigorous evaluation methods that are central to most machine learning work.
Through application of ontological learning, both the process of developing a probabilistic ontology and the development risk may be reduced.
Sowa defines three types of ontologies: a formal ontology which is a conceptualization whose categories are distinguished by axioms and definitions and are stated in logic to support inference and computation, a prototype-based ontology in which categories are formed by collecting instances extensionally, and a terminological ontology which describes concepts by labels and synonyms without axiomatic grounding [16]. Ontological learning in support of inferential reasoning is concerned primarily with developing the latter two categories for the specified domain of interest. The various sources used for ontology elicitation may include databases, documents, and taxonomies. As ontologies are typically hierarchically arranged, the primary means for ontological learning is through clustering. In this method, using a suitable clustering algorithm, a semantic distance is measured between terms and the nearest terms are clustered and formed into a prototype-based ontology. Ontological learning may also be accomplished through pattern matching using a co-occurrence matrix or bootstrapping from a seed lexicon that is extended by measuring similarity.
The above methods are all primarily focused on learning ontologies from plain text corpora. Recent work includes extracting ontologies from non-text formats including relational databases, structured knowledge bases, and the Semantic Web. Albarrak developed an extensible framework for generating ontologies from Relational Database (RDB) and Object-Relational Database (ORDB) data models [17]. Li et al. introduce a novel set of 12 learning rules that build a complete OWL ontology of classes, properties, characteristics, cardinality and instances [18]. A database analyzer extracts key information from the relational database, which is then passed to an ontology generator containing the rules. It is also possible to map ontologies through machine learning to transform existing ontologies within the Semantic Web to a format useable in the domain context for the current problem. Doan et al. have introduced the GLUE system to semi-automatically create these semantic mappings using a multi-strategy learning approach based on the joint probability distribution of the compared concepts [19] [20]. The concept is to produce a map between the existing domain and the desired domain that
translates between taxonomies. Future research promises to reduce the human interaction required for ontological engineering.
b) Probabilistic Learning
Elicitation of conditional probabilities to populate distribution tables remains a difficult endeavor, accomplished through SME interview and experimental data collection. Probabilistic learning seeks to reduce the effort involved in establishing prior and conditional probabilities for domain entities by specifying a model using empirical data. Pearl identified two tasks for probabilistic learning [21]:
• Extracting generic hypothesis-evidence relationships from records of experience, and
• Organizing the relationships in a data structure to facilitate recall.
Accuracy and consistency in the PO model could be improved by learning numerical parameters for a given network topology from empirical data instead of relying on SME input. The literature contains numerous techniques for parameter learning; two commonly employed methods are:
Maximum Likelihood [22][23] – Parameters are estimated from a set of empirical data using a likelihood weighting algorithm.
Bayesian Learning [22][23] – Prior knowledge about parameters is encoded and data is treated as evidence to reduce the learning process to calculation of posterior distributions.
Learning is segregated into the categories of structure learning and parameter estimation [23][24]. In parameter estimation, the dependency structure of the probabilistic representation is known. The learning task is to define the parameters of the Local Probability Distributions (LPDs). The goal of structure learning is to extract the structure of the probabilistic representation from the dataset.
Learning a Probabilistic Relational Model (PRM) requires input in the form of a relational schema that describes the set of classes, the attributes associated with the classes, and the relations between objects of classes for the domain. In the parameter estimation task, the structure is given, which defines the parents for each attribute. The parameters that define the Conditional Probability Distributions (CPDs) for the structure are learned using the likelihood function to determine the probability of the dataset given the model. Structure learning of a PRM is more complex and requires a method to find possible structures and then score them. Getoor et al. describe the use of a greedy local search procedure to produce a candidate structure which is then scored using the prior probability of the structure and the probability of the dataset, given the structure [23].
Recall that the structure of a Markov Logic Network (MLN) includes a node for each variable and a potential function for each set of nodes that is pairwise linked. Parameter estimation for MLN is performed by computation of the Markov network weights that represent the clique potential using an optimization of the likelihood function. Structure learning is performed by a greedy algorithm on the network features [25].
Multi-Entity Bayesian Network (MEBN) learning also takes advantage of the structure associated with a relational database. A key component is generation of a MEBN-RM model that specifies a mapping of MEBN elements to the relational model of the database. MEBN parameter learning estimates the parameters of the local distribution for a resident node of an MTheory, given the structure and the database using maximum likelihood estimation. MEBN structure learning organizes random variables into MFrags and identifies parent-child relationships between nodes, given the database. Any Bayesian Network Structure search algorithm may be used [26]. More recently, Park et al. have extended the MEBN learning algorithm to include both discrete and continuous random variables [27].
6) Knowledge Base
The knowledge base is a historic collection of domain-specific knowledge contributed by domain SMEs and may include ontological information (classes, properties, characteristics, and relationships), logical constraints, heuristics, and probabilities. The breadth of knowledge stored within is unspecified. To distinguish the KB from evidence, there is no temporal component associated with the knowledge base; information contained therein may not represent the current domain state. Marakas differentiates a database from a knowledge base in this fashion:
“… a collection of data representing facts is a database. The collection of an expert’s set of facts and heuristics is a knowledge base [28].”
7) Ontology Structures
Ontologies, including probabilistic ontologies, provide a means to represent knowledge and relationships between hierarchically organized classes of objects. Ontologies exist to enable knowledge sharing and reuse [11] [13]. As a set of definitions of formal vocabulary, ontologies allow knowledge sharing among hierarchically organized entities. A probabilistic ontology addresses the inherent uncertainty involved in inferential reasoning applications with inconclusive evidence by representing it probabilistically.
a) Ontology
A working ontology captures the classes, properties, and the relationships of a domain of interest. Production of this relational framework facilitates comprehension of the hierarchical organization of domain entities; the relationships between and properties of domain entities; as well as causal relationships among entities. When uncertainty about aspects of the domain is important to the purpose for which the ontology is being developed, a probabilistic ontology is needed to represent the uncertainty.
b) Probabilistic Ontology
A probabilistic ontology provides a means to represent and reason with uncertainty by integrating the inferential reasoning power of probabilistic languages with the first-order expressivity of ontologies. Few things are certain, and inferring in the presence of uncertainty allows the decision maker to
focus attention on the most relevant data through designed queries.
C. Support Layer
The Support Layer provides the background technology and design strategy necessary to instantiate the conceptualization of a specific probabilistic ontology to satisfy identified requirements. It includes existing ontologies available for reuse or re-engineering, software tools that enable ontology and probabilistic ontology development, mathematical languages that allow representation of entity attributes and their relationships, and databases of existing facts referenced for learning and knowledge base population. The purpose of the Support Layer is to facilitate probabilistic ontology development by identifying technological and semantic features specific to a particular inferential reasoning model. The four Support Layer components are discussed below.
1) Existing Ontologies
Model reuse is a strength of the ontological engineering discipline and effort should be made to research and incorporate existing ontology material into new application areas. This will reduce overall effort and promote commonality among different products. Some suggested ontology repositories are listed below.
2) Modeling Languages
A modeling language is a graphical or textual representation used to express knowledge, information, processes or systems with a consistent set of rules and syntax. In the RAPOD, modeling languages serve three functions:
• System Architecture Representation
• Object Relationship Representation
• Ontology (and Probabilistic Ontology) Representation
A probabilistic ontology is an extension of an ontology which incorporates uncertainty while respecting its relational structure and domain specificity. The output of the RAPOD is a unique instantiated architecture for development of a domain-specific probabilistic ontology to meet an inferential reasoning requirement. The architecture includes models from each of the above representation categories and may be reused for development of new probabilistic ontologies in similar domains. The following sections describe the purpose of these representations.
a) System Architecture Representation
An architecture is a conceptual design that defines the structure and behavior of a system. There are two types of representations commonly employed: traditional and object-oriented, represented here by IDEF0 and UP.
• ICAM Definition for Function Modeling (IDEF0) – IDEF0 is a process modeling technique that focuses on the functional model of a system. The model is expressed as a set of diagrams, often called pages. IDEF0 has been applied to the development of information systems, business processes and hardware systems [5].
• Unified Process (UP) – UP is an iterative, comprehensive development approach adapted to object-oriented models, tools and techniques [29]. It was developed initially for software systems, but in recent years has been adapted to systems that include hardware and business processes.
IDEF0 is commonly associated with hardware systems and systems-of-systems, especially within the Department of Defense Architecture Framework (DODAF). Class hierarchies are fundamental to ontologies, and object-oriented design is focused on modeling class hierarchies.
b) Object Relationship Representation
Object modeling languages are used to represent relationships at the system and object level of abstraction to enable clear, concise communication between the Stakeholder Decision Maker and the PO Developer. While the specific choice of language is often left to the developer, object relationships are frequently represented using languages such as:
• Unified Modeling Language (UML) – UML is a graphical modeling language for the creation of object-oriented models used primarily for software engineering [29].
• Systems Modeling Language (SysML) – SysML extends the UML language with a semantic foundation for representing requirements, behavior, structure, and properties of systems and components [30] [31].
There are many diagrams and representations appropriate to systems architecting available in both UML and SysML; the PO Developer should select and implement these tools to maximize clear communications with the Stakeholder Decision Maker.
c) Ontology Representation
Ontology languages allow developers to create explicit, formal conceptualizations of domain models. The main requirements of an ontology language identified by Antoniou and van Harmelen include [32]:
• Well-defined syntax
• Well-defined semantics
• Efficient reasoning support
• Sufficient expressive power
• Convenience of expression
Ontology languages are formal, declarative representations that allow compilation and organization of knowledge about a domain in formal knowledge structures with clearly defined semantics. Further, they include reasoning rules to represent relationships between knowledge classes. The literature contains many different ontology languages, some of which are optimized for specific domains. Some of the more common examples include [10]:
• Web Ontology Language (OWL) – Created by W3C, derived from DAML+OIL and builds on RDF(S).
STIDS 2013 Proceedings Page 15
• Resource Description Framework (RDF) – Created by
  W3C as a semantic-network-based language to
  describe web resources.
• Knowledge Interchange Format (KIF) (including
  OntoLingua) – Based on FOL with an underlying
  frame paradigm, overlaid by OntoLingua to simplify
  operator functionality.
• DARPA Agent Markup Language + Ontology
  Inference Layer (DAML+OIL) – Created by a joint US
  and EU committee as an extension of RDF(S) with
  datatypes and nominals. DAML+OIL has been
  superseded by OWL.
• CycL – A declarative language used to represent the
  knowledge stored in the Cyc Knowledge Base [33].
• Common Logic (CL) – A FOL language for
  knowledge interchange, approved and published as an
  ISO standard for representation and interchange of
  information and data among disparate computer
  systems [34].
• Descriptive Ontology for Linguistic and Cognitive
  Engineering (DOLCE) – A FOL reference module of
  the WonderWeb Project, adopted as a starting point for
  comparing and elucidating relationships between
  ontologies [35].
• Basic Formal Ontology (BFO) – An upper-level
  ontological framework used in support of domain
  ontologies developed for scientific research [36].
OWL has been selected by the World Wide Web
Consortium (W3C) as the language of the Semantic Web and
has therefore received broad attention in the research and
development communities. Further, OWL is the ontology
language used by the UnBBayes software tool, allowing
evolution of an ontology to a probabilistic ontology without the
need to recreate the classes, instances, and relationships in a
new tool. Recall that PR-OWL expresses MEBN in OWL [13].
Of the above ontology languages, only OWL allows expression
of probabilistic information along with an ontology through the
PR-OWL extension.
d) Probabilistic Ontology Representation
Probabilistic ontologies are used to comprehensively
describe knowledge about a domain and the uncertainty
embedded in that knowledge in a principled, structured and
sharable way [13]. The probabilistic web ontology language
(PR-OWL) and its successor (PR-OWL 2) provide a
knowledge representation formalism with MEBN as the
underlying semantics. A MEBN represents knowledge about
attributes of entities and their relationships as a collection of
similar hypotheses organized into theories which satisfy
consistency constraints ensuring a unique joint probability
distribution over the random variables of interest [37].
3) Software Tools
Modeling tools represent the software implementation
packages used for development and implementation of
architectures, ontologies, and probabilistic ontologies in the
chosen modeling language. With the appropriate modeling
tools, the entire ontology life cycle may be managed, including
design, implementation, enhancement, and support.
A number of tools are available to capture data and model
the components of a probabilistic ontology. The PO Developer
selects software tools with the correct fidelity to represent
relevant viewpoints and provide the desired communication
and inferential reasoning representation. A combination of
these tools gives the PO Developer flexibility in creating
necessary views for communication, as well as operational
ontology and probabilistic ontology models.
a) General Purpose Modeling Tools
Creation of a probabilistic ontology requires representation
of many abstractions of data, processes, and relationships, each
of which may be best represented in a different software
application. However, to the extent possible, use of a single,
general-purpose tool should be maximized to enhance
readability and consistency. Tools such as Microsoft Visio and
MagicDraw assist in visual representation to simplify complex
concepts.
b) Ontology Engineering Software Tools
Ontological engineering tools capture the classes,
properties, and instances of ontology entities in a hierarchical
structure. Further, they describe their relationships, domains
and ranges in a contextual environment. The most popular
ontological engineering tool is Protégé, currently in version
4.1.0 (build 239). Protégé also has the advantage of integration
with UnBBayes, which allows seamless implementation of
uncertainty to establish the probabilistic ontology.
c) Probabilistic Ontology Engineering Software Tools
Few tools are able to model the complex integration of
probability and ontologies. The most advanced is UnBBayes,
an open source product developed by the University of Brasilia
and enhanced in collaboration with George Mason University.
UnBBayes has a PR-OWL plug-in that ingests a Protégé
ontology and allows the developer to represent uncertainty
within its hierarchical structure through MEBN Fragments
using the Probabilistic Web Ontology Language (PR-OWL 2).
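The idea behind MEBN described in this section, a set of parameterized hypotheses that is instantiated for specific entities and yields a single joint probability distribution, can be illustrated with a minimal Python sketch. It is a deliberately simplified illustration and not the PR-OWL or UnBBayes implementation: the template names (Suspicious, Reported), the entity names, and all probabilities are hypothetical, and features of full MEBN such as context constraints are omitted.

```python
from itertools import product

# Hypothetical templates (loosely analogous to random variables in an MFrag):
# name -> (parent template names, P(variable is True | parent values)).
# All names and numbers here are illustrative assumptions.
TEMPLATES = {
    "Suspicious": ((), {(): 0.1}),
    "Reported": (("Suspicious",), {(True,): 0.8, (False,): 0.05}),
}


def ground(entities):
    """Instantiate every template once per entity, giving a ground network."""
    network = {}
    for entity in entities:
        for name, (parents, cpt) in TEMPLATES.items():
            network[(name, entity)] = ([(p, entity) for p in parents], cpt)
    return network


def joint_probability(network, assignment):
    """Probability of one full True/False assignment to all ground variables."""
    prob = 1.0
    for node, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


def total_mass(network):
    """Sum the joint over every assignment; a valid distribution sums to 1."""
    nodes = list(network)
    return sum(
        joint_probability(network, dict(zip(nodes, values)))
        for values in product([True, False], repeat=len(nodes))
    )


network = ground(["ship1", "ship2"])
all_false = {node: False for node in network}
```

Grounding the same two templates for two entities produces four random variables that share the same local distributions, and summing the joint over all assignments confirms that the instantiated model defines one consistent probability distribution. Tools such as UnBBayes handle this kind of instantiation for far richer models.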
III. SUMMARY
Use of a reference architecture facilitates design,
implementation, and reuse of a domain-specific probabilistic
ontology construction process by specifying the logical choices
of components to create a blueprint for a contextual solution.
The instantiated architecture is available for reuse to solve like
problems in similar domains.
REFERENCES
[1] Office of the Assistant Secretary of Defense for Networks and Information Integration (OASD/NII), "Reference Architecture Description," Arlington, 2010.
[2] Heather Kreger, Vince Brunssen, Robert Sawyer, Ali Arsanjani, and Rob High. (2012, Jan) IBM Developer Works. [Online]. http://www.ibm.com/developerworks/webservices/library/ws-soa-ref-arch/
[3] Richard J. Haberlin, Probabilistic Ontology Reference Architecture and Design Methodology, PhD George Mason University, 2013.
[4] Philippe Kruchten, The Rational Unified Process: An Introduction. Upper Saddle River: Addison-Wesley, 2004.
[5] Dennis M. Buede, The Engineering Design of Systems: Models and Methods. New York: John Wiley & Sons, 2000.
[6] James E. Armstrong, "Issue Formulation," in Handbook of Systems Engineering and Management. Hoboken: John Wiley & Sons, 2009, pp. 1027-1089.
[7] Gerald Kotonya and Ian Sommerville, Requirements Engineering Processes and Techniques. Chichester: John Wiley & Sons, 1998.
[8] Jeffrey O. Grady, System Requirements Analysis. New York: McGraw-Hill, Inc., 1993.
[9] John M. Green, "Establishing System Measures of Effectiveness," in Proceedings of the 2nd Biennial National Forum on Weapon System Effectiveness, Laurel, 2001, pp. 1-5.
[10] Asuncion Gomez-Perez, Mariano Fernandez-Lopez, and Oscar Corcho, Ontological Engineering with Examples from the Areas of Knowledge Management, e-Commerce and the Semantic Web. London: Springer-Verlag, 2010.
[11] Thomas R. Gruber, "Toward Principles for the Design of Ontologies Used for Knowledge Sharing," International Journal of Human-Computer Studies, pp. 907-928, 1995.
[12] Michael K. Bergman, "A Brief Survey of Ontology Development Methodologies," 2011. [Online]. http://www.mkbergman.com/906/a-brief-survey-of-ontology-development-methodologies/
[13] Paulo Cesar G. da Costa, Bayesian Semantics for the Semantic Web, PhD George Mason University, 2005. [Online]. http://hdl.handle.net/1920/455
[14] Maria C. Keet, "Dependencies between Ontology Design Parameters," International Journal of Metadata, Semantics and Ontologies, pp. 265-284, 2010.
[15] Paul Buitelaar and Bernardo Magnini, "Ontology Learning from Text: An Overview," in Ontology Learning from Text: Methods, Applications and Evaluation. IOS Press, 2005, pp. 3-12.
[16] John Sowa. (2001) Ontology. [Online]. http://www.jfsowa.com/ontology/
[17] Khalid Albarrak, An Extensible Framework for Generating Ontology from Various Data Models, May 2013, PhD Dissertation.
[18] Man Li, Xiao-Yong Du, and Shan Wang, "Learning Ontology from Relational Database," in Proceedings of the 4th International Conference on Machine Learning and Cybernetics, Guangzhou, 2005, pp. 3410-3415.
[19] AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy, "Ontology Matching: A Machine Learning Approach," in Handbook on Ontologies. Berlin: Springer-Verlag, 2009, pp. 385-404.
[20] AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy, "Ontology Matching: A Machine Learning Approach," in Handbook on Ontologies in Information Systems. Springer, 2003, pp. 397-416.
[21] Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco: Morgan Kaufmann, 1988.
[22] Adnan Darwiche, Modeling and Reasoning with Bayesian Networks. Cambridge: Cambridge University Press, 2009.
[23] Lise Getoor, Nir Friedman, Daphne Koller, Avi Pfeffer, and Ben Taskar, "Probabilistic Relational Models," in Introduction to Statistical Relational Learning. Cambridge: The MIT Press, 2007, pp. 129-174.
[24] James Cussens, "Logic-based Formalisms for Statistical Relational Learning," in Introduction to Statistical Relational Learning. Cambridge: MIT Press, 2007, ch. 9, pp. 269-290.
[25] Pedro Domingos and Matthew Richardson, "Markov Logic: A Unifying Framework for Statistical Relational Learning," in Introduction to Statistical Relational Learning. Cambridge: The MIT Press, 2007, pp. 339-371.
[26] Cheol Young Park, Kathryn B. Laskey, Paulo C.G. Costa, and Shou Matsumoto, "Multi-Entity Bayesian Networks Learning for Hybrid Variables in Situation Awareness," in Proceedings of the 16th International Conference on Information Fusion (submitted), Istanbul, 2013, pp. 1-8.
[27] Cheol Young Park, Kathryn B. Laskey, Paulo C.G.N. Costa, and Shou Matsumoto, "Multi-Entity Bayesian Networks Learning in Predictive Situation Awareness," in Proceedings of the 18th International Command and Control Research and Technology Symposium, Alexandria, 2013, pp. 1-19.
[28] George M. Marakas, Decision Support Systems in the 21st Century. Upper Saddle River: Prentice Hall, 2003.
[29] John W. Satzinger, Robert B. Jackson, and Stephen D. Burd, Systems Analysis and Design in a Changing World. Boston: Course Technology, 2004.
[30] Sanford Friedenthal, Alan Moore, and Rick Steiner, A Practical Guide to SysML: The Systems Modeling Language. Amsterdam: Elsevier, 2008.
[31] Sanford Friedenthal, Alan Moore, and Rick Steiner, OMG Systems Modeling Language Tutorial. Object Management Group, 2008.
[32] Grigoris Antoniou and Frank van Harmelen, "Web Ontology Language: OWL," in Handbook on Ontologies in Information Systems. Springer-Verlag, 2003.
[33] Cycorp. (2013, June) CycL: The Cyc Knowledge Representation Language. [Online]. http://www.cyc.com/cyc/cycl
[34] International Standards Organization, "Information technology - Common Logic (CL): a framework for a family of logic-based languages," International Standards Organization, Standard ISO/IEC 24707:2007(E), 2007.
[35] Institute of Cognitive Science and Technology, Italian National Research Council. (2013, June) WonderWeb. [Online]. http://www.loa.istc.cnr.it/DOLCE.html
[36] Institute for Formal Ontology and Medical Information Science. (2013, March) BFO: Basic Formal Ontology. [Online]. http://www.ifomis.org/bfo
[37] Paulo Cesar G. da Costa, K.C. Chang, Kathryn B. Laskey, and Rommel Novaes Carvalho, "A Multidisciplinary Approach to High Level Fusion in Predictive Situational Awareness," in Proceedings of the 11th International Conference of the Society of Information Fusion, Seattle, 2009.