=Paper= {{Paper |id=Vol-1097/STIDS2013_T02 |storemode=property |title=A Reference Architecture for Probabilistic Ontology Development |pdfUrl=https://ceur-ws.org/Vol-1097/STIDS2013_T02_HaberlinEtAl.pdf |volume=Vol-1097 |dblpUrl=https://dblp.org/rec/conf/stids/HaberlinCL13 }} ==A Reference Architecture for Probabilistic Ontology Development== https://ceur-ws.org/Vol-1097/STIDS2013_T02_HaberlinEtAl.pdf
  A Reference Architecture for Probabilistic Ontology
                    Development

                  Richard J. Haberlin, Jr.                                                   Paulo C. G. da Costa
                      EMSolutions, Inc.                                                       Kathryn B. Laskey
                      Arlington, Virginia                                         Systems Engineering and Operations Research
                   rjhaberlin@comcast.net                                                  George Mason University
                                                                                                Fairfax, Virginia
                                                                                           pcosta, klaskey @gmu.edu


    Abstract - The use of ontologies is on the rise, as they facilitate   relationships, thereby reducing development time and project
interoperability and provide support for automation. Today,               risk. Further, it standardizes language among participants,
ontologies are popular for research in areas such as the Semantic         provides consistency of development within the domain,
Web, knowledge engineering, artificial intelligence and                   provides a reference for evaluation, and establishes
knowledge management. However, many real world problems in                specifications and patterns [1].
these disciplines are burdened by incomplete information and
other sources of uncertainty which traditional ontologies cannot          A. Background
represent. Therefore, a means to incorporate uncertainty is a                 Development of the RAPOD provides synergy of effort
necessity. Probabilistic ontologies extend current ontology               within the Semantic Technology (ST) community by
formalisms to provide support for representing and reasoning              identifying concepts, processes, languages, theories and tools
with uncertainty. Representation of uncertainty in real-world
                                                                          for designing and maintaining probabilistic ontologies.
problems requires probabilistic ontologies, which integrate the
                                                                          Presently, ontological engineering facilitates the development
inferential reasoning power of probabilistic representations with
the first-order expressivity of ontologies. This paper introduces a
                                                                          of explicit, logical and defensible ontologies for knowledge-
systematic approach to probabilistic ontology development                 sharing and reuse. A similar pragmatics in the form of the
through a reference architecture which captures the evolution of          Probabilistic Ontology Development Methodology has been
a traditional ontology into a probabilistic ontology                      produced for probabilistic ontologies and is described in [3].
implementation for real-world problems. The Reference                     The RAPOD facilitates synergy of effort between multiple
Architecture for Probabilistic Ontology Development catalogues            disciplines including probabilists, logicians, decision analysts
and defines the processes and artifacts necessary for the                 and computer scientists. It describes each of the components
development, implementation and evaluation of explicit, logical           required for a functional probabilistic ontology and their
and defensible probabilistic ontologies developed for knowledge-          interrelationships, and defines the criteria to be satisfied by any
sharing and reuse in a given domain.                                      set of selected tools and methods using a Unified Process-
                                                                          inspired methodology.
    Keywords—probabilistic ontology, knowledge engineering,
reference architecture                                                    B. Scope
                                                                              The RAPOD spans the knowledge, processes, models, and
                         I.   INTRODUCTION                                tools necessary for engineering probabilistic ontologies at a
    The Reference Architecture for Probabilistic Ontology                 high level of abstraction. Through decomposition or
Development (RAPOD) presents a compilation of components                  aggregation of existing methodologies, it provides universal
required for probabilistic ontology development and therefore             techniques and a generalized framework for the fundamental
facilitates design, implementation, and support processes                 components needed to construct probabilistic ontologies from
without rigid adherence to a particular set of tools. The                 conceptualization to operation through multiple tasks,
Department of Defense (DOD) defines a Reference                           including:
Architecture as:
                                                                              •     Model conceptualization and framing
          “… an authoritative source of information about a
          specific subject area that guides and constrains the                •     Ontology development through elicitation and
          instantiations of multiple architectures and                              ontological learning
          solutions [1].”                                                     •     Probability incorporation through iterative
    Common throughout the literature on reference                                   decomposition
architectures is the idea of serving as a blueprint for architects           There are many participants involved in realizing an
to develop specific solution architectures within a defined               operational probabilistic ontology. The Stakeholder Decision
domain [1] [2]. As the blueprint, it serves as a template for             Maker (DM), Subject-Matter Expert (SME) and Probabilistic
software development, defining integral components and their



                                                      STIDS 2013 Proceedings Page 10
Ontology Developer coordinate to instantiate a collection of      an architecture that is used to develop the PO. Figure 1
concepts and tools for development and implementation from        provides an overview of the RAPOD, discussed in detail
existing and proposed ontological and probabilistic ontological   below.
engineering methodologies, providing a single collection of
knowledge to solve a domain-specific problem. Their solution
is defined as a domain-specific architecture that may be reused
for comparable problems in similar domain contexts.
C. Model Implementation and Viewpoint
    The concept behind the RAPOD is to establish intellectual
control of the probabilistic ontology (PO) model, stimulate
reuse, and provide a basis for development through
instantiation of a particular set of tools the developer will
utilize to design and implement complex probabilistic
ontologies for a particular domain [4]. Intellectual control
establishes common semantics and allows consistent
integration of new system components by anticipating their
inclusion from design. Reuse is a prime tenet of ontological
engineering and is enabled through identification of common
components and relationships. Further, a well-defined and
properly architected PO may be reused entirely through spiral                                 Figure 1
modification to incorporate additional knowledge or
relationships. Most importantly, the architecture serves as a         The Reference Architecture for Probabilistic Ontology
blueprint for the PO Developer and a clear means of communication Development shown in Figure 1 illustrates the scope of the
with the Stakeholder Decision Maker. The architecture             reference architecture from abstract to concrete. At the top of
allows individuals, teams, and organizations to communicate       the illustration is the most abstract conceptualization defined as
objectives, requirements, constraints, components and             a problem or objective by the Stakeholder Decision Maker that
relationships with a common vocabulary and understanding of       requires implementation of a probabilistic ontology. For
the objective. Ontological engineering, and probabilistic         example, a military commander may be charged with creating
ontological development, may be completed by several              a decision support system that assists in the determination of an
different methodologies depending on the context and domain       opposing force given limited sensor information. A Naval
of the problem. Therefore, the RAPOD provides ready access        application example is given in [3]. The base of the illustration
to tools, techniques, and procedures that have proven             represents the operational implementation of the probabilistic
successful in the past. The RAPOD also exposes synergies in       ontology to provide inferential reasoning support. Between lies
algorithms, heuristics and model use between ontological and      the probabilistic ontology architecture, which translates the
probabilistic ontological engineering. Through careful            conceptualization into a blueprint for development. The
selection of tools with common parameters, the final model is     probabilistic ontology architecture is composed of three
more intuitive. The viewpoint of this reference architecture is   interacting layers, which group and characterize similar
that of the Probabilistic Ontology Developer in support of a      functionality: the Input Layer, Methodology Layer, and
Stakeholder Decision Maker desiring decision support for a        Support Layer. These and their relationships are described in
defined area of interest.                                         the following subsections.
                                                                  A. Input Layer
     II.   REFERENCE ARCHITECTURE FOR PROBABILISTIC
                ONTOLOGY DEVELOPMENT                                  The Input Layer defines external influences on the
                                                                  probabilistic ontology and is referenced by components of the
    The Reference Architecture for Probabilistic Ontology         Methodology Layer. It contains those components expected to
Development facilitates PO development and reuse by               provide detail on the purpose of the PO and its bounding
providing a template from which multiple PO solutions to          constraints in the form of system requirements. Population of
similar problems may be constructed. The output of the            the Input Layer occurs primarily during the early stages of the
RAPOD is a domain and problem-type specific architecture          development process during which the Stakeholder Decision
that may be used to develop POs for similar problems.             Maker and PO Developer work closely to identify the objective
Reusable architectures provide a shortcut to future               of the model, expectations of its performance, and resource
development by identifying inputs, methodologies, and support     restrictions. Parameters specified in the Input Layer will
artifacts that have previously produced successful solutions      constrain the operational implementation.
within the domain.
                                                                    1) Objectives
   In each of its three layers, the RAPOD identifies processes        The objectives hierarchy contains a representation of
and artifacts necessary for the construction of a probabilistic   performance, cost and schedule attributes that determine the
ontology without specification to particular tools. Working       value of the system, with an over-arching Objective Statement
with the stakeholders, the PO Developer selects individual        that captures its primary intent [5]. Objectives state the overall
component solutions that suit the problem-type and domain.        intent of the project in short, clear, descriptive phrases. They
Specification of a set of tools for each component instantiates




are defined by the Stakeholder DM to bound the scope of the          discriminatory, sensitive, and inclusive [9]. In all cases,
final product and set expectations. These are often described in     appropriate metrics depend on the system under development
the following form [6]:                                              and its ultimate purpose (objectives).
             To Action + Object + Qualifying phrase                  B. Methodology Layer
    For a probabilistic ontology model, applicable categories of         The Methodology Layer contains the heart of the
objectives may include: performance, reliability, compatibility,     probabilistic ontology development process including the
adaptability, and flexibility. Further descriptions of these and     Probabilistic Ontology Development Methodology that allows
other categories may be found in Armstrong [6]. Choosing the         creation of a specific probabilistic ontology implementation to
correct objectives ensures that the desired problem is solved        support the requirements of a Stakeholder Decision Maker. The
and that the PO Developer and Decision Maker have clearly            Methodology Layer references information gathered in the
communicated. The entire project is best focused through a           Input Layer and is assembled using components and tools from
Top-level Objective Statement.                                       the Support Layer. Its individual components are introduced
                                                                     below.
   2) Requirements
    Requirements define the system to be implemented in terms          1) Probabilistic Ontology Development Methodology
of its behaviors, applications, constraints, properties, and             The Probabilistic Ontology Development Methodology
attributes. The systems engineering literature on requirements       provides specific activities and tasks that evolve Stakeholder
elicitation and development is rich, but there is consensus that     Decision Maker requirements into an ontology that is
no single methodology exists for requirements engineering [7]        probabilistically-integrated, a probabilistic ontology. The
[8]. In general, requirements elicitation approaches may be          activities of the Probabilistic Ontology Development
categorized as structured or unstructured [8] using a                Methodology are shown in the activity diagram below (Figure
combination of strategies depending on the scope of the system       2) and further detailed in [3]. These activities fit well within
under development and the participation commitment of the            both Waterfall and Spiral Development Life Cycle processes;
Stakeholder Decision Maker.                                          in Spiral Development, iteration is explicitly anticipated.
    Requirements are elicited from the Stakeholder Decision             Completion of the PODM activities and tasks establishes a
Maker and SMEs through an iterative process that generally           framed solution to a specific inferential reasoning problem
includes objective setting, background knowledge acquisition,        grounded in an inclusive ontology representing its entities and
knowledge organization, and requirements collection as               incorporating probability to represent uncertainty.
introduced by Kotonya and Sommerville [7]. Grady                        2) Ontological Engineering
categorizes three strategies for requirements analysis:                  In Gomez-Perez et al., ontological engineering is defined as
structured analysis, cloning, and freestyle [8]. Using one or        the activities that concern the ontology development process,
more of these strategies and concentrating on the four tasks         life cycle, construction methodologies and tools [10]. While
above will lead to identification of appropriate requirements to     traditional ontological engineering methods ensure that
satisfy valid model development. There is inefficiency and risk      ontologies are explicit, logical and defensible, these methods
involved in the unstructured methods as there is nothing to          provide insufficient support for the complexity of probabilistic
prevent duplicative work, incompleteness, conflicts and              ontology development, as discussed above. A systematic
misdirection.                                                        approach to PO development is needed that addresses the
   3) Metrics                                                        evolution of requirements into an ontology that is
    Metrics are used to describe parameters, Measures of             probabilistically integrated. The underlying ontology may be
Performance (MOP) and Measures of Effectiveness (MOE)                engineered by many methods; but ultimately each
that characterize the criteria against which the fielded system is
to be evaluated. Green defines a hierarchy of effectiveness
measures that follows the system of systems concept [9]. The
following definitions are adapted from those offered by Green
to accommodate the PO development process:
    Measures of Effectiveness. A measure of system
performance within its intended environment (e.g. overall
system effectiveness).
    Measures of Performance. A measure of one attribute of
system behavior derived from its parameters (e.g. probability
of correct identification).
    Parameters. Properties or characteristics whose values
determine system behavior (e.g. error rate).
   Armstrong [6] opines that useful metrics take quantifiable
form with both a clear definition of the measure and its
associated units. They must also be mission-oriented,
                                                                                                  Figure 2




methodology provides a structured means to produce                              will be developed into a probabilistic ontology. Buitelaar et al.
ontologies from conceptualization to implementation. Some                       identified innovative aspects of ontology learning that set it
principal design criteria must always be considered: clarity,                   apart from traditional knowledge acquisition [15]:
coherence, extendibility, minimal encoding bias, and minimal
ontological commitment [11].                                                       •    It is inherently multidisciplinary due to its strong
                                                                                        connection with the Semantic Web, which has
   3) Ontology Reuse                                                                    attracted researchers from a very broad variety of
    There are two types of ontology reuse: re-engineering and                           disciplines:   knowledge      representation,  logic,
merging. Ontology re-engineering involves transforming the                              philosophy, databases, machine learning, natural
conceptual model of an implemented ontology into another                                language processing, image processing, etc.
conceptual model [10]. On the other hand, ontology merging
uses information captured about one or more domains of                             •    It is primarily concerned with knowledge acquisition
interest in the creation of a new ontology. Therefore, model                            from and for Web content and is moving away from
reuse is the process by which available knowledge and                                   small and homogeneous data collections.
conceptual models are used as input to generate new models, in                     •    It is rapidly adopting the rigorous evaluation methods
this case ontologies and probabilistic ontologies. Ontology                             that are central to most machine learning work.
development is a complex and labor-intensive task. The
potential for reuse is an identified strength of ontologies and                    Through application of ontological learning, both the
allows expansion of existing knowledge bases by capitalizing                    process of developing a probabilistic ontology and the
on previous research and development [10][11][12][13][14].                      development risk may be reduced.
The literature liberally addresses the concept of ontology reuse,                   Sowa defines three types of ontologies: a formal ontology
but there is little guidance offered for selection of methods for               which is a conceptualization whose categories are
merging and/or integration. Integration of similar tasks and the                distinguished by axioms and definitions and are stated in logic
addition of tasks emphasizing utility of existing ontologies                    to support inference and computation, a prototype-based
expand the basic process of ontological engineering to make                     ontology in which categories are formed by collecting
use of ever-expanding online ontology resources. Before                         instances extensionally, and a terminological ontology which
beginning construction of a new ontology, it is useful to                       describes concepts by labels and synonyms without axiomatic
research existing ontologies in related domains to be reused                    grounding [16]. Ontological learning in support of inferential
and/or extended for the current problem. The ST community is                    reasoning is concerned primarily with developing the latter two
actively expanding free access to the growing body of                           categories for the specified domain of interest. The various
ontological knowledge, as discussed below.                                      sources used for ontology elicitation may include databases,
   4) Heuristics and Algorithms                                                 documents, and taxonomies. As ontologies are typically
     Generally, a heuristic is an experience-based technique for                hierarchically arranged, the primary means for ontological
problem solving, learning, and discovery and an algorithm is a                  learning is through clustering. In this method, using a suitable
stepwise procedure for calculation of a problem solution.                       clustering algorithm, a semantic distance is measured between
Heuristics and algorithms are used to express relationships                     terms and the nearest terms are clustered and formed into a
between classes within ontologies and probabilistic ontologies                  prototype-based ontology. Ontological learning may also be
in order to constrain the models. For example, the heuristic “A                 accomplished through pattern matching using a co-occurrence
weapon is cued by a single sensor” gives a plain-language                       matrix or bootstrapping from a seed lexicon that is extended by
description of a relationship in which each weapon is assigned                  measuring similarity.
a single sensor, but sensors may be assigned multiple weapons.                      The above methods are all primarily focused on learning
This plain-language description captures the machine-readable                   ontologies from plain-text corpora. Recent work includes
cardinality statement of ∞…1 in a format understandable by                      extracting ontologies from non-text formats including
the entire development group, including the Stakeholder                         relational databases, structured knowledge bases, and the
Decision Maker and SMEs. Heuristics and algorithms are                          Semantic Web. Albarrak developed an extensible framework
captured as part of the PODM as described in [3].                               for generating ontologies from Relational Database (RDB) and
  5) Learning                                                                   Object-Relational Database (ORDB) data models [17]. Li et al.
    Currently, ontology development is a labor-intensive,                       introduce a novel set of 12 learning rules that build a complete
manual process. However, the need for greater automation                        OWL ontology of classes, properties, characteristics,
features has been recognized and is a focus of the ST                           cardinality and instances [18]. A database analyzer extracts key
community. The PODM has integration points primed for                           information from the relational database, which is then passed
future expansion in the areas of Ontological Learning and                       to an ontology generator containing the rules. It is also possible
Probabilistic Learning. These two functions assist the modeler                  to map ontologies through machine learning to transform
in ontology creation and elicitation of probabilities for the                   existing ontologies within the Semantic Web to a format
probabilistic relationships used for inferential reasoning.                     useable in the domain context for the current problem. Doan et
                                                                                al. have introduced the GLUE system to semi-automatically
     a) Ontological Learning                                                    create these semantic mappings using a multi-strategy learning
    Ontological learning is the process of extracting relevant                  approach based on the joint probability distribution of the
classes, properties and relationships from a given data set, in                 compared concepts [19] [20]. The concept is to produce a map
this case to reduce effort in development of an ontology which                  between the existing domain and the desired domain that




translates between taxonomies. Future research promises to           learning is performed by a greedy algorithm on the network
reduce the human interaction required for ontological                features [25].
engineering.
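    The clustering approach to ontological learning described above can be illustrated with a minimal sketch. It is not taken from the paper or from any cited tool: the term vectors are hypothetical co-occurrence counts, cosine distance stands in for the unspecified "semantic distance", and single-linkage agglomerative clustering with a fixed threshold is only one of several clustering algorithms that could be chosen.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def cluster_terms(vectors, threshold):
    """Single-linkage agglomerative clustering of terms.

    Repeatedly merges the two closest clusters until the closest
    remaining pair is farther apart than `threshold`.
    """
    clusters = [frozenset([term]) for term in sorted(vectors)]

    def linkage(a, b):
        # Single linkage: distance between the closest members of a and b.
        return min(cosine_distance(vectors[x], vectors[y]) for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(a, b) > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical co-occurrence counts of each term with three context
# words ("detect", "launch", "crew"); the values are invented.
TERM_VECTORS = {
    "sensor": [9, 1, 0],
    "radar": [8, 2, 0],
    "missile": [1, 9, 0],
    "torpedo": [0, 8, 1],
    "sailor": [0, 1, 9],
}

clusters = cluster_terms(TERM_VECTORS, threshold=0.2)
print(clusters)
```

    Each resulting cluster is a candidate category for a prototype-based ontology; naming the categories and arranging them hierarchically would still fall to the PO Developer and the SMEs.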
                                                                         Multi-Entity Bayesian Network (MEBN) learning also
     b) Probabilistic Learning                                       takes advantage of the structure associated with a relational
    Elicitation of conditional probabilities to populate             database. A key component is generation of a MEBN-RM
distribution tables remains a difficult endeavor, accomplished       model that specifies a mapping of MEBN elements to the
through SME interview and experimental data collection.              relational model of the database. MEBN parameter learning
Probabilistic learning seeks to reduce the effort involved in        estimates the parameters of the local distribution for a resident
establishing prior and conditional probabilities for domain          node of an MTheory, given the structure and the database using
entities by specifying a model using empirical data. Pearl           maximum likelihood estimation. MEBN structure learning
identified two tasks for probabilistic learning [21]:                organizes random variables into MFrags and identifies parent-
                                                                     child relationships between nodes, given the database. Any
    •   Extracting generic hypothesis-evidence relationships            Bayesian Network Structure search algorithm may be used
from records of experience, and                                      [26]. More recently, Park et al. have extended the MEBN
                                                                     learning algorithm to include both discrete and continuous
    •   Organizing the relationships in a data structure to
                                                                     random variables [27].
facilitate recall.
   Accuracy and consistency in the PO model could be                   6) Knowledge Base
improved by learning numerical parameters for a given                    The knowledge base is a historic collection of domain-
network topology from empirical data instead of relying on           specific knowledge contributed by domain SMEs and may
SME input. The literature contains numerous techniques for           include ontological information (classes, properties,
parameter learning; two commonly employed methods are:               characteristics, and relationships), logical constraints,
                                                                     heuristics, and probabilities. The breadth of knowledge stored
   Maximum Likelihood [22][23] – Parameters are estimated            within is unspecified. To distinguish the KB from evidence,
from a set of empirical data using a likelihood weighting            there is no temporal component associated with the knowledge
algorithm.                                                           base; information contained therein may not represent the
                                                                     current domain state. Marakas differentiates a database from a
    Bayesian Learning [22][23] – Prior knowledge about
                                                                     knowledge base in this fashion:
parameters is encoded and data is treated as evidence to reduce
the learning process to calculation of posterior distributions.       “… a collection of data representing facts is a database. The
    Learning is segregated into the categories of structure           collection of an expert’s set of facts and heuristics is a
learning and parameter estimation [23][24]. In parameter              knowledge base [28].”
estimation, the dependency structure of the probabilistic              7) Ontology Structures
representation is known. The learning task is to define the              Ontologies, including probabilistic ontologies, provide a
parameters of the Local Probability Distributions (LPDs). The        means to represent knowledge and relationships between
goal of structure learning is to extract the structure of the        hierarchically organized classes of objects. Ontologies exist to
probabilistic representation from the dataset.                       enable knowledge sharing and reuse [11] [13]. As a set of
                                                                     definitions of formal vocabulary, ontologies allow knowledge
    Learning a Probabilistic Relational Model (PRM) requires         sharing among hierarchically organized entities. A probabilistic
input in the form of a relational schema that describes the set of   ontology addresses the inherent uncertainty involved in
classes, the attributes associated with the classes, and the         inferential reasoning applications with inconclusive evidence
relations between objects of classes for the domain. In the          by representing it probabilistically.
parameter estimation task, the structure is given, which defines
the parents for each attribute. The parameters that define the            a) Ontology
Conditional Probability Disributions (CPDs) for the structure            A working ontology captures the classes, properties, and
are learned using the likelihood function to determine the           the relationships of a domain of interest. Production of this
probability of the dataset given the model. Structure learning of    relational framework facilitates comprehension of the
a PRM is more complex and requires a method to find possible         hierarchical organization of domain entities; the relationships
structures and then score them. Getoor et al. describes the use      between and properties of domain entities; as well as causal
of a greedy local search procedure to produce a candidate            relationships among entities. When uncertainty about aspects
structure which is then scored using the prior probability of the    of the domain is important to the purpose for which the
structure and the probability of the dataset, given the structure    ontology is being developed, a probabilistic ontology is needed
[23].                                                                to represent the uncertainty.
    Recall that the structure of a Markov Logic Network                   b) Probabilistic Ontology
(MLN) includes a node for each variable and a potential                  A probabilistic ontology provides a means to represent and
function for each set of nodes that is pairwise linked. Parameter    reason with uncertainty by integrating the inferential reasoning
estimation for MLN is performed by computation of the                power of probabilistic languages with the first-order
Markov network weights that represent the clique potential           expressivity of ontologies. Few things are certain, and inferring
using an optimization of the likelihood function. Structure          in the presence of uncertainty allows the decision maker to




focus attention on the most relevant data through designed queries.

C. Support Layer
    The Support Layer provides the background technology and design
strategy necessary to instantiate the conceptualization of a
specific probabilistic ontology to satisfy identified requirements.
It includes existing ontologies available for reuse or
re-engineering, software tools that enable ontology and
probabilistic ontology development, mathematical languages that
allow representation of entity attributes and their relationships,
and databases of existing facts referenced for learning and
knowledge base population. The purpose of the Support Layer is to
facilitate probabilistic ontology development by identifying
technological and semantic features specific to a particular
inferential reasoning model. The four Support Layer components are
discussed below.

  1) Existing Ontologies
    Model reuse is a strength of the ontological engineering
discipline, and effort should be made to research and incorporate
existing ontology material into new application areas. This will
reduce overall effort and promote commonality among different
products. Some suggested ontology repositories are listed below.

  2) Modeling Languages
    A modeling language is a graphical or textual representation
used to express knowledge, information, processes or systems with a
consistent set of rules and syntax. In the RAPOD, modeling
languages serve three functions:

    •   System Architecture Representation
    •   Object Relationship Representation
    •   Ontology (and Probabilistic Ontology) Representation

    A probabilistic ontology is an extension of an ontology which
incorporates uncertainty while respecting its relational structure
and domain specificity. The output of the RAPOD is a unique
instantiated architecture for development of a domain-specific
probabilistic ontology to meet an inferential reasoning
requirement. The architecture includes models from each of the
above representation categories and may be reused for development
of new probabilistic ontologies in similar domains. The following
sections describe the purpose of these representations.

     a) System Architecture Representation
    An architecture is a conceptual design that defines the
structure and behavior of a system. There are two types of
representations commonly employed: traditional and object-oriented,
represented here by IDEF0 and UP, respectively.

    •   ICAM Definition for Function Modeling (IDEF0) – IDEF0 is a
        process modeling technique that focuses on the functional
        model of a system. The model is expressed as a set of
        diagrams, often called pages. IDEF0 has been applied to the
        development of information systems, business processes and
        hardware systems [5].
    •   Unified Process (UP) – UP is an iterative, comprehensive
        development approach adapted to object-oriented models,
        tools and techniques [29]. It was developed initially for
        software systems, but in recent years has been adapted to
        systems that include hardware and business processes.

    IDEF0 is commonly associated with hardware systems and
systems-of-systems, especially within the Department of Defense
Architecture Framework (DODAF). Class hierarchies are fundamental
to ontologies, and object-oriented design is focused on modeling
class hierarchies.

     b) Object Relationship Representation
    Object modeling languages are used to represent relationships
at the system and object level of abstraction to enable clear,
concise communication between the Stakeholder Decision Maker and
the PO Developer. While the specific choice of language is often
left to the developer, object relationships are frequently
represented using languages such as:

    •   Unified Modeling Language (UML) – UML is a graphical
        modeling language for the creation of object-oriented
        models used primarily for software engineering [29].
    •   Systems Modeling Language (SysML) – SysML extends UML with
        a semantic foundation for representing requirements,
        behavior, structure, and properties of systems and
        components [30] [31].

    There are many diagrams and representations appropriate to
systems architecting available in both UML and SysML; the PO
Developer should select and implement these tools to maximize clear
communication with the Stakeholder Decision Maker.

     c) Ontology Representation
    Ontology languages allow developers to create explicit, formal
conceptualizations of domain models. The main requirements of an
ontology language identified by Antoniou and van Harmelen include
[32]:

    •   Well-defined syntax
    •   Well-defined semantics
    •   Efficient reasoning support
    •   Sufficient expressive power
    •   Convenience of expression

    Ontology languages are formal, declarative representations that
allow compilation and organization of knowledge about a domain in
formal knowledge structures with clearly defined semantics.
Further, they include reasoning rules to represent relationships
between knowledge classes. The literature contains many different
ontology languages, some of which are optimized for specific
domains. Some of the more common examples include [10]:

    •   Web Ontology Language (OWL) – Created by W3C, derived from
        DAML+OIL and builds on RDF(S).




    •   Resource Description Framework (RDF) – Created by W3C as a
        semantic network based language to describe web resources.
    •   Knowledge Interchange Format (KIF) (including OntoLingua) –
        Based on FOL with an underlying frame paradigm, overlaid by
        OntoLingua to simplify operator functionality.
    •   DARPA Agent Markup Language + Ontology Inference Layer
        (DAML+OIL) – Created by a joint US and EU committee, an
        extension of RDF(S) with datatypes and nominals. DAML+OIL
        has been superseded by OWL.
    •   CycL – A declarative language used to represent the
        knowledge stored in the Cyc Knowledge Base [33].
    •   Common Logic (CL) – A FOL language for knowledge
        interchange approved and published as an ISO standard for
        representation and interchange of information and data
        among disparate computer systems [34].
    •   Descriptive Ontology for Linguistic and Cognitive
        Engineering (DOLCE) – A FOL reference module of the
        WonderWeb Project adopted as a starting point for comparing
        and elucidating relationships between ontologies [35].
    •   Basic Formal Ontology (BFO) – An upper-level ontological
        framework used in support of domain ontologies developed
        for scientific research [36].

    OWL has been selected by the World Wide Web Consortium (W3C) as
the language of the Semantic Web and has therefore received broad
attention in the research and development communities. Further, OWL
is the ontology language used by the UnBBayes software tool,
allowing evolution of an ontology to a probabilistic ontology
without the need to recreate the classes, instances, and
relationships in a new tool. Recall that PR-OWL expresses MEBN in
OWL [13]. Of the above ontology languages, only OWL allows
expression of probabilistic information along with an ontology,
through the PR-OWL extension.

     d) Probabilistic Ontology Representation
    Probabilistic ontologies are used to comprehensively describe
knowledge about a domain and the uncertainty embedded in that
knowledge in a principled, structured and sharable way [13]. The
probabilistic web ontology language (PR-OWL) and its successor
(PR-OWL 2) provide a knowledge representation formalism with MEBN
as the underlying semantics. A MEBN represents knowledge about
attributes of entities and their relationships as a collection of
similar hypotheses organized into theories which satisfy
consistency constraints ensuring a unique joint probability
distribution over the random variables of interest [37].

  3) Software Tools
    Modeling tools are the software packages used for development
and implementation of architectures, ontologies, and probabilistic
ontologies in the chosen modeling language. With the appropriate
modeling tools, the entire ontology life cycle may be managed,
including design, implementation, enhancement, and support.

    A number of tools are available to capture data and model the
components of a probabilistic ontology. The PO Developer selects
software tools with the appropriate fidelity to represent relevant
viewpoints and provide the desired communication and inferential
reasoning representation. A combination of these tools gives the PO
Developer flexibility in creating necessary views for
communication, as well as operational ontology and probabilistic
ontology models.

     a) General Purpose Modeling Tools
    Creation of a probabilistic ontology requires representation of
many abstractions of data, processes, and relationships, each of
which may be best represented in a different software application.
However, to the extent possible, use of a single, general-purpose
tool should be maximized to enhance readability and consistency.
Tools such as Microsoft Visio and MagicDraw assist in visual
representation to simplify complex concepts.

     b) Ontology Engineering Software Tools
    Ontological engineering tools capture the classes, properties,
and instances of ontology entities in a hierarchical structure.
Further, they describe their relationships, domains and ranges in a
contextual environment. The most popular ontological engineering
tool is Protégé, currently in version 4.1.0 (build 239). Protégé
also has the advantage of integration with UnBBayes, which allows
seamless implementation of uncertainty to establish the
probabilistic ontology.

     c) Probabilistic Ontology Engineering Software Tools
    Few tools are able to model the complex integration of
probability and ontologies. The most advanced is UnBBayes, an open
source product developed by the University of Brasilia and enhanced
in collaboration with George Mason University. UnBBayes has a
PR-OWL plug-in that ingests a Protégé ontology and allows the
developer to represent uncertainty within its hierarchical
structure through MEBN Fragments using the Probabilistic Web
Ontology Language (PR-OWL 2).
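
    The progression described in this section, from a deterministic
ontology of classes and properties to a probabilistic ontology
whose local distributions are estimated from data, can be
illustrated with a small Python sketch. It is not PR-OWL, MEBN, or
the UnBBayes interface. The class names (OntClass,
UncertainProperty), the toy maritime domain, and the records are
all hypothetical, and the dependency structure (intent depends on
route) is assumed to be given, so only parameter estimation is
shown. Under these assumptions, counting co-occurrences yields the
maximum-likelihood estimate of each conditional probability.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OntClass:
    """A class in a toy ontology; `parent` encodes the is-a hierarchy."""
    name: str
    parent: Optional["OntClass"] = None

    def is_a(self, other: "OntClass") -> bool:
        """Walk up the hierarchy to test whether `other` subsumes self."""
        node: Optional[OntClass] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class UncertainProperty:
    """A property of `domain` whose value depends on one parent property.

    The structure (which property is the parent) is assumed known;
    only the conditional probability table is learned.
    """
    name: str
    domain: OntClass
    parent_property: str
    cpd: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def fit(self, records: List[Dict[str, str]]) -> "UncertainProperty":
        """Maximum-likelihood estimate of P(name | parent_property)."""
        counts: Dict[str, Counter] = defaultdict(Counter)
        for rec in records:
            counts[rec[self.parent_property]][rec[self.name]] += 1
        self.cpd = {
            parent_value: {
                value: n / sum(value_counts.values())
                for value, n in value_counts.items()
            }
            for parent_value, value_counts in counts.items()
        }
        return self


# Hypothetical deterministic ontology: a two-level class hierarchy.
vessel = OntClass("Vessel")
fishing_vessel = OntClass("FishingVessel", parent=vessel)

# Hypothetical observations of Vessel instances (illustrative data only).
records = [
    {"route": "usual", "intent": "benign"},
    {"route": "usual", "intent": "benign"},
    {"route": "usual", "intent": "benign"},
    {"route": "usual", "intent": "suspicious"},
    {"route": "unusual", "intent": "suspicious"},
    {"route": "unusual", "intent": "benign"},
]

# Probabilistic extension: attach a learned distribution to a property.
intent = UncertainProperty(
    "intent", domain=vessel, parent_property="route"
).fit(records)

print(fishing_vessel.is_a(vessel))
print(intent.cpd)
```

    Looking up intent.cpd["unusual"] returns the estimated
distribution over intent for vessels on an unusual route. In the
tool chain described above, this kind of local distribution would
be expressed in PR-OWL as part of a MEBN Fragment rather than held
in a Python dictionary.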




                              III.   SUMMARY

    Use of a reference architecture facilitates design,
implementation, and reuse of a domain-specific probabilistic
ontology construction process by specifying the logical choices of
components to create a blueprint for a contextual solution. The
instantiated architecture is available for reuse to solve like
problems in similar domains.

                               REFERENCES

 [1]  Office of the Assistant Secretary of Defense for Networks and
      Information Integration (OASD/NII), "Reference Architecture
      Description," Arlington, 2010.
 [2]  Heather Kreger, Vince Brunssen, Robert Sawyer, Ali Arsanjani,
      and Rob High. (2012, Jan) IBM Developer Works. [Online].
      http://www.ibm.com/developerworks/webservices/library/ws-soa-ref-arch/.
 [3]  Richard J. Haberlin, Probabilistic Ontology Reference
      Architecture and Design Methodology, PhD George Mason
      University, 2013.
 [4]  Philippe Kruchten, The Rational Unified Process: An
      Introduction. Upper Saddle River: Addison-Wesley, 2004.
 [5]  Dennis M. Buede, The Engineering Design of Systems: Models and
      Methods. New York: John Wiley & Sons, 2000.
 [6]  James E. Armstrong, "Issue Formulation," in Handbook of Systems
      Engineering and Management. Hoboken: John Wiley & Sons, 2009,
      pp. 1027-1089.
 [7]  Gerald Kotonya and Ian Sommerville, Requirements Engineering
      Processes and Techniques. Chichester: John Wiley & Sons, 1998.
 [8]  Jeffrey O. Grady, System Requirements Analysis. New York:
      McGraw-Hill, Inc., 1993.
 [9]  John M. Green, "Establishing System Measures of Effectiveness,"
      in Proceedings of the 2nd Biennial National Forum on Weapon
      System Effectiveness, Laurel, 2001, pp. 1-5.
 [10] Asuncion Gomez-Perez, Mariano Fernandez-Lopez, and Oscar
      Corcho, Ontological Engineering with Examples from the Areas of
      Knowledge Management, e-Commerce and the Semantic Web. London:
      Springer-Verlag, 2010.
 [11] Thomas R. Gruber, "Toward Principles for the Design of
      Ontologies Used for Knowledge Sharing," International Journal
      of Human-Computer Studies, pp. 907-928, 1995.
 [12] Michael K. Bergman, "A Brief Survey of Ontology Development
      Methodologies," 2011, [Online].
      http://www.mkbergman.com/906/a-brief-survey-of-ontology-development-methodologies/
 [13] Paulo Cesar G. da Costa. Bayesian Semantics for the Semantic
      Web, PhD George Mason University, 2005. [Online].
      http://hdl.handle.net/1920/455 .
 [14] Maria C. Keet, "Dependencies between Ontology Design
      Parameters," International Journal of Metadata, Semantics and
      Ontologies, pp. 265-284, 2010.
 [15] Paul Buitelaar and Bernardo Magnini, "Ontology Learning from
      Text: An Overview," in Ontology Learning from Text: Methods,
      Applications and Evaluation. IOS Press, 2005, pp. 3-12.
 [16] John Sowa. (2001) Ontology. [Online].
      http://www.jfsowa.com/ontology/ .
 [17] Khalid Albarrak, An Extensible Framework for Generating
      Ontology from Various Data Models, May 2013, PhD Dissertation.
 [18] Man Li, Xiao-Yong Du, and Shan Wang, "Learning Ontology from
      Relational Database," in Proceedings of the 4th International
      Conference on Machine Learning and Cybernetics, Guangzhou,
      2005, pp. 3410-3415.
 [19] AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy,
      "Ontology Matching: A Machine Learning Approach," in Handbook
      on Ontologies. Berlin: Springer-Verlag, 2009, pp. 385-404.
 [20] AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy,
      "Ontology Matching: A Machine Learning Approach," in Handbook
      on Ontologies in Information Systems. Springer, 2003,
      pp. 397-416.
 [21] Judea Pearl, Probabilistic Reasoning in Intelligent Systems:
      Networks of Plausible Inference. San Francisco: Morgan
      Kaufmann, 1988.
 [22] Adnan Darwiche, Modeling and Reasoning with Bayesian Networks.
      Cambridge: Cambridge University Press, 2009.
 [23] Lise Getoor, Nir Friedman, Daphne Koller, Avi Pfeffer, and Ben
      Taskar, "Probabilistic Relational Models," in Introduction to
      Statistical Relational Learning. Cambridge: The MIT Press,
      2007, pp. 129-174.
 [24] James Cussens, "Logic-based Formalisms for Statistical
      Relational Learning," in Introduction to Statistical Relational
      Learning. Cambridge: The MIT Press, 2007, ch. 9, pp. 269-290.
 [25] Pedro Domingos and Matthew Richardson, "Markov Logic: A
      Unifying Framework for Statistical Relational Learning," in
      Introduction to Statistical Relational Learning. Cambridge: The
      MIT Press, 2007, pp. 339-371.
 [26] Cheol Young Park, Kathryn B. Laskey, Paulo C.G. Costa, and Shou
      Matsumoto, "Multi-Entity Bayesian Networks Learning for Hybrid
      Variables in Situation Awareness," in Proceedings of the 16th
      International Conference on Information Fusion (submitted),
      Istanbul, 2013, pp. 1-8.
 [27] Cheol Young Park, Kathryn B. Laskey, Paulo C.G.N. Costa, and
      Shou Matsumoto, "Multi-Entity Bayesian Networks Learning in
      Predictive Situation Awareness," in Proceedings of the 18th
      International Command and Control Research and Technology
      Symposium, Alexandria, 2013, pp. 1-19.
 [28] George M. Marakas, Decision Support Systems in the 21st
      Century. Upper Saddle River: Prentice Hall, 2003.
 [29] John W. Satzinger, Robert B. Jackson, and Stephen D. Burd,
      Systems Analysis and Design in a Changing World. Boston: Course
      Technology, 2004.
 [30] Sanford Friedenthal, Alan Moore, and Rick Steiner, A Practical
      Guide to SysML: The Systems Modeling Language. Amsterdam:
      Elsevier, 2008.
 [31] Sanford Friedenthal, Alan Moore, and Rick Steiner, OMG Systems
      Modeling Language Tutorial. Object Management Group, 2008.
 [32] Grigoris Antoniou and Frank van Harmelen, "Web Ontology
      Language: OWL," in Handbook on Ontologies in Information
      Systems. Springer-Verlag, 2003.
 [33] Cycorp. (2013, June) CycL: The Cyc Knowledge Representation
      Language. [Online]. http://www.cyc.com/cyc/cycl .
 [34] International Standards Organization, "Information technology -
      Common Logic (CL): a framework for a family of logic-based
      languages," International Standards Organization, Standard
      ISO/IEC 24707:2007(E), 2007.
 [35] Institute of Cognitive Science and Technology Italian National
      Research Council. (2013, June) WonderWeb. [Online].
      http://www.loa.istc.cnr.it/DOLCE.html .
 [36] Institute for Formal Ontology and Medical Information Science.
      (2013, March) BFO: Basic Formal Ontology. [Online].
      http://www.ifomis.org/bfo .
 [37] Paulo Cesar G. da Costa, K.C. Chang, Kathryn B. Laskey, and
      Rommel Novaes Carvalho, "A Multidisciplinary Approach to High
      Level Fusion in Predictive Situational Awareness," in
      Proceedings of the 11th International Conference of the Society
      of Information Fusion, Seattle, 2009.



