=Paper= {{Paper |id=None |storemode=property |title=Modest Use of Ontology Design Patterns in a Repository of Biomedical Ontologies |pdfUrl=https://ceur-ws.org/Vol-929/paper4.pdf |volume=Vol-929 |dblpUrl=https://dblp.org/rec/conf/semweb/MortensenHMN12 }} ==Modest Use of Ontology Design Patterns in a Repository of Biomedical Ontologies== https://ceur-ws.org/Vol-929/paper4.pdf
Modest Use of Ontology Design Patterns in a Repository
              of Biomedical Ontologies

             Jonathan M. Mortensen, Matthew Horridge, Mark A. Musen, and
                                    Natalya F. Noy

                      Stanford Center for Biomedical Informatics Research
                         Stanford University, Stanford CA 94305, USA



         Abstract. Ontology Design Patterns (ODPs) provide a means to capture best
         practice, to prevent modeling errors, and to formally encode common modeling
         situations for use during ontology development. Despite the popularity of ODPs
         and their supposed positive effects, there is scant empirical evidence
         of their adoption in real-world ontologies or of their effectiveness.
         Given their goals, ODPs may assist in the development of large-scale
         biomedical ontologies. Before studying ODP effectiveness and applicability, we
         ask the following questions to understand better the landscape of ODP use: Are
         ODPs used in biomedical ontologies? Which patterns do the ontology developers
         use? In which ontologies? How frequently are patterns used? To answer these
         questions, we determined the adoption of ODPs from two popular ODP libraries
         among the ontologies in BioPortal, a large ontology repository that contains over
         300 biomedical ontologies. We encoded 68 ODPs from two online libraries in the
         Ontology Pre-Processor Language, and, using these encodings, determined ODP
         prevalence in BioPortal ontologies. We found modest use of ODPs, with 33%
         of the ontologies containing at least one pattern. Upper Level Ontology,
         Closure, and Value Partition were the three most commonly used
         patterns, occurring in 20%, 9%, and 6% of the BioPortal ontologies, respectively.
         The low prevalence of ODPs may be due to lack of proper tooling, lack of
         user knowledge of and education about them, the age of the ontologies in the
         repository, or the specificity of some ODPs. We noted that there is a tension
         between the high expressivity of many ODPs and the goal of maintaining low
         expressivity of some biomedical ontologies. Additional tooling is necessary to
         make ODPs more accessible to domain experts. Furthermore, we suggest that
         ODPs may be developed in a bottom-up fashion, much like software-design
         patterns.¹

         Keywords: OWL, biomedical ontologies, BioPortal, Ontology Design Pattern,
         Ontology Pre-Processor Language


1      Ontology Design Patterns
There is a large body of research establishing and creating Ontology Design Patterns
(ODPs) [11, 5]. Yet, there is little work to determine their use or effectiveness.
In biomedicine, the development and use of ontologies are growing rapidly. Developing
these ontologies can be difficult and error prone, so ODPs would likely help with the
process. In this study, as initial work in evaluating the effectiveness and applicability
of ODPs in biomedical ontologies, we examine the prevalence of ODPs in a large corpus
of ontologies related to biomedicine.

¹ Accompanying online resources at http://www.stanford.edu/people/mortensen/odp

1.1   ODPs and ODP libraries
Software Design Patterns emerged in the 1990s, capturing recurring design techniques
observed in software systems [10]. Following a similar motivation, the Semantic Web
community developed ODPs to alleviate some of the complexities of developing
ontologies. ODPs, defined as “a modeling solution to solve a recurrent ontology design
problem” [11], capture best practice and common modeling situations. The developers
of ODPs suggest that by using the patterns, one can more easily avoid modeling errors
and improve ontology quality, maintainability, and reuse [3].
    ODPs have become popular recently, with multiple workshops held at ISWC,
including one during ISWC 2012. There are two online catalogs of ODPs: the
Manchester ODPs Public Catalog for bio-ontologies (MBOP) and OntologyDesignPatterns.org
(ODP-Wiki) [9, 1]. These catalogs describe each pattern by the problem that it
solves, the proposed solution, and the formal representation with which to instantiate
the pattern. MBOP contains 17 patterns derived from its authors’ experience in modeling
ontologies in the biomedical domain and working with OWL-based ontologies in
general. ODP-Wiki is a crowd-sourced effort to create an ODP library: the website
owners solicit pattern submissions, a committee reviews these submissions for
approval, and approved patterns are marked as such online. As of this writing, the
committee has not approved any patterns, but there are over 150 submissions.
    Most of the submissions on ODP-Wiki are “content” ODPs, but the site categorizes
several other types of ODPs as well: “structural” (methods to work around language
expressivity limitations or to define ontology shape/structure), “content” (modeling
solutions for a specific domain), “correspondence” (methods to re-engineer an ontology
into a different form or to map one ontology to another), “reasoning” (patterns that
enable one to obtain desired reasoning results), “presentation” (good practices for
readability and usability), and “lexico-syntactic” (mappings from linguistic structures
to ontology entities) patterns, a categorization based on descriptions by Gangemi and
colleagues [11]. MBOP categorizes patterns as “extension” (workarounds for language
expressivity limitations), “good practice” (good modeling practice), and “domain
modeling” (solutions specific to certain domains). The “structural” category
encompasses the majority of the MBOP patterns. In this work, the structural and
content ODPs are most relevant. Structural patterns are either logical, adding logical
expressions not contained directly in the ontology language, or architectural, defining
the structure/hierarchy of the ontology itself. Content ODPs model a specific domain
situation and are directly reusable (i.e., they should be directly imported into an
ontology and used). We omit lexico-syntactic, presentation, reasoning, and correspondence
patterns from this work, as we cannot test for them using our framework.
    Accompanying the MBOP, the Manchester group also developed the Ontology
Pre-Processor Language (OPPL), which is both a language based on the Manchester
syntax for OWL and a software library built on the OWL API [14]. OPPL provides a
way to manipulate ontologies, query for ODPs, and instantiate them [16, 15, 2].

1.2     Biomedical Ontologies
In biomedicine, ontology use is rapidly increasing [7, 21]. For example, the National
Center for Biomedical Ontology’s BioPortal,² a repository of biomedical ontologies,
contains over 300 ontologies and controlled terminologies as of this writing [18].
Biologists use biomedical ontologies to manage large amounts of data. Hospitals and
related entities use them to record information about clinical encounters and for
clinical decision support, billing, and so on. Because biomedical ontologies
are often large and complex, developing them and ensuring that they conform to best
practices poses a formidable challenge. Even widely used ontologies frequently
contain modeling errors. For instance, Rector and colleagues discovered modeling
issues in SNOMED CT, one of the most widely used biomedical ontologies [19], and
researchers have found modeling errors in the National Cancer Institute Thesaurus [8].
ODPs may be especially valuable in helping to model these large and complex
biomedical domains while preventing errors. Before assessing the effect of
ODP use on the biomedical ontology modeling process, we first measure the prevalence
of ODPs in a large corpus of biomedical ontologies.
2      Methods
We quantified the use of ODPs from both MBOP and ODP-Wiki in BioPortal using
OPPL and the OWL API. We first encoded ODPs in OPPL and validated their
correctness (1) through expert review and (2) by comparing them against the examples
in the pattern library, which served as a gold standard. We then obtained the ontologies
from BioPortal, excluding some according to predefined criteria (see Section 2.2). We
normalized the ontologies to remove differences in how their axioms were specified
(Table 1). Finally, we checked both the original and the normalized version of each
ontology for each encoded pattern, after first filtering out patterns that cannot occur
in an ontology because it lacks the required relations.
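As an illustration of the normalization step, the following Python sketch applies a few of the Table 1 rewrites (min 1 to some, exactly n to min n and max n, and flattening of nested conjunctions and disjunctions) exhaustively, i.e. until nothing changes. It is a minimal sketch under stated assumptions: class expressions are encoded as hypothetical nested tuples rather than OWL API objects, and this is not the study's Java implementation.

```python
from typing import Union

# Hypothetical toy encoding of OWL class expressions (illustration only):
#   a named class is a str, e.g. "Nucleus"
#   ("and", e1, e2, ...) / ("or", e1, e2, ...)    conjunction / disjunction
#   ("some", prop, filler)                        prop some filler
#   ("min" | "max" | "exactly", n, prop, filler)  cardinality restrictions
Expr = Union[str, tuple]


def rewrite_once(e: Expr) -> Expr:
    """One bottom-up pass applying a subset of the Table 1 rewrites."""
    if isinstance(e, str):
        return e
    op = e[0]
    if op in ("and", "or"):
        flat = []
        for child in (rewrite_once(c) for c in e[1:]):
            # C1 and (C2 and C3)  ->  C1 and C2 and C3  (likewise for "or")
            if isinstance(child, tuple) and child[0] == op:
                flat.extend(child[1:])
            else:
                flat.append(child)
        return (op, *flat)
    if op == "some":
        return ("some", e[1], rewrite_once(e[2]))
    n, prop, filler = e[1], e[2], rewrite_once(e[3])
    if op == "exactly":
        # prop exactly n C  ->  (prop min n C) and (prop max n C)
        # (assumption: the two resulting restrictions are kept as a conjunction)
        return ("and", ("min", n, prop, filler), ("max", n, prop, filler))
    if op == "min" and n == 1:
        # prop min 1 C  ->  prop some C
        return ("some", prop, filler)
    return (op, n, prop, filler)


def normalize(e: Expr) -> Expr:
    """Apply rewrite passes until a fixpoint is reached."""
    while True:
        nxt = rewrite_once(e)
        if nxt == e:
            return e
        e = nxt
```

Iterating to a fixpoint matters because one rewrite can enable another: `normalize(("exactly", 1, "hasPart", "Nucleus"))` first becomes a conjunction of `min 1` and `max 1`, and only the next pass turns the `min 1` restriction into `some`.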
2.1     Pattern Selection
We used the following criteria to select the set of patterns for this study: the pattern
must be (1) detectable, (2) non-trivial (that is, not just a template), (3) positively
reviewed (if a review is available), and (4) available in a public catalog (in our case,
either MBOP or ODP-Wiki). We used these criteria for the following reasons:
 1. Using only detectable patterns may seem obvious; however, many patterns, such as
    n-ary relations or re-engineering patterns, cannot be detected without more
    information than the ontology itself provides.
 2. A template-style pattern may not require the presence of any particular elements.
    Thus, it would be trivially present even if the ontology contained none of the
    pattern's elements.
 3. When available, we considered review information on ODP-Wiki. Poorly reviewed
    patterns may not yet be refined, making them difficult to encode, especially if they
    contain a logical error.
 4. We chose only publicly available patterns, because public availability is a
    necessary condition for both the reproducibility of this study and the expectation
    of pattern reuse.
² http://bioportal.bioontology.org

Applying the criteria above to MBOP and ODP-Wiki produced the following results:

    – From the 17 patterns in MBOP, we used 15. The remaining 2 were undetectable.
    – From the 150 patterns in ODP-Wiki, we used 53. The remaining patterns were
      either optional or not positively reviewed.

In total, we selected 68 of the 167 patterns.
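The four selection criteria amount to a simple filter over catalog entries. The Python sketch below is illustrative only: the `CatalogPattern` record and its fields are hypothetical stand-ins for judgments that were made per pattern, not data published by MBOP or ODP-Wiki. It encodes the one subtle rule, that criterion (3) only excludes a pattern when a review exists and is negative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CatalogPattern:
    """Hypothetical per-pattern metadata (field names are illustrative)."""
    name: str
    detectable: bool                 # (1) detectable from the ontology alone
    is_template: bool                # (2) template-only patterns are trivially present
    review_positive: Optional[bool]  # (3) None when no review is available
    public: bool                     # (4) listed in a public catalog


def is_selected(p: CatalogPattern) -> bool:
    """Apply criteria (1)-(4); a missing review does not exclude a pattern."""
    reviewed_ok = p.review_positive is None or p.review_positive
    return p.detectable and not p.is_template and reviewed_ok and p.public


def select_patterns(patterns: List[CatalogPattern]) -> List[CatalogPattern]:
    """Keep only the patterns that satisfy every criterion."""
    return [p for p in patterns if is_selected(p)]
```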
2.2    Ontology Selection
From the available ontologies in BioPortal, we selected those that were
publicly available, parseable, locatable (a file was easily obtainable), non-retired,
available as a single file, and available in either OWL or OBO format. Applying these
criteria to the 312 ontologies that were available in BioPortal as of January 2012
resulted in a set of 256 ontologies.
2.3    Pattern Encoding
OPPL and the OWL API are open-source libraries for working with ontologies and
ontology design patterns. We encoded the MBOP and ODP-Wiki patterns with OPPL.
Some patterns could not be encoded in OPPL; we encoded those directly in Java using
the OWL API. An example OPPL encoding of the Value Partition pattern (a way to
specify a set of disjoint qualities that describe a concept) follows:

      ?v1:CLASS, ?v2:CLASS, ?param:CLASS
      SELECT
      ASSERTED ?param EquivalentTo ?v1 or ?v2,
      ASSERTED ?v1 DisjointWith ?v2
      BEGIN
      ADD ?v1 subClassOf Thing
      END;
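The SELECT clause above matches any class ?param that is asserted equivalent to the union of two classes that are asserted disjoint. For readers without an OPPL setup, here is a minimal Python sketch of the same check over a hypothetical toy axiom representation (tuples, not OWL API objects). As an assumption of the sketch, it generalizes the two-variable query to unions of any number of named classes by requiring every pair to be asserted disjoint; with exactly two values it reduces to the query above.

```python
from itertools import combinations

# Hypothetical toy axiom encoding (illustration only, not the OWL API):
#   ("EquivalentTo", "A", ("or", "B", "C", ...))   A EquivalentTo B or C or ...
#   ("DisjointWith", "B", "C")                     B DisjointWith C


def find_value_partitions(axioms):
    """Return (param, values) pairs where param is asserted equivalent to a
    union of named classes that are all asserted pairwise disjoint."""
    # Disjointness is symmetric, so each asserted pair is stored unordered.
    disjoint = {frozenset(ax[1:]) for ax in axioms if ax[0] == "DisjointWith"}
    matches = []
    for ax in axioms:
        if ax[0] != "EquivalentTo":
            continue
        param, rhs = ax[1], ax[2]
        if not (isinstance(rhs, tuple) and rhs[0] == "or"):
            continue
        values = rhs[1:]
        # Only unions of two or more named classes (cf. ?v1:CLASS, ?v2:CLASS).
        if len(values) < 2 or not all(isinstance(v, str) for v in values):
            continue
        if all(frozenset(pair) in disjoint for pair in combinations(values, 2)):
            matches.append((param, values))
    return matches
```

Like the ASSERTED keyword in the OPPL query, this looks only at asserted axioms; it does no reasoning, so a disjointness that is merely entailed would not produce a match.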

     In order to reduce computational complexity, we pruned pattern–ontology pairs by
first checking whether the ontology contains the specific relationships between concepts
that a given ODP requires. An ontology without those relationships cannot contain the
pattern as the catalog specifies it. Furthermore, we did not encode patterns whose
required relationships occur in none of the ontologies in our selection. In particular,
many content patterns refer to specific relationships. For example, according to
ODP-Wiki, the Part Of pattern requires the relationship “isPartOf”; thus, an ontology
without an “isPartOf” relationship cannot contain the pattern. When searching, we
disregard namespaces, in case an ontology uses the pattern's relationship under a
different namespace (i.e., we match only on the URI fragment). One might consider
also searching for lexical variants of the relationship name to find occurrences that
capture the intension of the specified relationship. However, the point at which a
variant string no longer counts as a match for the original is not well defined.
Moreover, content ODPs are reused by directly importing a small module, so the
relation names should not vary across ontologies.

           Table 1. Transforms applied exhaustively to an ontology to normalize it.

                     Axiom                                      Transformation
prop min 1 C                                    prop some C
prop exactly n C                                prop min n C, prop max n C
prop value i                                    prop some i
Property in Anonymous Class                     Simplify Property (Removing inverses) and re-insert
C1 and (C2 and C3)                              C1 and C2 and C3
C1 or (C2 or C3)                                C1 or C2 or C3
C1 EquivalentTo C2                              C1 SubClassOf C2, C2 SubClassOf C1
C1 DisjointUnionOf C2 ... Cn                    DisjointClasses: C2 ... Cn, C1 EquivalentTo (C2 ... Cn)
C1 or ... or Cn SubClassOf D1 and ... and Dn    C1 SubClassOf D1 ... Cn SubClassOf D1 ... C1 SubClassOf Dn ... Cn SubClassOf Dn
DisjointClasses: C1 ... Cn                      Ci DisjointWith Cj for 1 <= i