=Paper= {{Paper |id=Vol-2931/ICBO_2019_paper_35 |storemode=property |title=Building a Shared Ontology Use Patterns Repository |pdfUrl=https://ceur-ws.org/Vol-2931/ICBO_2019_paper_35.pdf |volume=Vol-2931 |authors=Jonathan P. Bona,Joseph Utecht,Sarah Bost,Corey J. Hayes,Mathias Brochhausen |dblpUrl=https://dblp.org/rec/conf/icbo/BonaUBHB19 }} ==Building a Shared Ontology Use Patterns Repository== https://ceur-ws.org/Vol-2931/ICBO_2019_paper_35.pdf
                           Building a Shared Ontology Use Patterns Repository

Jonathan P. Bona*1, Joseph Utecht1, Sarah Bost1, Corey J. Hayes1,2,3, Mathias Brochhausen1
1 Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
2 Department of Psychiatry, University of Arkansas for Medical Sciences, Little Rock, AR, USA
3 Center for Mental Healthcare and Outcomes Research, Central Arkansas Veterans Healthcare System, Little Rock, AR, USA
jpbona@uams.edu, jutecht@uams.edu, sjbost@uams.edu, cjhayes@uams.edu, mbrochhausen@uams.edu




Abstract

This paper proposes and reports on the creation of a biomedical ontology use patterns repository. This work aims to facilitate the curation, sharing, discovery, and use of information about how OBO and other biomedical ontologies are used by informatics researchers and other ontology users to transform biomedical instance data into realist semantic representations. By encouraging sharing of information about how ontologies are actually used with instance data, this resource will reduce the total effort required of ontology users to design and implement representations for use with their data. We believe this will ultimately result in more widespread use of higher-quality representations, and in improved semantic interoperability of data. Our repository proof of concept implementation is available as a web application built and organized using semantic web technologies.

Keywords:
Ontology reuse, semantic interoperability, biomedical data

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Introduction

In our biomedical knowledge representation work we often need to transform instance data that originates as entries in tabular files or other representations into semantically enhanced knowledge graphs by instantiating ontology classes as RDF individuals, along with assertions about the relations between these individuals, in a triple store database. This transformation greatly enhances the usefulness of the underlying information by making its meaning explicit, and by making it available for querying and reasoning. The benefits of this approach are well-known in the biomedical ontologies community: by using axiomatically-rich realist ontologies that share a common upper level based on a shared theory of reality, we can generate consistent representations for data that are trivially interoperable, and include machine-accessible semantics that allow semantic web reasoners and related tools to infer new information based on the represented instances.

This approach is useful both as the basis for semantic representations used for newly collected/generated data that can be instantiated automatically by ontology-based software systems, including our efforts in the Comparative Assessment Framework for Environments (CAFE) of Trauma Care (1), and in the Data Coordinating and Operations Center (DCOC) for the IDeA States Pediatric Clinical Trials Network. It is useful as well for enhancing and integrating pre-existing data, or other data whose ongoing generation is beyond the control of ontologists, for instance in our work on the Platform for Imaging in Precision Medicine (PRISM) initiative (2) and related ongoing projects.

In collaborations of multidisciplinary teams, especially with collaborators who are not accustomed to using ontology-driven knowledge representation strategies, we sometimes encounter the initial expectation that in order to build semantic representations for instance data, it is necessary only to select the single ontology term that best matches the meaning of, for instance, each column in a spreadsheet of clinical data. In fact, such one-to-one mappings are rarely possible or desirable given the complexity of the world (which these data are supposed to be about) and the corresponding complexity of realist ontologies that have been carefully designed to represent the relevant portions of this reality.

For example, a positive HPV diagnosis might appear as a plus sign, or as the value ‘true’ or similar, in a column labeled ‘hpv status’ or similar in a table of clinical and other non-image data uploaded with a collection of head and neck cancer images in a cancer image archive (2). Our approach to representing the information that this particular patient has been diagnosed with HPV involves generating and asserting instances of classes from several different OBO Foundry ontologies, and the relations among those instances, applying an instantiation pattern for each record, for instance the pattern shown in Fig 1. This pattern represents the human being (who is infected with HPV), the instance of HPV disease that inheres in that person, as well as the diagnosis itself and the planned processes that produced it. Note that this example does not deal with mistaken or retracted diagnoses, though the representation used here could be used as part of a referent tracking (3) approach to handling such complications.

Fig. 1. Representing an instance of HPV and its diagnosis

There are many possible workflows for designing such representation patterns and applying them to instance data. In our work, we usually begin by sketching such patterns visually using a drawing program (or even a whiteboard or piece of paper), with software tools such as Protégé and Ontobee on hand to support discovering and exploring ontology terms. Once a representation pattern has been sketched out, it is manually translated into a format usable by an executable computer program that can interpret the target instance data and instantiate RDF that matches the representation pattern. This program is then run with the data as input, generating the semantically-enhanced representations suitable for use in a triple store.

Except in the small percentage of cases where this graphical depiction of the pattern is then used as an example figure in a publication, it is usually not shared outside the project, or even necessarily put in a shared space accessible to all project collaborators. Similarly, the program that realizes the RDF instantiation process is also not usually published or shared in a discoverable way, and in the best case scenario ends up in a code repository with some documentation that the project participants know how to access, but that is not easily discovered or used in other efforts where it is relevant.

One obvious issue with this practice is the duplication of effort that results when two users of OBO ontologies unwittingly work to independently represent the same, or very similar, phenomena. Even within a single group working on multiple projects, we have found it useful to have a space to share these ontology instantiation patterns. These reusable patterns consist of instances linked using rdf:type to the ontology classes that they instantiate, together with the relations among them that are necessary to represent the phenomena that the patterns are about.

To the extent that there is one clear and correct way of representing instance data about a phenomenon, the biomedical ontologies community will benefit from an open repository of representational patterns for instance data that supports their publication, discussion, and reuse.

In other cases there may even be multiple possible patterns that can seem equally correct, especially where domain ontologies inadvertently overlap, or where the ontology terms used do not have definitions that completely constrain how they should be used. This will often be the case, as domain ontology developers cannot be expected to predict exactly what their ontologies will be used to represent.

One example we have encountered in our work concerns how to represent just a few of the entities involved in prescribing a drug to a patient. We have identified several possible representations that all seem to be permissible and reasonable uses of the terms involved to represent this phenomenon. These possible patterns are shown in Fig 2. The first (a) has the patient (an instance of ‘Homo sapiens’ bearing the ‘patient role’) as a direct participant in the drug prescribing process. The second (b) has as its participant some instance of ‘patient role,’ with a path to the actual patient through their ‘bearer of’ relation to that role. While some ontology developers may assume that the participation relation only holds between occurrents and their independent continuants, this constraint is not specified in the definition of the ‘has participant’ object property. The third (c) links the drug prescribing process to the individual ‘patient role’ via the ‘realizes’ relation, which is arguably a more suitable and more informative relation to connect occurrents and dependent continuants that are realizable entities, and again links to the actual person involved through their bearing that particular role. A fourth option (d) connects the process to the role and person (patient) only indirectly through the output of the process (a ‘drug prescription’) being about the person. While this is a correct use of ‘is about’ (4), note that ‘is about’ is a fairly general relation to use here, as a prescription is certainly about several different things, including the patient who has received the prescription. Note also that (d) can be combined with the various approaches reflected in (a)-(c), resulting in even more potential representation patterns that an ontology user might decide to implement.

Fig. 2. Several possible patterns to represent an instance of prescribing a drug to a patient.

Related Work

The general approach of this work is related to, but still quite distinct from, the basic idea of ontology design patterns (ODPs) that has been put forward by Gangemi and Presutti (5). While the aim of this paper is to provide a pattern-based solution to the potential for different instantiation-based representations to arise when the same ontologies are used to manage RDF data, the goal of ODPs is to support ontology design by providing design patterns to guide the representation of a domain in a mainly class- and object-driven manner. Recently, ODPs have become a new focus of interest in research on automating ODP creation to facilitate sharing and integration of existing ontologies (5,6).

A good example of an ODP is Gangemi and Presutti’s agent-role pattern, which aims to create a standard way to represent an agent and link it to a role (5), e.g. “John Doe” and “student role.” An example of this pattern can be found here: http://www.ontologydesignpatterns.org/cp/owl/agentrole.owl. This pattern consists of 4 classes (in addition to owl:Thing) {Concept, Object, Role, Agent} and 4 object properties {classifies, is classified by, is role of, has role}. Notably, the pattern does not provide an instance-oriented view. While this makes perfect sense for supporting ontology development, patterns that provide insight for using ontologies to manage RDF data will necessarily include instantiations. The development of ODPs has led to the implementation of a semantic web portal for sharing and discussing ontology design patterns (7).

Our own CAFE project (1), which includes the aim to represent the organizational structures of trauma centers and trauma systems in RDF, has an internal collection of RDF instantiation patterns. These roughly 150 representations were created to model the organizations described by answers to survey questions. For example, a "yes" answer to the question "Does your institution have a trauma program manager?" would result in an instantiation of the triples in Fig 3, which shows that a human who is a member of the user’s organization is the bearer of a trauma program manager role. The CAFE project has many general instantiation patterns ranging from the simple example above to more complex representations. However, these patterns exist only in the project’s internal database with no easy way to reuse or share them.

Fig. 3. A pattern used to generate instance data about trauma program organization in the CAFE project

Methods

To address this need we have implemented an ontology usage patterns repository as a web application built using a standard Python web framework combined with semantic web technology. This system provides a simple web interface that allows the user to search, browse, view, and download information about ontology usage patterns. These pattern specifications include downloadable/reusable RDF representations, figures, textual descriptions, and other information about the pattern itself.

Implementation

Our repository application is implemented in Python 3.6 using the Flask web framework (8) and semantic web tools, including rdflib and other Python libraries. The user interface consists of HTML forms and pages rendered by Flask, with simple styling in CSS. Some of the more complex planned extensions to this work, discussed in our Future Work section below, will require the use of JavaScript for more complicated interactions.

The only database underlying this application is a triple store, which is used to persistently store all information needed to operate and organize the repository, as well as user-added information such as pattern definitions, contents, and metadata. We are currently using as a triple store a version of Ontotext GraphDB (9), which is a proprietary commercial triple store system available for use free of charge. However, this application will work with any system that supports SPARQL queries. The application uses the SPARQLWrapper Python library to query our GraphDB instance via its endpoint interface.
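As an illustration of the SPARQLWrapper-based access described in the Implementation section, the following is a minimal sketch of how pattern specifications and their labels might be listed from the triple store. It is not the repository's actual code: the function names, the optional label filter, and the example endpoint URL are assumptions made for illustration, while the `oup:` prefix and the `OUP_000001` class identifier come from the paper. Query construction and result parsing are kept separate from the network call so they can be checked without a running triple store.

```python
# Prefix for repository identifiers, as given in the paper.
OUP = "http://purl.org/ontology-use-patterns#"


def build_pattern_list_query(search_text=None):
    """Build a SPARQL SELECT listing pattern specifications and their labels.

    search_text is a hypothetical optional filter on the label text.
    """
    filter_clause = ""
    if search_text:
        # Escape backslashes and double quotes so the text is a safe literal.
        escaped = search_text.replace("\\", "\\\\").replace('"', '\\"')
        filter_clause = (
            f'FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))'
        )
    return f"""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX oup: <{OUP}>
SELECT ?pattern ?label WHERE {{
  ?pattern rdf:type oup:OUP_000001 ;
           rdfs:label ?label .
  {filter_clause}
}} ORDER BY ?label
"""


def rows_from_json(results):
    """Flatten a SPARQL JSON result document into (uri, label) tuples."""
    return [
        (binding["pattern"]["value"], binding["label"]["value"])
        for binding in results["results"]["bindings"]
    ]


def fetch_patterns(endpoint_url, search_text=None):
    """Run the query against a SPARQL endpoint (requires SPARQLWrapper)."""
    # Imported here so the helpers above work without the library installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    client = SPARQLWrapper(endpoint_url)
    client.setQuery(build_pattern_list_query(search_text))
    client.setReturnFormat(JSON)
    return rows_from_json(client.query().convert())
```

A call such as `fetch_patterns("http://localhost:7200/repositories/patterns", "hpv")` would be one way to use it, where the URL is a placeholder for whatever SPARQL endpoint the triple store exposes.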
Application Ontology

The information required to operate this repository and represent ontology usage patterns in a triple store back-end is encoded using just RDF/OWL and a handful of classes and properties from standard semantic web resources and OBO ontologies, such as the Information Artifact Ontology (IAO). We define only a single new class, ‘ontology use pattern specification’, to represent the information needed to operate this repository, so technically this application is backed by a very small application ontology. That class is a subclass of IAO: ‘directive information entity’, as shown in Table 1.

In this ontology we use the persistent URL http://purl.org/ontology-use-patterns# as a prefix for its identifiers, including for the ‘ontology use pattern specification’ class, any future classes or properties that are added to achieve additional functionality, and named individuals in the repository triple store, for instance those instances of ‘ontology use pattern specification’ used to represent each individual pattern.

Name: ontology use pattern specification
URI: http://purl.org/ontology-use-patterns#OUP_000001
Superclass: IAO ‘directive information entity’ (IAO_0000033)
Definition: A directive information content entity that specifies a specific RDF representation for instance data of a particular sort using terms from pre-existing ontologies. This specification may include figures and metadata associated with the pattern in addition to RDF triples.

Table 1: ontology use pattern specification

Triple store & Named graph

The RDF triples defining each pattern added to the repository are stored within the triple store in a dedicated named graph (10) used only for that pattern. Named graphs are useful for combining information in a single triple store while maintaining some separation based on its origin, for example to assemble genomics data from different sources and manage that data along with information about its provenance in a single database (11).

By using a separate named graph for the contents of each pattern, we prevent our triple store from containing assertions that could be interpreted as making claims about the world that may be false, unverifiable, or non-referring. It should be possible, and is often desirable, to design and specify a representation pattern for future use that would contain falsehoods if it were instantiated now. An example is a pattern designed for use with some instance data that will in the future be part of a data collection that does not yet exist. In such a case, it is clearly better not to have a database that contains unconditioned assertions about those entities when they do not exist.

Leaving aside things that do not yet exist and other hypothetical entities, it is also clearly better not to have assertions to the effect that a particular instance of Homo sapiens exists, that a particular instance of some disease exists, that the disease instance inheres in the human being, and so on, with instances for those individuals sitting in this repository, because this repository is not intended to store assertions about patients and their diseases. Even more practically, the system does not seek to constrain how users may choose to “name” the RDF blank node individuals that appear in their patterns (e.g. _:person1), though it is recommended to use names that hint at the type of individual indicated (but never to rely on this hint in place of actual type assertions). By separating out each pattern into its own named graph, we avoid the possibility of conflicts, for instance assertions across patterns that appear to be about the same individuals. Keeping names separate is also made easier by allowing GraphDB to generate unique symbols to name individuals. Because these generated symbols are long and unwieldy for users to deal with, we truncate them for display purposes within the system.

In addition to providing a tidy way to keep separate the RDF definitions of patterns in our repository, using named graphs for each pattern also allows us to specify additional information about the pattern, including textual descriptions that explain the intended use, figures that show the pattern rendered in visual format, and additional “metadata” such as the creator, the license for use, etc. This is achieved by using as the name for each named graph a URI that is asserted to be an instance of the ‘ontology use pattern specification’ class described above, and inserting the triples that define the RDF pattern within that named graph.

Assertions about a pattern instance are made within the main graph of the triple store (that is, outside of any named graph). For example, the RDF triples in Listing 1 below are used to store the information that the individual oup:pattern_000001 is an instance of ‘ontology use pattern specification,’ that there is an individual (here identified by the blank node _:fig1) that is an instance of IAO: ‘figure’ (IAO_0000308), that this figure has the filename ‘example_figure1.svg’, and that the figure is a part of the pattern specification.

          PREFIX oup: <http://purl.org/ontology-use-patterns#>
          oup:pattern_000001 rdf:type oup:OUP_000001 .
          oup:pattern_000001 rdfs:label "An example pattern" .
          _:fig1 rdf:type <http://purl.obolibrary.org/obo/IAO_0000308> .
          _:fig1 rdfs:label "example_figure1.svg" .
          _:fig1  oup:pattern_000001 .

 Listing 1: example triples defining an individual ‘ontology use pattern specification,’ and specifying the figure it has as its part

Pattern definition triples

The most crucial piece of a pattern is the set of RDF triples that define the pattern itself. As mentioned above, the system expects blank node identifiers (e.g. _:person1) to be used for the instances in these patterns. For example, the following shows triples (part of the pattern shown in Fig. 1) representing a person and an instance of HPV that inheres in that person.

          # a person
          _:person1 rdf:type  .
          # some HPV inhering in the person
          _:hpv1 rdf:type  .
          _:hpv1  _:person1 .

                        Listing 2: RDF triples defining part of an ontology use pattern for HPV diagnoses

When a pattern definition is inserted into the database, GraphDB replaces its blank node identifiers with generated unique blank node identifiers that contain the user’s original identifier as a suffix (e.g. _:person1 becomes _:genid-bc43f3ab4ec54362af7ed97c9dddcf44-person1). When displaying a pattern for view or download, the patterns repository interface strips out the generated part of the identifier. As currently implemented this feature does rely on GraphDB’s unique way of handling blank nodes internally, and would not generalize to other triple store implementations. Future versions of our tool will implement a more general approach for assigning meaningful names to variables in patterns, for instance by using an annotation property.

Currently adding a pattern definition to the repository through our application involves producing a representation of the pattern as a set of RDF triples expressed in text formatted as N-Triples (12). Possible future work includes a more user-friendly interface for creating such patterns, as discussed more below.

Fig. 4 illustrates the use of named graphs to represent instances of ‘ontology use pattern specification’, RDF triple patterns within the named graphs, and information asserted about the patterns (descriptions, and other parts, such as figures) in the triple store.

Results

We have created an initial implementation of the ontology use patterns repository proposed and described above. This repository is available at http://purl.org/ontology-use-patterns. It is currently populated with several ontology usage patterns used in projects within our group.
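The display-time clean-up of generated blank node identifiers, described under Pattern definition triples, could be sketched as follows. This is an illustrative guess at one way to do it, not the repository's implementation: the function name is hypothetical, and the regular expression assumes that generated labels always have the shape `_:genid-<hex digits>-<original label>`, which is inferred from the single example given in the text.

```python
import re

# Assumed label shape, based only on the example in the text:
#   _:genid-bc43f3ab4ec54362af7ed97c9dddcf44-person1
# The pattern matches the "_:genid-<hex>-" part that precedes the user's label.
_GENERATED_PART = re.compile(r"_:genid-[0-9a-f]+-")


def strip_generated_ids(ntriples_text: str) -> str:
    """Return the text with generated blank node prefixes reduced to '_:'.

    Note: this is a plain text substitution, so a literal or IRI that happens
    to contain the same character sequence would also be rewritten.
    """
    return _GENERATED_PART.sub("_:", ntriples_text)
```

Because the substitution relies on a store-specific label format, it shares the limitation noted in the text: it would need to be replaced by a more general mechanism for other triple stores.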




        Figure 4: Ontology use pattern specification instances represented as named graphs containing pattern RDF triples
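The layout in Figure 4, with metadata about a pattern asserted in the main graph and the pattern's own triples held in a named graph whose name is the pattern's URI, can be illustrated with a small sketch that assembles a SPARQL Update request. This is a simplified illustration under stated assumptions rather than the application's code: the function name is hypothetical, only a type assertion and a label are written as metadata, and the `http://example.org/Person` class used in the usage note is a placeholder, not a term from the paper.

```python
# Prefix and class identifier as given in the paper.
OUP = "http://purl.org/ontology-use-patterns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_insert_update(pattern_id: str, label: str, ntriples_text: str) -> str:
    """Build a SPARQL Update that stores one pattern.

    Metadata goes into the default (main) graph; the pattern's triples go
    into a named graph whose name is the pattern's own URI.
    """
    pattern_uri = f"{OUP}{pattern_id}"
    # Keep non-empty N-Triples lines, dropping comment lines.
    triples = [
        line.strip()
        for line in ntriples_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    # Escape backslashes and double quotes for use in a string literal.
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    body = "\n".join(f"    {triple}" for triple in triples)
    return (
        "INSERT DATA {\n"
        f"  <{pattern_uri}> <{RDF_TYPE}> <{OUP}OUP_000001> .\n"
        f'  <{pattern_uri}> <{RDFS_LABEL}> "{safe_label}" .\n'
        f"  GRAPH <{pattern_uri}> {{\n"
        f"{body}\n"
        "  }\n"
        "}"
    )
```

For example, `build_insert_update("pattern_000001", "An example pattern", "_:person1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Person> .")` yields an update whose `GRAPH` block is named `http://purl.org/ontology-use-patterns#pattern_000001`. Keeping the pattern triples inside that block is what prevents blank nodes and assertions from different patterns from mixing.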
The repository system implements a core set of features, including the ability to capture and store information about ontology use patterns within a semantic database. This information about patterns includes:
● RDF definitions of the pattern specifications themselves
● Pattern specification names and descriptions and other metadata
● Figures depicting the patterns

The system also provides the ability to search for and view patterns based on their names and descriptions. We will shortly add the option to also search based on ontology terms that are used within the patterns, including text-based search for patterns over term labels and other annotations used in the patterns. This will allow the user to identify patterns that use terms of interest by inputting a search string that the system then uses to identify all terms that appear across the entire database whose labels contain the string, determining which named graphs contain triples using any of those terms, and returning the list of those pattern specifications for the user to browse.

Once the user has found and loaded a pattern of interest, the system displays the pattern’s details in a single page that includes the name, description, and metadata about the pattern; a linked rendering of the pattern’s RDF triples representation with ontology term identifiers appearing as clickable links that resolve to the terms themselves via Ontobee; a listing of labels for the terms in the pattern; one or more figures depicting the pattern visually; and a link to download the pattern as an RDF file in the N-Triples format.

Discussion & Future Work

This paper has proposed and presented a solution for the problem of sharing and reusing information about how ontology terms are typically combined into patterns used to instantiate instance data: a repository of ontology use patterns that allows users to create, view, and reuse these patterns and their descriptions. We have implemented an initial release of such a tool and populated it with a diverse set of example patterns related to our work. Development is ongoing to add new features and other improvements.

One planned feature is a diagramming tool for creating RDF instance diagrams with a consistent style like that used in many of the figures in this document, possibly following conventions established by VOWL (13). This tool will then automatically generate the RDF definition of the pattern based on the user’s interaction with the diagramming tool. This will allow users to create ontology use patterns within the repository in a single step without going to the separate effort of sketching a diagram and manually creating the RDF pattern, as is currently required. It will also help to ensure the consistency of the main pattern figures used in the repository, as well as the ease of interpreting them. The CAFE project already includes a tool for generating diagrams from its internal instantiation patterns.

Another planned feature is tooling to support copying and editing an existing pattern from within the system itself. We are also considering adding a pattern-based search interface that would allow for more complex queries than the current text-based interface supports.

In addition to this search capability we also plan a more term-based navigation option that will allow users to explore available patterns based on which terms appear in them. In the simplest case, this would involve rendering a page for each ontology term that is used in any pattern within the repository, with links to the patterns that use it. In a pattern repository actively populated and used by the biomedical ontologies community, such a term landing page could provide useful information about how, and how often

Acknowledgements

Work on this project has been funded in part with federal funds from the National Cancer Institute, National Institutes of Health under Contract No. HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. Under this contract the University of Arkansas is funded by Leidos Biomedical Research subcontract 16X011. Funding was also provided by U24CA215109.

The CAFE project described in this paper is funded by the National Institute of General Medical Sciences of the National Institutes of Health under award number R01GM111324.

The DCOC work described in this paper was funded by grant number U24OD024957 from the National Institutes of Health Office of the Director through the ECHO program.

The work on representing prescription data described in this paper was supported by the Translational Research Institute (TRI), grant UL1TR000039 through the National Center for Advancing Translational Sciences of the National Institutes of Health (NIH). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. This work was also supported in part by BJA-2018-13607: CATEGORY 5 HAROLD ROGERS PRESCRIPTION DRUG MONITORING PROGRAM (PDMP) IMPLEMENTATION AND ENHANCEMENT PROJECTS.

References

1. Utecht J, Judkins J, Otte JN, Colvin T, Rogers N, Rose R, et al. OOSTT: a Resource for Analyzing the Organizational Structures of Trauma Centers and Trauma Systems. CEUR Workshop Proc [Internet]. 2016 Aug [cited 2019 Apr 17];1747. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5312685/
2. Bona JP, Nolan TS. Ontology-Enhanced Representations of Non-image Data in The Cancer Imaging Archive. In: Proceedings of the International Conference on Biological Ontology 2018. CEUR-WS.org; 2018.
3. Ceusters W, Smith B. Strategies for referent tracking in electronic health records. Journal of Biomedical Informatics. 2006 Jun 1;39(3):362–78.
4. Ceusters W, Smith B. Aboutness: Towards Foundations for the Information Artifact Ontology. In: Proceedings of the Sixth International Conference on Biomedical Ontology (ICBO). CEUR vol. 1515; 2015. p. 1–5.
5. Gangemi A, Presutti V. Ontology Design Patterns. In: Staab S, Studer R, editors. Handbook on Ontologies [Internet]. Berlin, Heidelberg: Springer Berlin Heidelberg; 2009 [cited 2019 Apr 15]. p. 221–43. (International Handbooks on Information Systems). Available from: https://doi.org/10.1007/978-3-540-92673-3_10
6. Ławrynowicz A, Potoniec J, Robaczyk M, Tudorache T. Discovery of Emerging Design Patterns in Ontologies Using Tree Mining. Semant Web. 2018;9(4):517–44.
7. Daga E, Presutti V. http://ontologydesignpatterns.org [ODP]. Proceedings of the Poster and Demonstration Session at the 7th International Semantic Web Conference (ISWC2008). 2008;401:2.
8. Grinberg M. Flask Web Development: Developing Web Applications with Python. Sebastopol, CA: O’Reilly Media; 2014. 258 p.
9. Ontotext GraphDB™ - a Semantic Graph Database Free Download [Internet]. Ontotext. [cited 2019 Apr 17]. Available from: https://www.ontotext.com/products/graphdb/
10. Gandon F, Corby O. Name That Graph [Internet]. W3C. 2009 [cited 2019 Apr 16]. Available from: https://www.w3.org/2009/12/rdf-ws/papers/ws06/
11. Zhao J, Miles A, Klyne G, Shotton D. Linked data and provenance in biological data webs. Brief Bioinform. 2009 Mar 1;10(2):139–52.
12. RDF 1.1 N-Triples [Internet]. [cited 2019 Apr 17]. Available from: https://www.w3.org/TR/n-triples/
13. Visualizing Ontologies with VOWL | www.semantic-web-journal.net [Internet]. [cited 2019 Apr 17]. Available from: http://www.semantic-web-journal.net/content/visualizing-ontologies-vowl-0