=Paper= {{Paper |id=Vol-2285/ICBO_2018_paper_15 |storemode=property |title=Quality Assurance of Ontology Content Reuse |pdfUrl=https://ceur-ws.org/Vol-2285/ICBO_2018_paper_15.pdf |volume=Vol-2285 |authors=Michael Halper,Christopher Ochs,Yehoshua Perl,Sivaram Arabandi,Mark A. Musen |dblpUrl=https://dblp.org/rec/conf/icbo/HalperOPAM18 }} ==Quality Assurance of Ontology Content Reuse== https://ceur-ws.org/Vol-2285/ICBO_2018_paper_15.pdf
       Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA                      1




          Quality Assurance of Ontology Content Reuse

                Michael Halper (1), Christopher Ochs (2), Yehoshua Perl (1), Sivaram Arabandi (3), Mark A. Musen (4)

            (1) NJIT, Newark, NJ 07102 USA; {michael.halper, yehoshua.perl}@njit.edu
            (2) Nokia Bell Labs, Murray Hill, NJ 07974 USA; christopher.ochs@nokia-bell-labs.com
            (3) Ontopro LLC, Houston, TX 77025 USA; sivaram.arabandi@gmail.com
            (4) Stanford University, Stanford, CA 94305 USA; musen@stanford.edu



    Abstract—Building ontologies is difficult and time-consuming.              the Cancer Chemoprevention Ontology (CanCo) [11]. These
As such, content reuse has been promoted as an important guiding               ontologies all reused content from other ontologies (e.g., BFO),
principle in ontology development. Reusing content from other                  and in the context of these ontologies, some of the QA
ontologies can reduce the overall effort involved in new ontology              problems we encountered related to content reuse.
construction and provide better alignment with existing knowledge
modeling. However, reuse is not a panacea, and it comes with its                   In this paper, we focus strictly on such ontology QA
own attendant difficulties. In this paper, we investigate some common          problems and investigate a broader collection of ontologies that
quality assurance issues associated with reuse, such as duplicated             reuse content. The main purpose for this study is to alert
content and versioning problems. Some heuristic-based approaches               curators and authors, especially those new to the process, to the
are proposed for analyzing ontologies for these kinds of quality               pitfalls of reuse in terms of the errors that they are likely to
assurance issues. An analysis is carried out on a sample of the large          encounter. This awareness will help in avoiding the errors in
collection of BioPortal-hosted ontologies, many of which employ                the first place and enhancing the content of their own
reuse. The findings indicate that curators and authors, particularly           ontologies. Let us note that ontology errors can come in a wide
those new to the reuse process, should be on the alert when                    range of severity and causes, such as with unsatisfiability,
developing an ontology with reused content to avoid introducing                incoherence, and inconsistency of concepts. Even so, we will
problems into their own ontologies.                                            use the term “error” throughout this paper, though one may
                                                                               argue in certain circumstances whether an irregular modeling
   Keywords—ontology; modeling; ontology reuse; ontology quality               issue truly warrants that designation.
assurance; BioPortal
                                                                                   Moreover, let us state at the outset that ontology
                      I. INTRODUCTION                                          development is intrinsically difficult, and the findings that we
                                                                               present are in no way meant as indictments of anyone’s work.
    Ontology reuse is a well-established design pattern. An                    In fact, some of the errors reported arose from the work of one
ontology author may reuse content to save on development                       of the co-authors (SA), who took great care in the construction
time and effort, promote interoperability with other ontologies,               of the SDO. Ontology developers have the best intentions to do
and ensure that a consistent representation of a domain is                     a good job and take great pains to review their work. Even with
included in their ontology. Support for importing and reusing                  that being the case, the inherent complexity of ontology design
ontology content is included in the Web Ontology Language                      and the reuse of content make the appearance of errors almost
(OWL) (through the use of owl:imports axioms) [1], and the                     inescapable. It is our intention to alert ontology maintenance
paradigm is supported by the Protégé ontology editing                          personnel to this fact through the results of our study.
environment [2]. Top-level ontologies such as the Basic                        Additionally, we are not criticizing reuse in ontology design,
Formal Ontology (BFO) [3] were designed specifically to                        with its numerous advantages. We just wish to caution
support content reuse and alignment of ontologies. Top-domain                  ontology designers to be careful about the potential
ontologies, like the Ontology for General Medical Sciences                     disadvantages and pitfalls of reuse.
(OGMS) [4] and BioTop [5], extend the BFO and add general
domain knowledge that can also be reused by an ontology                           Our focus is on the collection of ontologies hosted in
author.                                                                        BioPortal [12]. The specific QA issues that we wish to examine
                                                                               are duplicated content (including duplicated classes and
    While there are enormous benefits to reuse, an ontology                    properties), versioning problems with respect to source
author also needs to be keenly aware of potential issues that                  ontologies of reuse, and mechanical import errors. The
can affect the quality of the resulting ontology. There may be                 heuristic methods that were used in our analyses are described,
unintended consequences if reused content is not incorporated                  and our findings from among the BioPortal ontologies are
correctly or not maintained properly. In previous studies [6],                 reported.
we investigated the issue of quality assurance (QA) in the
context of the Sleep Domain Ontology (SDO) [9], the
Ontology for Drug Discovery Investigations (DDI) [10], and



       ICBO 2018                                                   August 7-10, 2018                                                   1


                       II. BACKGROUND                                         However, there may be unexpected consequences
                                                                              downstream, especially after classification.
A. Prior Reuse QA
                                                                                  Alternatively, the author of O may reuse a fixed version of
    Ochs et al. [6] performed a QA review of the SDO and                      S’s content (either the complete contents of the ontology or a
discovered several significant issues related to the import of                selected subset of the ontology, extracted using, e.g., the
content from other ontologies. For example, pairs of duplicated               MIREOT approach [21]). Reusing a fixed version of the
classes (e.g., two Clinical finding classes and two Organism                  content provides the author of O with greater control over
classes), originating from different ontologies, were found and               when reused content is updated, at the expense of making it
corrected. However, on revisiting the SDO using a change                      labor intensive to align changes from S into O.
analysis methodology called a diff partial-area taxonomy [13],
which visually summarizes the differences between two
releases of a given ontology, several additional QA issues                                            III. METHODS
related to the reuse of content were uncovered.                                   In this study, we reviewed a collection of ontologies from
                                                                              BioPortal, looking for errors and inconsistencies arising from
    These preliminary studies, along with further discussions                 the reuse of content from other ontologies. In analyzing the
with ontology authors and maintainers, motivated the research                 collection, we employed several heuristic-based methodologies
described herein. The reuse design pattern, and the way it is                 to determine the prevalence of duplicate content, versioning
applied, can have serious, unintended impacts on an ontology.                 problems, and any import issues. The collection that was
The advantages of reusing content often come with a cost to                   examined was extracted from the 355 ontologies studied by
the quality of the overall ontology.                                          Ochs et al. [18], which were obtained from BioPortal in April
                                                                              2015. Specifically, the collection consisted of the 197
B. Prior Analysis of Ontology Reuse                                           ontologies (55.5%) that were found to reuse content.
    Previous studies have reviewed the existence and                              We define a source ontology as an ontology that has
prevalence of ontology reuse. Kamdar et al. [14] analyzed                     content included in another ontology O. As in Ochs et al. [18],
term reuse among ontologies and noted several error patterns                  we define reuse according to the URIs of the entities in an
with ontology reuse. Ghazvinian et al. [15] reviewed the                      ontology. For each ontology in this study, we identified its
orthogonality of the OBO Library [16] ontologies. Ochs et al.                 base URI (e.g., the base URI of the BFO is
[18] investigated how reused content is utilized in a sample of               http://purl.obolibrary.org/obo/bfo). Similarly, the base URI of
355 ontologies in BioPortal.                                                  the SDO is http://mimi.case.edu/ontologies/2009/1/SDO.owl.
   Among the ontologies in BioPortal, reuse of the BFO, an                    In general, all of the entities in an ontology have a URI that
upper-level ontology, is somewhat common. This is expected                    starts with the ontology’s base URI. Different versions of an
given the principle of a “commitment to collaboration”                        ontology may have different base URIs. For this study, an
espoused by the OBO Foundry [16]. Content reuse from top-                     entity (i.e., class or property) was considered reused if it had a
domain ontologies, like the OGMS [4], and domain-specific                     different base URI from the ontology it is residing in (e.g., a
ontologies, like GO [19] and ChEBI [20], is also fairly                       BFO class in SDO will have the BFO base URI). In this study,
common.                                                                       we did not distinguish between content imported directly and
                                                                              content imported by transitivity.
C. Methods of Ontology Reuse                                                      In the following, we describe the kinds of errors that were
    There are several ways an author of an ontology O can                     sought and the approaches to finding them. Examples from the
reuse content from a source ontology S. Each method of reuse                  SDO are used to illustrate how each type of error may manifest
has several advantages and disadvantages, particularly in                     itself during the ontology editing process. Additionally, for
relation to maintaining and updating reused content. Content                  each kind of error, we describe the heuristic-based approach
included from another ontology may be updated periodically                    that we utilized to determine the prevalence of the error among
at its source. Corrections of errors and inconsistencies                      the set of 197 BioPortal ontologies.
performed during maintenance of the source ontology S will
need to be propagated to O. While an ontology like the BFO                    A. Duplicated Content
may be updated only once every several years (e.g., BFO 1.1                       An author may reuse content from multiple ontologies. If
was released in 2009 and BFO 2.0 was released in 2015),                       content from two ontologies is reused, and the ontologies cover
other ontologies are updated much more frequently. ChEBI                      a similar domain, the potential exists for the inclusion of
and GO, which Ochs et al. [18] found to be reused by 37 and                   duplicate classes (i.e., the author could inadvertently include
33 ontologies, respectively, are updated quite frequently:                    two classes, from two different ontologies, that represent the
ChEBI, almost every month, and GO, on a daily basis (though                   same concept). This kind of duplicate information is not
a new version may only be published monthly).                                 desired. As mentioned previously, we identified several pairs
    An ontology author may include content via the                            of duplicated classes in the SDO [6].
owl:imports mechanism defined in OWL syntax, and                                  Class duplication can cause significant issues. For example,
implemented in the OWL API [1]. This approach includes the                    the abovementioned duplicate Clinical finding classes in SDO
entire contents of S into O “on the fly,” which allows updates                had the same name and represented the same entity but were
in S to be included in O without work from the author of O.                   not set equivalent, and their restrictions were not the same.
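
The reuse criterion defined in Section III can be made concrete with a minimal Python sketch: an entity is treated as reused when its URI does not start with the base URI of the ontology it resides in. The function names and the example entity URIs (in particular the fragments after "#") are hypothetical and introduced only for illustration; the two base URIs are those quoted in the text. This is a sketch under those assumptions, not the tooling used in the study.

```python
def is_reused(entity_uri, host_base_uri):
    """Return True if the entity's URI falls outside the host ontology's base URI."""
    return not entity_uri.startswith(host_base_uri)


def reused_entities(entity_uris, host_base_uri):
    """Collect the entities (classes or properties) that count as reused."""
    return [uri for uri in entity_uris if is_reused(uri, host_base_uri)]


# Base URIs as quoted in the text.
SDO_BASE = "http://mimi.case.edu/ontologies/2009/1/SDO.owl"
BFO_BASE = "http://purl.obolibrary.org/obo/bfo"

# Hypothetical entity URIs (the fragments are made up for illustration).
entities = [
    SDO_BASE + "#ExampleSleepClass",  # local to the SDO
    BFO_BASE + "#ExampleBfoClass",    # carries the BFO base URI, so it is reused
]

reused = reused_entities(entities, SDO_BASE)
```

Like the definition in the text, the sketch does not distinguish content imported directly from content imported by transitivity; it looks only at URI prefixes.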






This issue will cause problems for both users and authors alike,                   As with duplicate classes, the introduction of duplicate
as they will typically not suspect a duplicate and will likely not             properties may be due to the fact that two ontologies cover a
suspect that classes representing the same entity will have                    similar domain. For example, in the SDO, there are has
different modeling within a single ontology.                                   participant object properties included from the Relations
                                                                               Ontology (RO) [23] and BioTop, both of which represent the
                                                                               same kind of relationship. Both properties are utilized in the
                                                                               modeling of the SDO. Some SDO classes have restrictions
                                                                               using the RO version of has participant, while other SDO
                                                                               classes have the BioTop version. See Fig. 2 for some examples
                                                                               of this. These properties were not defined as equivalent in the
                                                                               SDO.
                                                                                   To identify ontologies with duplicated classes, we can
                                                                               utilize two heuristic-based methods. (These methods can also
                                                                               be used to identify duplicate properties.) First, an ontology may
                                                                               have duplicate classes if it reuses classes from two source
                                                                               ontologies that cover a similar, or identical, domain. For
                                                                               example, if an ontology reuses classes from FMA [24] and
                                                                               Uberon [25], ontologies that model the domain of anatomy,
Fig. 1. Hierarchical paths between classes human and organism in BioTop        then there is a greater chance of finding duplicate classes than
(left) and CPRO (right).                                                       in ontologies that reuse content from only one ontology.
    Beyond individual classes being duplicated, two source                         Second, if an ontology reuses two classes with the same
ontologies may have subhierarchies of duplicate (or very                       label, but those classes originate from different source
similar) classes, often modeled with different levels of                       ontologies, then they may be duplicates. One can search for all
granularity. For example, in the SDO, we found duplicated                      pairs (or, in general, sets) of classes where the label is the same
classes for organism and human, originating from BioTop and                    but the URIs of the classes are different. While this method
CPRO [22]. (Actually, the terms are slightly different in each:                potentially returns many false positives—e.g., “cold,” as in
living organism vs. organism and human vs. human/person,                       temperature, and “cold,” as in the disease, which are expected
respectively.) In BioTop, living organism is a distant ancestor                to have different URIs and may be modeled in different
of human; there are seven other ancestor classes on the                        domains—it provides an indicator for a potential problem.
ancestry path between them (e.g., great ape, primate, and
mammal). In CPRO, human/person is a direct subclass of                         B. Versioning Problems
organism. See Fig. 1. The two versions of human have                               There is also the potential of versioning problems when
different relationship structures. The one on the left has a                   reusing properties. In general, ontologies should reuse content
defined participates in relationship. The one on the right does                consistently from a single version of a source ontology.
not, though its two children, patient and physician, do have                   However, an ontology may inadvertently include content from
the relationship. The use of one human version alone may                       multiple versions of the same source ontology. This may occur
lead to deficient modeling in an application.                                  due to import by transitivity. For example, the SDO includes
    Duplicate classes can also be unknowingly included. The                    multiple versions of the part of property: one from an old
duplicate content may be imported by transitivity, i.e., an                    version of RO included via FMA and another from a more
ontology was reused by another reused ontology and the author                  recent version of RO via CPRO. Again, both of these
may or may not have been aware of this. Different versions of                  properties represent the same relationship. See Fig. 3 for
the same ontology may be reused. For example, as we report                     illustrations. Below, we identify several ontologies that reuse
below, we identified several ontologies that appear to import                  content from multiple versions of the BFO.
content from multiple versions of the BFO.
   Duplicate properties can also be introduced into an
ontology via reuse. Let us point out that the presence of
duplicate properties in itself is not necessarily an error. It is the
inconsistent use of such properties that constitutes an error.
This situation is analogous to the software engineering scenario
where multiple libraries are imported; in such a case, there is a
high potential for similar functions to be present.

                                                                               Fig. 3. Property part of from two versions of RO in the SDO.

                                                                                   To analyze inconsistent versioning, we can identify the
                                                                               base URI of every ontology in our data set, under the heuristic
                                                                               that a different base URI indicates a significantly different
Fig. 2. Two examples of SDO classes using the has participant property.        version of the same ontology. A number of source ontologies,





such as the BFO, GO, FMA, and others, were found to have                          For example, in the Synapse Ontology (SYN), there are
multiple base URIs among the BioPortal ontologies. For                        many (apparently) duplicated classes reused from NCIt and CL
example, in Ochs et al. [18], we identified six base URIs for                 (e.g., pairs of acinar cell classes). Within SYN, we found three
FMA, and below we describe several base URIs for BFO. We                      separate Cell subhierarchies. One subhierarchy, from GRO,
mapped each source ontology to its set of base URIs. If an                    consists of two classes. The other two Cell subhierarchies,
ontology O included entities from the same source ontology S,                 from NCIt and CL, are much larger. There are no equivalences
but the entities had different base URIs, the ontology is                     set between the classes in these subhierarchies. For the use case
considered to have a reuse versioning problem.                                of SYN, this might be an intentional design decision, but from
                                                                              an ontology design perspective, it is not typical compared to
C. Owl:imports Errors                                                         other ontologies that reuse NCIt, CL, etc.
    OWL’s owl:imports mechanism enables an ontology author                        The CSEO contains over 200 potential duplicate class
to include external ontologies without defining the classes and               pairs. It includes a large portion of the Disease subhierarchy
properties in their own ontology. The entire external ontology                from NCIt and defines its own Finding subhierarchy. In these
will be included when the importing ontology is opened (e.g.,                 two subhierarchies, there are many similar classes (e.g.,
in the OWL API). However, if the URI for the source ontology                  Abscess) that represent diagnoses. Looking more closely, we
is not correct, or the ontology is no longer available at the                 found additional pairs of duplicate diagnoses. Similarly, many
specified URI, then the source ontology cannot be loaded.                     classes related to various kinds of anatomical structures and
   To investigate issues related to owl:imports errors, we                    tissues (e.g., Tongue and Uterus) are included from NCIt and
opened every ontology with the OWL API and logged which                       added in CSEO. In all of these cases, there are no connections
ontologies encountered an error related to a missing                          (e.g., equivalences or restrictions) to indicate that these pairs of
owl:imports file(s).                                                          classes are related to one another. On the other hand, CSEO
                                                                              does define equivalences between classes reused from NCIt
                                                                              and classes reused from UO (e.g., Lux and Liter).
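
The versioning heuristic described in Section III.B (map each source ontology to its set of base URIs, then flag an ontology O that contains entities from more than one base URI of the same source S) can be sketched in Python as follows. The function name, the data layout, and all URIs in the example are hypothetical placeholders; the longest-prefix rule used to resolve overlapping base URIs is an assumption added here, not something stated in the text.

```python
def versioning_problems(reused_uris, source_base_uris):
    """Flag source ontologies whose entities appear under more than one base URI.

    reused_uris: URIs of the reused entities found in ontology O.
    source_base_uris: dict mapping a source ontology name to its known base URIs.
    Returns a dict {source name: sorted base URIs observed in O}, restricted to
    sources observed under two or more base URIs.
    """
    observed = {}
    for uri in reused_uris:
        for source, bases in source_base_uris.items():
            matches = [base for base in bases if uri.startswith(base)]
            if matches:
                # Assumption: when several base URIs match, keep the longest one.
                observed.setdefault(source, set()).add(max(matches, key=len))
    return {s: sorted(b) for s, b in observed.items() if len(b) > 1}


# Hypothetical base URIs standing in for two versions of one source ontology.
bases = {
    "BFO": ["http://example.org/bfo/1.1/", "http://example.org/obo/bfo/"],
    "GO": ["http://example.org/go/"],
}
uris = [
    "http://example.org/bfo/1.1/Entity",
    "http://example.org/obo/bfo/Continuant",
    "http://example.org/go/Process",
]

problems = versioning_problems(uris, bases)
```

In this example only the BFO stand-in is flagged, because its entities appear under two different base URIs, while the GO stand-in appears under one.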
                          IV. RESULTS
                                                                                  For duplicated properties, we found 31 ontologies with
    Our analysis of the various kinds of errors resulting from                properties that have the same label and different base URIs.
reuse was carried out on the 197 ontologies in BioPortal that                 Twenty of these (64.5%) were found to contain one or more
were found to reuse content by Ochs et al. [18]. See [12] for                 pairs of duplicated properties. For instance, ENM contains
more information pertaining to the individual ontologies                      several pairs of duplicated properties from BAO, RO, and NPO
referred to in this section.                                                  (e.g., properties named derives from and has part).
A. Duplicate Classes and Properties                                           B. Versioning Problems
    Reuse of classes from multiple sources is common, with an                     The large majority of cases of reuse that appear to have
average of more than five sources [18]. But we found that it is               versioning problems, based on different base URIs, were found
relatively uncommon for an ontology to reuse classes from two                 among ontologies that reuse the BFO and RO. We identified
or more ontologies that cover a similar domain. However,                      eleven BioPortal ontologies (3.1% of all the ontologies in
when we investigated cases where ontologies did reuse such                    BioPortal at the time) that included classes from multiple
content, there were several potential errors. For example, the                versions of the BFO. For example, the DDI uses all 39 classes
Cell Line Ontology (CLO) reuses content from the FMA and                      from an OWL release of BFO and one class from a version of
Uberon. In it, we found several potential duplicate class pairs.              the BFO with an OBO URI. Fig. 4 shows eight examples of
For example, there is Scalp from Uberon and Scalp from FMA.                   ontologies that include classes from multiple versions of the
There were also duplicate Pelvis classes from EFO and                         BFO. Two of the ontologies, CHEMINF and TEO, include all
Uberon. Many such classes are related using class equivalence                 of the content from two versions of the BFO.
axioms (e.g., Amnion, Colon, and Intestine). However, other
duplicate classes are not related in this way (e.g., Scalp, Aorta,
and Liver). Analyzing these examples, one can see that CLO
includes the Anatomical structure subhierarchy from Uberon
and the Organism part subhierarchy from EFO. In such a case,
the potential exists for additional duplicate classes.
    When looking for pairs of classes with the same label but
different base URIs, we found that class duplication does not
occur frequently. In total, 149 ontologies were found to reuse
at least one class from another ontology. Among the 149
ontologies, 46 ontologies (30.1%) contain at least one potential
duplicate pair based on our criteria. In general, we found very
few such pairs in a given ontology. Most of the 46 ontologies
either have just a single pair or between two and ten pairs. We
did find several ontologies (e.g., CLO, CSEO, and SYN) that
have many such pairs, and these ontologies reuse content from                 Fig. 4. Example ontologies reusing content from multiple versions of the
multiple ontologies that cover the same—or similar—domains.                   BFO
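
The label-based heuristic for potential duplicates (same label, different URIs) can be illustrated with a short Python sketch. The function name and the example URIs are hypothetical, and the case-insensitive, whitespace-trimmed label comparison is an assumption made here for illustration; the text says only that the labels are "the same." As noted in Section III.A, the candidates returned may include false positives and still need manual review.

```python
from collections import defaultdict


def duplicate_candidates(classes):
    """Group classes by label and keep labels attached to more than one URI.

    classes: iterable of (uri, label) pairs for the classes in one ontology.
    Returns a dict {normalized label: sorted list of distinct URIs}.
    """
    uris_by_label = defaultdict(set)
    for uri, label in classes:
        # Assumption: labels are compared after trimming and lower-casing.
        uris_by_label[label.strip().lower()].add(uri)
    return {
        label: sorted(uris)
        for label, uris in uris_by_label.items()
        if len(uris) > 1
    }


# Hypothetical URIs standing in for classes reused from two anatomy ontologies.
classes = [
    ("http://example.org/uberon/Scalp", "Scalp"),
    ("http://example.org/fma/Scalp", "scalp"),
    ("http://example.org/uberon/Liver", "Liver"),
]

candidates = duplicate_candidates(classes)
```

A check for existing equivalence axioms between the members of each candidate pair would be a natural follow-up step, since pairs that are already set equivalent are less likely to indicate an error.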






    The reuse of classes from multiple versions of non-BFO
ontologies was relatively uncommon. We identified a few
ontologies that included classes from multiple versions of the
same ontology. For example, COGPO and DDI include classes
from multiple versions of PATO and UO. These classes are not
set equivalent. In cases where multiple versions of an ontology
appear, the numbers of classes reused from each are typically
disproportionate. For example, the Cell Culture Ontology
(CCONT) includes classes from multiple versions of EFO (one
class, obsolete normal, from one version and 4,882 classes
from another). Both ENM and EP include multiple versions of
PATO. In the case of EP, 48 classes are included from one
version and 1,570 classes from another. HUPSON includes one
class from one ChEBI version, and 83 classes from another.
MF includes classes from multiple versions of NBO; MIRNAO
includes several classes from multiple versions of the GO.

    We found many different versions of the RO, OBO REL,
and BFO properties (e.g., has part and part of) reused in our
dataset. Along with SDO, we found several ontologies that
reuse properties from multiple versions of these ontologies
(often in class restrictions). Consider, for example, the has part
property. We identified 14 versions of this object property in
our dataset (see Table I). Reviewing the ontologies enumerated
in Table I, we identified a total of 20 ontologies that include
multiple versions of these (and other) RO relationships. Four
ontologies, namely, AERO, ONSTR, TAO, and VSO, include
object properties from three versions of the RO.

  TABLE I.              VARIOUS VERSIONS OF THE HAS PART PROPERTY FOUND
                        AMONG THE BIOPORTAL ONTOLOGIES

 URI                                                                         # Ontologies That Reuse has part Property
 http://purl.obolibrary.org/obo/bfo_0000050                                   48
 http://www.obofoundry.org/ro/ro.owl#part_of                                  28
 http://purl.obolibrary.org/obo/temp#part_of                                  27
 http://purl.obolibrary.org/obo/obo_rel#_part_of                               7
 http://purl.obolibrary.org/obo/bfo_00000050                                   5
 http://purl.obolibrary.org/obo/obo_rel_part_of                                4
 http://purl.org/obo/owl/obo#part_of                                           2
 http://purl.org/obo/owl/obo_rel#part_of                                       2
 http://purl.org/obo/owl/ro#part_of                                            2
 http://purl.obolibrary.org/obo/http://www.obofoundry.org/ro/ro.owl#part_of    1
 http://purl.org/obo/owlapi/relationship#obo_rel_part_of                       1
 http://www.ifomis.org/obo/ro/1.0#partof                                       1
 http://obofoundry.org/ro/ro.owl#part_of                                       1
 http://purl.obofoundry.org/ro/ro.owl#part_of                                  1
 Total:                                                                      130

C. owl:imports Errors
    A total of 44 ontologies could not be loaded by the OWL
API due to errors caused by missing imported ontologies.
There were several reasons for these errors; however, the large
majority were caused by URIs that are no longer valid web
addresses. For example, the DDI ontology includes
http://www.obofoundry.org/ro/ro.owl, but no ontology file
exists at that location. In a similar manner, RoleO includes
http://purl.obolibrary.org/obo/RoleO/external/bfo_import.owl.
Similarly, SDO includes its custom-built Units Ontology via
an owl:imports statement, but the ontology no longer exists at
the specified location. Five of the 44 ontologies (11.4%) were
previously hosted on Google Code (which is no longer
available, as of January 2016).

    There were relatively few errors caused by other types of
invalid import statements. For example, various Psychology
(APA thesaurus) ontologies on BioPortal all have owl:imports
statements that reference local files. We note that not all
instances of an ontology using owl:imports are instances of
reuse, since the owl:imports mechanism is also frequently
used to include modules from the same ontology.

                      V. DISCUSSION

    We note that when an author is designing an ontology, it is
often with the intention of supporting a specific set of use
cases, or some specific application. Thus, some of the issues
we identified in this paper may not be problematic for the
intended purposes. However, once it has been discovered that
an ontology appears to contain inconsistencies due to reuse, the
issue should be brought to the attention of the author of the
ontology. These problems could have deleterious effects if the
ontology is utilized beyond its original scope.

    One significant complication is that, based on the metrics
provided in BioPortal, hundreds of ontologies have not been
updated in several years (if ever). Many of these ontologies are
no longer maintained and reuse old versions of source
ontologies that are long out of date. This leads to, for example,
twelve versions of OBO REL/RO/BFO properties appearing
throughout BioPortal’s ontologies (as illustrated in Table I).
This situation can impact ontology authors who decide to reuse
the contents of these “dormant” ontologies (using, e.g., the
BioPortal reuse plugin [26] for Protégé). In future work, we
will investigate ways of warning ontology authors about
potential issues when reusing an ontology’s classes. We will
also investigate semi-automated techniques for identifying and
preventing issues when reusing content (which could, e.g.,
automatically align the different part of properties used in the
SDO and other ontologies).

    The errors reported in this paper are from the year 2015.
Checking on a sample of them in the current version of the
BioPortal, we found that the errors mentioned here are still in
existence because we did not alert the curators of the specific
ontologies at the time. We can assume that many of the other
errors still exist. In fact, a July 2018 scan of a sample of the
ontologies reported on in the results revealed a number of
ontologies whose latest BioPortal release predated 2015 (e.g.,
AERO, COGPO, DDI, CSEO, and SYN). Moreover, all these
ontologies had relatively significant numbers of visits at their
BioPortal pages in the second quarter of 2018, indicating
continued interest in them. Since many more ontologies have
been added to the BioPortal in the interim, another review
would probably uncover more errors, but we were not in a
position to perform such a study. Although the examples are
from 2015, they reflect the reality of some phenomena that
curators and authors are liable to encounter when engaging in
the practice of ontology reuse. The timeliness of the results is
not critical since the purpose of the paper is to alert ontology
designers and maintenance personnel, especially those new to
the process of content reuse, to the kinds of problems and





errors they are likely to face when creating an ontology with
the aid of reuse.

    Also in future work, we plan to offer a set of guidelines for
ontology reuse in order to preempt some of the troubles
described herein. Major aspects of those guidelines will deal
with ontological commitment and the proper consideration of
the hierarchical context of reused content. We will also review
some of the software tools available to complement these
guidelines.

                      VI. CONCLUSION
    The reuse of content from existing ontologies is an
important design principle that can facilitate the work of
curators and authors when creating new ontologies. It can also
help to ensure alignment of the new ontologies with previously
modeled knowledge. However, the process of reuse is not a
simple one, and there are potential pitfalls. In this paper, we
studied a collection of BioPortal ontologies to determine what
problems may have been introduced via reuse. We focused on
three kinds of errors and presented heuristic methodologies to
uncover these within a collection of ontologies. The results
showed that significant errors could arise from reuse. This
should encourage ontology maintenance personnel to be
cautious and vigilant when adopting the reuse approach.

                           ACKNOWLEDGMENT

Research reported in this publication was supported by the
National Cancer Institute of the National Institutes of Health
under award number R01CA190779. The content is solely the
responsibility of the authors and does not necessarily represent
the views of the National Institutes of Health.

                                REFERENCES

[1]  “OWL Web Ontology Language Reference,” available at
     https://www.w3.org/TR/owl-ref. Accessed May 11, 2018.
[2]  “Protégé,” available at http://protege.stanford.edu. Accessed May 13,
     2018.
[3]  P. Grenon, B. Smith, and L. Goldberg, “Biodynamic ontology: Applying
     BFO in the biomedical domain,” Ontologies in Medicine, pp. 20–38,
     2004.
[4]  “OGMS – Ontology for General Medical Science,” available at
     https://code.google.com/archive/p/ogms. Accessed May 13, 2018.
[5]  E. Beisswanger, S. Schulz, H. Stenzhorn, and U. Hahn, “BioTop: An
     upper domain ontology for the life sciences – a description of its current
     structure, contents, and interfaces to OBO ontologies,” Applied
     Ontology, vol. 3, no. 4, pp. 205–212, 2008.
[6]  C. Ochs, Z. He, Y. Perl, S. Arabandi, M. Halper, and J. Geller,
     “Refining the granularity of abstraction networks for the Sleep Domain
     Ontology,” in Proc. Fourth Int’l Conference on Biomedical Ontology
     (ICBO 2013), Montreal, Canada, Jul. 2013, pp. 84–89.
[7]  Z. He, C. Ochs, L. Soldatova, Y. Perl, S. Arabandi, and J. Geller,
     “Auditing redundant import in reuse of a top level ontology for the Drug
     Discovery Investigations ontology,” in Proc. Int’l Workshop on Vaccine
     and Drug Ontology Studies (VDOS-2013), Montreal, Canada, Jul. 2013.
[8]  Z. He, C. Ochs, A. Agrawal, Y. Perl, D. Zeginis, K. Tarabanis, G.
     Elhanan, M. Halper, N. Noy, and J. Geller, “A family-based framework
     for supporting quality assurance of biomedical ontologies in BioPortal,”
     in Proc. 2013 AMIA Annual Symposium, Washington, DC, Nov. 2013,
     pp. 581–590.
[9]  S. Arabandi, C. Ogbuji, S. Redline, R. Chervin, J. Boero, R. Benca et
     al., “Developing a Sleep Domain Ontology,” in Proc. AMIA Clinical
     Research Informatics Summit, 2010.
[10] D. Qi, R. D. King, A. L. Hopkins, G. R. Bickerton, and L. N. Soldatova,
     “An ontology for description of drug discovery investigations,” Journal
     of Integrative Bioinformatics, vol. 7, no. 3, Mar. 2010.
[11] D. Zeginis, A. Hasnain, N. Loutas, H. F. Deus, R. Fox, and K. A.
     Tarabanis, “A collaborative methodology for developing a semantic
     model for interlinking Cancer Chemoprevention linked-data sources,”
     Semantic Web, vol. 5, no. 2, pp. 127–142, 2014.
[12] “BioPortal,” available at http://bioportal.bioontology.org/. Accessed
     May 7, 2018.
[13] C. Ochs, Y. Perl, J. Geller, M. Haendel, M. Brush, S. Arabandi, and S.
     Tu, “Summarizing and visualizing structural changes during the
     evolution of biomedical ontologies using a Diff abstraction network,”
     Journal of Biomedical Informatics, vol. 56, pp. 127–144, 2015.
[14] M. R. Kamdar, T. Tudorache, and M. A. Musen, “A systematic analysis
     of term reuse and term overlap across biomedical ontologies,” Semantic
     Web Journal, vol. 8, no. 6, pp. 853–871, 2017.
[15] A. Ghazvinian, N. F. Noy, and M. A. Musen, “How orthogonal are the
     OBO Foundry ontologies?” Journal of Biomedical Semantics, vol.
     2(Suppl 2): S2, 2011.
[16] “The OBO Foundry,” available at http://www.obofoundry.org/.
     Accessed April 30, 2018.
[17] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters et al.,
     “The OBO Foundry: Coordinated evolution of ontologies to support
     biomedical data integration,” Nature Biotechnology, vol. 25, pp. 1251–
     1255, 2007.
[18] C. Ochs, Y. Perl, J. Geller, S. Arabandi, T. Tudorache, and M. A.
     Musen, “An empirical analysis of ontology reuse in BioPortal,” Journal
     of Biomedical Informatics, vol. 71, pp. 165–177, 2017.
[19] “Gene Ontology Consortium,” available at
     http://www.geneontology.org. Accessed May 6, 2018.
[20] “Chemical Entities of Biological Interest (ChEBI),” available at [12].
     Accessed April 2, 2018.
[21] M. Courtot, F. Gibson, A. L. Lister, J. Malone, D. Schober, R. R.
     Brinkman, and A. Ruttenberg, “MIREOT: The minimum information to
     reference an external ontology term,” Applied Ontology, vol. 6, no. 1,
     pp. 23–33, 2011.
[22] “Computer-based Patient Record Ontology,” available at
     https://code.google.com/archive/p/cprontology. Accessed May 13, 2018.
[23] B. Smith, W. Ceusters, B. Klagges, J. Köhler, A. Kumar, J. Lomax, C.
     Mungall, F. Neuhaus, A. L. Rector, and C. Rosse, “Relations in
     biomedical ontologies,” Genome Biology, vol. 6:R46, 2005.
[24] C. Rosse and J. L. V. Mejino, “A reference ontology for biomedical
     informatics: The Foundational Model of Anatomy,” Journal of
     Biomedical Informatics, vol. 36, no. 6, pp. 478–500, Dec. 2003.
[25] C. J. Mungall, C. Torniai, G. V. Gkoutos, S. E. Lewis, and M. A.
     Haendel, “Uberon, an integrative multi-species anatomy ontology,”
     Genome Biology, vol. 13:R5, 2012.
[26] J. Nair, T. Tudorache, T. Whetzel et al., “The BioPortal Import Plugin
     for Protégé,” in Int’l Conference on Biomedical Ontology (ICBO 2011),
     2011, pp. 298–299.



