Definition Coverage in the OBO Foundry Ontologies: The Big Picture

Daniel R. Schlegel and Peter L. Elkin
Department of Biomedical Informatics
University at Buffalo, SUNY, Buffalo, NY, USA
Email: <drschleg, elkinp>@buffalo.edu

Selja Seppälä
Department of Health Outcomes and Policy
University of Florida, FL, USA
Email: sseppala@ufl.edu


I. INTRODUCTION

High quality ontologies have both textual and logical definitions for their terms. Definitions serve many purposes: good textual definitions allow experts and non-experts alike to understand the content of an ontology and use it in the manner the authors intended; logical definitions are necessary for reasoners to verify that an ontology is consistent, and may make application of the ontology easier for users. Ideally, logical and textual definitions would convey the same information, and each can provide an accuracy check on the other [1], [2].

Producing definitions is difficult and time-consuming. Thus, despite the best efforts of ontology developers and the existence of a number of tools and methods to populate ontologies with definitions, it is not uncommon to see missing textual or logical definitions, if not both. This is also the case in the Open Biomedical Ontologies (OBO) Foundry [3] ontologies.

The OBO Foundry contains 9 ‘core’ ontologies and 128 non-core ontologies.¹ These ontologies are developed in a coordinated way according to a set of shared principles.² One of the OBO Foundry principles is about definitions: member ontologies should have “textual definitions ... for a substantial and representative fraction [of terms], plus equivalent formal definitions (for at least a substantial number of terms).”³ The statement of this principle is rather vague and elicits an obvious question: How much is ‘substantial’?

We examine the coverage of textual and logical definitions throughout the OBO Foundry ontologies. In particular, we aim to determine: (1) whether the prevalence of definitions differs between the core and non-core ontologies; (2) whether there are more textual than logical definitions; and (3) whether the size of an ontology has an effect on definitional coverage. To conclude, we discuss ways of quantifying the notion of ‘substantial’ definition coverage to determine to what extent the principle of having textual and logical definitions for a substantial number of terms is upheld.

¹ Our study focuses on 119 ontologies out of the 137 present in the OBO Foundry, since 18 non-core ontologies were either unavailable on the web due to broken links or failed to load using the OWL API.
² http://obofoundry.org/principles/fp-000-summary.html.
³ http://obofoundry.org/principles/fp-006-textual-definitions.html.

II. METHODS

Textual definitions tell us about the properties of the instances of a class in an ontology. They typically have two parts: (i) a genus that states the type of thing of which they are instances, and (ii) one or more differentia(e) that state the properties of these instances that differentiate them from instances of neighboring types.

To identify textual definitions, we used the IAO annotation property ‘definition’, which is used in 103 of the 119 ontologies in this study. We also examined the set of annotation properties used in the OBO Foundry ontologies that contained the string ‘def’ but did not contain the strings ‘editor’, ‘source’, ‘citation’, ‘defines’, or ‘defined’, to try to capture any non-standard annotation properties that might have been used to signal a definition. We also included the IAO annotation property ‘elucidation’ for ontologies that contain some primitive classes that cannot be, strictly speaking, defined.

Classes are one of the main components of ontologies and are defined by class expressions. Class expressions represent conditions that individuals must satisfy to be members of a class. Some axioms, such as SubClassOf and EquivalentClass, define relationships between class expressions. These two axiom types, specifically, constitute the logical definitions of the ontology terms. We say an axiom contains a genus for the definition of class c1 if the axiom contains some other class, c2, where c2 is not part of an object property restriction; an axiom contains one or more differentiae for the definition of a class if the axiom contains any object property restrictions.

For each ontology, we computed the number of classes that contain: (i) at least one genus; (ii) at least one differentia; and (iii) at least one of each. A class specified by both a genus and one or more differentiae has a complete logical definition.

III. RESULTS

We review our results in light of the goals stated in Section I.

Item (1): Table I shows that the prevalence of definitions is different between the core and non-core ontologies. We found that coverage within the 9 core ontologies was quite high, with 6 having textual definitions for over 90% of their terms. On average, core ontologies have textual definitions for 85.6% of their terms (stdev = 21%); non-core ontologies, 63%
(stdev = 38%). Coverage for complete logical definitions among the core ontologies was 53% (stdev = 34%), and only 28% (stdev = 29%) for the non-core ontologies. Over the full set of analyzed OBO Foundry ontologies, textual definition coverage is on average 66% (stdev = 37%) and complete logical definition coverage, 30% (stdev = 30%).

TABLE I
COVERAGE OF TEXTUAL DEFINITIONS, LOGICAL DEFINITIONS, AND PARTS OF LOGICAL DEFINITIONS ACROSS THE CORE, NON-CORE, AND SUM TOTAL OF THE ANALYZED ONTOLOGIES IN THE OBO FOUNDRY.

                                 Core    Non-Core    Total
  Textual Definition Coverage     86%       64%       66%
  Logical Definition Coverage     53%       28%       30%
  Genera Coverage                 91%       86%       86%
  Genera Only Coverage            39%       58%       57%
  Differentiae Coverage           53%       34%       36%
  Differentiae Only Coverage       0%        6%        6%

Item (2): Figure 1 shows that the studied ontologies have more textual than logical definitions and that the trends are nearly opposite. Relatively few ontologies have poor textual definition coverage, while a large number have 90-100% coverage. Conversely, a large number of ontologies have very poor logical definition coverage (0-10%), and few have good logical definition coverage.

Item (3): Figure 2 shows a correlation between ontology size and logical definition coverage. We grouped the ontologies as follows: ‘very small’ (0-99 terms, n=17); ‘small’ (100-999, n=42); ‘medium’ (1,000-9,999, n=44); ‘large’ (10,000-99,999, n=11); and ‘very large’ (100,000+, n=3). We found that nearly all groups had textual definitions for roughly 60-70% of their terms. The ‘large’ category formed the only outlier, with 33% coverage. We examined logical definition coverage in three ways: as the percent of classes with genera, with differentiae, and with both. The percent coverage of complete logical definitions rose slowly as ontology size grew.

IV. DISCUSSION AND CONCLUSION

Determining whether the principle of having textual and logical definitions for a substantial number of terms is upheld requires quantifying the notion of ‘substantial’ definition coverage.

If we consider that ‘substantial’ equates with the average definition coverage measured over the core ontologies, then adequate coverage for inclusion in the OBO Foundry would be to have at least 86% of the terms specified with a textual definition and 53% with a complete logical definition. Considering instead all of the analyzed ontologies in the OBO Foundry, we get 66% and 30%, respectively.

To expect that all ontologies have coverage as complete as the core ontologies is unrealistic. Therefore, we quantify ‘substantial’ at roughly 65% for textual definitions, and propose that logical definitions be held to this standard as well.

With this measure of substantial definition coverage set for the OBO Foundry ontologies, our results show that on average there is substantial coverage of textual definitions, but not of logical definitions.

Definitions, both logical and textual, are essential components of an ontology. The OBO Foundry has the noble goal of creating a repository for ontologies developed using a shared set of principles, including some (vague) requirements for including definitions. This study is the first one not only to analyze the “big picture” of definition coverage in the OBO Foundry, but also to suggest a numeric value for ‘substantial’ definition coverage.

REFERENCES

[1] S. Seppälä, Y. Schreiber, and A. Ruttenberg, “Textual and logical definitions in ontologies,” in Proceedings of DIKR 2014, IWOOD 2014, and OBIB 2014, R. Boyce et al., Eds., vol. 1309. Houston, TX, USA: CEUR Workshop Proceedings (CEUR-WS.org), October 6-7 2014, pp. 35–41. [Online]. Available: http://ceur-ws.org/Vol-1309/
[2] S. Seppälä, Y. Schreiber, A. Ruttenberg, and B. Smith, “Definitions in ontologies,” Cahiers de lexicologie, vol. 4, no. Numéro thématique “Au coeur de la définition”, forthcoming.
[3] B. Smith, M. Ashburner, C. Rosse et al., “The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration,” Nature Biotechnology, vol. 25, no. 11, pp. 1251–1255, 2007.
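
As an illustration of the size grouping reported under Item (3) in Section III, the following Python sketch assigns ontologies to the five size groups and averages textual definition coverage within each group. It is a minimal sketch under stated assumptions: the names SIZE_GROUPS, size_group, and mean_coverage_by_group are hypothetical, the per-group figure is assumed to be a mean of per-ontology percentages (the text does not say how group coverage was aggregated), and the sample records are invented, not data from this study.

```python
from statistics import mean

# Size groups as listed in Section III; an upper bound of None means open-ended.
SIZE_GROUPS = [
    ("very small", 0, 99),
    ("small", 100, 999),
    ("medium", 1_000, 9_999),
    ("large", 10_000, 99_999),
    ("very large", 100_000, None),
]


def size_group(n_terms):
    """Return the name of the size group for an ontology with n_terms terms."""
    if n_terms < 0:
        raise ValueError("term count must be non-negative")
    for name, low, high in SIZE_GROUPS:
        if n_terms >= low and (high is None or n_terms <= high):
            return name


def mean_coverage_by_group(records):
    """Average percent coverage per size group.

    records: iterable of (n_terms, n_terms_with_definition) pairs, one per
    ontology. Ontologies with no terms are skipped (assumption).
    """
    by_group = {}
    for n_terms, n_defined in records:
        if n_terms == 0:
            continue
        percent = 100.0 * n_defined / n_terms
        by_group.setdefault(size_group(n_terms), []).append(percent)
    return {group: mean(values) for group, values in by_group.items()}


# Invented example records: (number of terms, terms with a textual definition).
example = [(50, 25), (80, 80), (5000, 2500)]
print(mean_coverage_by_group(example))
```

Averaging per-ontology percentages gives every ontology equal weight regardless of size; pooling term counts within a group would instead weight large ontologies more heavily.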


[Figure 1: plot titled “Number of Ontologies with % Definition Coverage”. x-axis: Percent Coverage (0-10 through 91-100); y-axis: Number of Ontologies (0 to 60). Series: Textual Definitions; Complete Logical Definitions.]

Fig. 1. The number of ontologies with percent coverage of textual and complete logical definitions.

[Figure 2: plot titled “Definition Coverage by Ontology Size”. x-axis: Ontology Size (terms): Very Small (0-99 terms), Small (100-999), Medium (1,000-9,999), Large (10,000-99,999), Very Large (100,000+); y-axis: Coverage Percent (0 to 100). Series: Textual Definition Coverage; Logical Genera Coverage; Logical Differentia Coverage; Complete Logical Definition.]

Fig. 2. The coverage of textual and logical definitions by ontology size. Both the genus and differentia components of the logical definitions are shown, along with coverage for the complete logical definitions.
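
The logical-definition components shown in Fig. 2 follow the genus and differentia criteria given in Section II. The Python sketch below shows one way those per-class counts could be computed. It rests on assumptions: the Axiom record is a hypothetical, simplified stand-in for a SubClassOf or EquivalentClass axiom (it is not an OWL API structure), all function names are invented for illustration, and the toy ontology is not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axiom:
    """Simplified logical-definition axiom attached to one class (assumption)."""
    named_classes: tuple = ()  # other classes appearing outside any restriction
    restrictions: tuple = ()   # object property restrictions, e.g. ("part_of", "Cell")


def has_genus(axioms):
    """True if any axiom names another class outside a restriction."""
    return any(ax.named_classes for ax in axioms)


def has_differentia(axioms):
    """True if any axiom contains an object property restriction."""
    return any(ax.restrictions for ax in axioms)


def logical_coverage(ontology):
    """Fraction of classes per category.

    ontology: dict mapping a class name to its list of Axiom records.
    """
    total = len(ontology)
    keys = ("genera", "differentiae", "complete", "genera_only", "differentiae_only")
    if total == 0:
        return dict.fromkeys(keys, 0.0)
    genus = sum(has_genus(a) for a in ontology.values())
    diff = sum(has_differentia(a) for a in ontology.values())
    both = sum(has_genus(a) and has_differentia(a) for a in ontology.values())
    return {
        "genera": genus / total,
        "differentiae": diff / total,
        "complete": both / total,  # genus plus at least one differentia
        "genera_only": (genus - both) / total,
        "differentiae_only": (diff - both) / total,
    }


# Toy ontology with invented class names.
toy = {
    "Nucleus": [Axiom(named_classes=("Organelle",), restrictions=(("part_of", "Cell"),))],
    "Cell": [Axiom(named_classes=("MaterialEntity",))],
    "Membrane": [Axiom(restrictions=(("part_of", "Cell"),))],
    "Root": [],
}
coverage = logical_coverage(toy)
print(coverage)
```

Under these definitions the ‘only’ categories and the complete category partition each component, so genera-only plus complete equals genera coverage; the rows of Table I appear to agree with this up to rounding.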