=Paper= {{Paper |id=Vol-1747/IT404_ICBO2016 |storemode=property |title=Ten Simple Rules for Biomedical Ontology Development |pdfUrl=https://ceur-ws.org/Vol-1747/IT404_ICBO2016.pdf |volume=Vol-1747 |authors=Melanie Courtot,James Malone,Chris Mungall |dblpUrl=https://dblp.org/rec/conf/icbo/CourtotMM16 }} ==Ten Simple Rules for Biomedical Ontology Development == https://ceur-ws.org/Vol-1747/IT404_ICBO2016.pdf
Ten simple rules for biomedical ontology development

Mélanie Courtot
EMBL-EBI, Hinxton, UK
mcourtot@gmail.com

James Malone
FactBio Ltd, Cambridge, UK
james@factbio.com

Christopher J Mungall
Lawrence Berkeley National Laboratory, Berkeley, USA
cjmungall@lbl.gov


Abstract: Biomedical ontology development is often a time- and resource-consuming
endeavor. To maximize the efficiency of the process, we present a set of 10 simple rules
covering basic technical requirements such as scoping and versioning, while considering
additional elements such as licensing and community engagement. When applied, the rules
will help avoid common pitfalls and jump-start ontology building.

Keywords: ontology development, tutorial, rules

INTRODUCTION

Biomedical ontologies are notoriously challenging and laborious to develop, despite their
uncontested usefulness for data description, sharing and integration. As the amount of
data generated keeps increasing, ontologies are becoming a de facto requirement for the
scientific creation and maintenance of datasets. While the advantages of using an ontology
are many, it is not straightforward for inexperienced users to choose which to use [1]
before considering development of their own. Additionally, there is often no single
resource providing exactly what is needed, and many biologists embark on a new ontology
building task without being fully aware of some basic notions in ontology development.
This paper seeks to document some general rules and to guide neophyte users towards
practical considerations for efficient biomedical ontology building.

I. SET THE SCOPE FROM USERS’ NEEDS

It is often very tempting to ‘dig in’ and start creating new terms and organizing them in
a hierarchy. However, before proceeding with ontology development itself, it is crucial to
take a step back and consider the use cases the ontology is attempting to address.
Typically these take the form of competency questions [2]: queries which the ontology
should be able to satisfy in order to be considered correct and usable. Building an
ontology from the bottom up will ensure there is coverage, i.e. the ‘terms’ required are
present, but it will not always ensure that the queries required are satisfiable. This
requires an understanding of those questions and, from there, building in class
descriptions and structure such that they can be answered by the ontology.

II. DO YOUR RESEARCH & REUSE AS MUCH AS POSSIBLE

When choosing to create a new resource, care should be taken to reuse work done in the
context of other efforts where possible. While this introduces additional constraints,
such as the need to keep in sync or decisions about positioning and modifications, the
advantages of doing so greatly outweigh the disadvantages. Reusing terms from other
resources allows developers to rely on the knowledge of the domain experts who curated
them and to dedicate more work time to novel terms. The Minimum Information to Reference
an External Ontology Term (MIREOT) guidelines [3] specify a mechanism to selectively
import a term from a source ontology into a target resource, without the overhead of
importing the whole external file. For example, the Gene Ontology (GO, [4]) currently
imports selected terms from the Chemical Entities of Biological Interest (ChEBI, [5]) to
model physiological responses to drugs. Avoiding duplication of resources additionally
increases interoperability: a single URI is created per term, removing the need for
tedious mappings between terms with the same meaning in different resources.

III. PUBLISH THE ONTOLOGY LICENSE AND ATTRIBUTION MODEL

When building an ontology, you should think about licensing early on [6]. Indeed, licenses
cannot be made more restrictive; they can only be loosened towards a more permissive one.
Within the OBO Foundry [7], we chose to recommend the Creative Commons licenses [8],
specifically CC-BY, which requires attribution upon reuse. The OBO Foundry only requires
the original URIs be reused for attribution, which prevents ‘attribution stacking’: only
the URI needs to be cited, without the need for adding extra citations to individuals or
projects. However, other efforts such as Wikidata [9] require resources be available under
CC-0 (i.e. public domain) for reuse, so the chosen license can and will impact the usage
that can be made of your resource. Proper attribution will be important when trying to
track usage, and can help justify supporting it to funding agencies.

IV. PROVIDE STABLE URIS & VERSION YOUR ONTOLOGY

While ontologies evolve through time, stability of identifiers is a fundamental tenet of
their life cycle. Each entity described should have a unique identifier, and this
identifier should be stable through time [10]. When terms become obsolete, a deprecation
policy such as that of the GO [11] should be followed. Using URLs as identifiers enables
their dereferencing, i.e., resolution into human-readable information in a browser as well
as RDF for machines in the background.
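
The identifier and deprecation guidance above can be illustrated with a minimal Python sketch. It is not an implementation of the GO deprecation policy or of any OBO Foundry tooling. The namespace http://example.org/myonto/, the prefix MYONTO, the seven-digit numbering, and the class, field and method names are all assumptions made for illustration. The sketch shows three behaviours: identifiers are minted once and never reused, obsolete terms are flagged rather than deleted so their URIs stay valid, and a replacement pointer can be followed to the current term.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical namespace used only for this sketch; it is not a real resolver.
BASE_URI = "http://example.org/myonto/"


@dataclass
class Term:
    uri: str
    label: str
    obsolete: bool = False
    replaced_by: Optional[str] = None  # URI of the replacement term, if any


@dataclass
class Registry:
    prefix: str
    terms: Dict[str, Term] = field(default_factory=dict)
    next_number: int = 1

    def mint(self, label: str) -> Term:
        """Create a term under a fresh identifier; numbers are never reused."""
        uri = f"{BASE_URI}{self.prefix}_{self.next_number:07d}"
        self.next_number += 1
        term = Term(uri=uri, label=label)
        self.terms[uri] = term
        return term

    def deprecate(self, uri: str, replaced_by: Optional[str] = None) -> None:
        """Mark a term obsolete in place; the entry and its URI are kept."""
        if replaced_by is not None and replaced_by not in self.terms:
            raise KeyError(f"unknown replacement term: {replaced_by}")
        term = self.terms[uri]
        term.obsolete = True
        term.replaced_by = replaced_by

    def resolve(self, uri: str) -> Term:
        """Follow replacement pointers until a current term or a dead end."""
        seen = set()
        term = self.terms[uri]
        while term.obsolete and term.replaced_by and term.uri not in seen:
            seen.add(term.uri)  # guards against accidental replacement cycles
            term = self.terms[term.replaced_by]
        return term
```

A caller would mint two terms, deprecate the first in favour of the second, and then call `resolve` on the old URI to obtain the current term; the old URI remains present in the registry throughout.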
The adoption of the OBO Foundry ID policy by many OBO library resources has enabled common
tooling to be built, such as Ontobee [12], which provides built-in dereferencing for OBO
resources.

V. USE A VERSION CONTROL SYSTEM OR EQUIVALENT

Version Control Systems (VCS) allow for storage of ontologies and their versions in a
common shared space, with a history of all edits preserved in a transparent fashion. In
the world of software engineering, almost all software is developed using a VCS, and we
argue that the same should hold for ontology engineering. In particular, we advocate for
the use of a publicly hosted VCS, such as GitHub or GitLab. These systems also provide
mechanisms to help make stable releases, as well as issue trackers and tools that allow
the wider community to interact with and comment on aspects of the development process.
Many ontology developers have chosen to adopt a common folder structure with which to
organize project files, which helps users find things in consistent places. Tools such as
the ontology-starter-kit [13] can help you bootstrap a project using a standard layout.

VI. USE A COMMON METADATA SET

Usage of common annotation properties allows tool developers to rely on them to build
their user interface, and enables users to go back and check on the origin of the term and
what its intended meaning is, and/or contact the relevant individual should they need more
clarification about its usage. While it is usually uncontroversial that at least a label
and definition be provided for each entity in the ontology, we found that other properties
are useful in providing documentation and traceability. For example, the source of the
definition, such as a PMID or web citation, is often useful to capture and provides
additional context for the term. Annotation properties should be used to indicate
evolution of the ontology: ‘replaced_by’ indicates one-to-one replacement of obsolete
terms and can be followed by scripts to update annotations, for example, and
‘creation_date’ or ‘created_by’ can help audit the resource. A common metadata set [14]
has been proposed and is currently used by many resources in the OBO Foundry. Other
efforts exist to formalize metadata, such as the Simple Knowledge Organization System
(SKOS) [15] and the Dublin Core (DC) Metadata Element Set [16].

VII. EVALUATE EARLY, OFTEN & OPENLY

Collecting datasets first will ensure the resource developed fits the use case, and that
there will be a gold standard against which the ontology can ultimately be evaluated. Some
tools, such as the Ontology Lookup Service [17], allow calculating deltas (or diffs)
between ontologies to explore their development, quality of content in terms of
definitions, and compliance with ontology development best practice. For example,
adherence to OBO Foundry principles [18] for ontology best design can provide qualitative
evaluation. For external evaluation, other metrics can be useful to provide a quantitative
overview, such as number of classes, properties in the ontology, or number of projects
using the resource (as part of their own ontology or to annotate their datasets),
evolution strategy, use of design patterns or domain interoperability. For either kind of
evaluation, publish your results alongside the ontology, pointing to the version that was
evaluated and to the changes that were made when performing sequential evaluations.

VIII. DOCUMENT YOUR DESIGN PATTERNS

Consider the knowledge you are trying to describe. In many cases in biology, a repetitive
pattern can be seen. For example, a transport process in GO includes a starting point, an
endpoint and a cargo, whether we are describing amino acid import into cell or
oligopeptide export from mitochondrion. In the Ontology for Biomedical Investigations
[19], assays are described via their input, output, and their evaluant (i.e., what is
being measured). Using patterns for defining logical axioms allows for fast addition of
new classes via script, as well as easier maintenance should the patterns be updated.
Uberon documents a variety of anatomical entity design patterns on its wiki, and many of
these are applicable to other ontologies [20]. The GO and several other ontologies
including the Cell Type Ontology [21] already use standard patterns to generate new terms
via the TermGenie tool [22]. In GO around 80% of new terms are added via this route. Other
tools such as Tawny OWL [23] and the Ontology Pre-Processing Language [24], for example as
implemented via Webulous [25], are also available. A newer, simpler version of templates
is being implemented, ‘Dead Simple OWL design patterns’ (submitted). Adopting an upper
level ontology can help ensure that the hierarchy developed is compatible with others
which adhere to the same type of representation. This is important in the context of reuse
of resources, or to ensure easy communication between developers. For example, ‘cancer’
can refer to a disease or an aggregate of cells, which would be in clearly separated areas
of the ontology. Many upper ontologies are available [26]. In the OBO Foundry, the Basic
Formal Ontology (BFO [27]) has been widely adopted.

IX. MAKE ONTOLOGY AS DETAILED AS IT NEEDS TO BE. BUT NO FURTHER.

Including users who understand the domain in question is also a valuable consideration.
While there are many freely available resources from which biomedical information can be
collected, some are more reliable than others. Crowdsourcing such knowledge can be a
productive method for collecting knowledge for inclusion into an ontology [9], but
expertise from the biomedical domain in question is critical in ensuring the validity of
the ontology content. Care should also be taken to capture the appropriate level of
information. For example, when describing a disease, is only the diagnosis needed, or
should the symptoms and signs be described as well? To maximize effectiveness, a resource
needs to abide by the Goldilocks principle [28] and capture just the right amount of
information.

X. ENGAGE WITH THE COMMUNITY

Finally, don’t be afraid to ask for help! There are many places to get help, starting with
the trackers of the resources you are interested in. The biomedical ontology
community is relatively small, and many developers have been working together for a long
time. While this means discussions can sometimes become heated, it also implies a long
shared history and respect for each other’s work. The community often comes together at
yearly events such as the International Conference on Biomedical Ontology, the
International Biocuration Conference or the Bio-ontology Special Interest Group. General
mailing lists, such as public-semweb-lifesci@w3.org or obo-discuss@lists.sourceforge.net,
are also good places to engage with other users and developers. Many other documents and
blogs, such as Ontogenesis [29], can also provide assistance. Engaging a wider community
means that in the longer term more people may contribute, and will help establish a
community of editors that provides some level of sustainability to the resource.

CONCLUSION

Building a new ontology can be a daunting task, and should not be taken on lightly. Good
ontology development requires time and dedication, but if done correctly will provide
advantages in storing and analysing biomedical data. Following a simple set of rules from
early development on will prevent unnecessary proliferation of custom resources which are
doomed to disappear as their funding ends, and foster the building of interoperable
community resources.

ACKNOWLEDGMENTS

MC was funded by EMBL-EBI core funds. The authors would like to thank Helen Parkinson for
helpful comments and suggestions on the manuscript.

REFERENCES

[1]  James Malone, Robert Stevens, Simon Jupp, Tom Hancocks, Helen Parkinson, and Cath
     Brooksbank. Ten simple rules for selecting a bio-ontology. PLOS Comput Biol,
     12(2):e1004743, 2016.
[2]  Kamal Azzaoui, Edgar Jacoby, Stefan Senger, Emiliano Cuadrado Rodríguez, Mabel Loza,
     Barbara Zdrazil, Marta Pinto, Antony J Williams, Victor de la Torre, Jordi Mestres,
     Manuel Pastor, Olivier Taboureau, Matthias Rarey, Christine Chichester, Steve
     Pettifer, Niklas Blomberg, Lee Harland, Bryn Williams-Jones, and Gerhard F Ecker.
     Scientific competency questions as the basis for semantically enriched open
     pharmacological space development. Drug discovery today, 18(17-18):843–52, sep 2013.
[3]  M. Courtot, F. Gibson, A. L. Lister, J. Malone, D. Schober, R. R. Brinkman, and A.
     Ruttenberg. MIREOT: The minimum information to reference an external ontology term.
     Applied Ontology, 6(1):23–33, 2011.
[4]  M Ashburner, C A Ball, J A Blake, D Botstein, H Butler, J M Cherry, A P Davis, K
     Dolinski, S S Dwight, J T Eppig, M A Harris, D P Hill, L Issel-Tarver, A Kasarskis, S
     Lewis, J C Matese, J E Richardson, M Ringwald, G M Rubin, and G Sherlock. Gene
     ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature
     genetics, 25(1):25–9, may 2000.
[5]  Janna Hastings, Paula de Matos, Adriano Dekker, Marcus Ennis, Bhavana Harsha, Namrata
     Kale, Venkatesh Muthukrishnan, Gareth Owen, Steve Turner, Mark Williams, and Christoph
     Steinbeck. The ChEBI reference database and ontology for biologically relevant
     chemistry: enhancements for 2013. Nucleic acids research, 41(Database
     issue):D456–63, jan 2013.
[6]  Science Commons - Ontology Copyright Licensing Considerations. Available from
     http://sciencecommons.org/resources/readingroom/ontology-copyright-licensing-considerations/,
     Accessed May 2016.
[7]  OBO Foundry. OBO library. Available from http://obofoundry.org/, Accessed May 2016.
[8]  Creative Commons. Creative commons licenses. Available from
     https://creativecommons.org/licenses/, Accessed May 2016.
[9]  Elvira Mitraka, Andra Waagmeester, Sebastian Burgstaller-Muehlbacher, Lynn M. Schriml,
     Andrew I. Su, and Benjamin M. Good. Wikidata: A platform for data integration and
     dissemination for the life sciences and beyond. In Proceedings of the 8th
     International Conference on Semantic Web Applications and Tools for Life Sciences
     (SWAT4LS), 2015.
[10] Julie McMurry, Juha Muilu, Michel Dumontier, Henning Hermjakob, Nathalie Conte,
     Philipp Gormanns, Murat Sariyar, Janna Hastings, Alejandra Gonzalez-Beltran, Niklas
     Blomberg, Chris Morris, Jean-Karim Hériché, Melissa A Haendel, Rafael C Jimenez, Tony
     Burdett, Philippe Rocca-Serra, Nicolas Le Novère, Nick Juty, Katherine Wolstencroft,
     Simon Jupp, Wolfgang Müller, Donal K Fellows, Maria J Martin, Neil Swainston, Helen
     Parkinson, Carole Goble, Johanna R McEntyre, Camille Laibe, Jacky L Snoep, Nicole
     Washington, Susanna-Assunta Sansone, Natalie J Stanford, Jon C Ison, Alan R Williams,
     Christopher J Mungall, and James Malone. 10 Simple rules for design, provision, and
     reuse of persistent identifiers for life science data. Available from
     http://zenodo.org/record/18003. May 2015.
[11] GO Curator Guide. Available from
     http://wiki.geneontology.org/index.php/Curator_Guide:_Obsoletion. Accessed May 2016.
[12] Ontobee webserver. Available from http://www.ontobee.org. Accessed May 2016.
[13] Creating an ontology project, an update. Available from
     https://douroucouli.wordpress.com/2015/12/16/creating-an-ontology-project-an-update/.
     Accessed May 2016.
[14] Ontology metadata common set. Available from
     http://information-artifact-ontology.googlecode.com/svn/releases/2015-02-23/ontology-metadata.owl.
     Accessed May 2016.
[15] W3C. Simple Knowledge Organization System (SKOS). Available from
     http://www.w3.org/TR/2009/REC-skos-reference-20090818/, Accessed May 2016.
[16] Dublin Core Metadata Initiative. Dublin Core Metadata Element Set. Available from
     http://dublincore.org/documents/dces/, Accessed May 2016.
[17] Olga Vrousgou, Tony Burdett, Simon Jupp, and Helen Parkinson. Biomedical ontology
     evolution in the EMBL-EBI ontology lookup service. In Proceedings of the Workshops of
     the EDBT/ICDT 2016 Joint Conference (EDBT/ICDT 2016), 2016.
[18] OBO Foundry principles - overview. Available from
     http://obofoundry.org/principles/fp-000-summary.html. Accessed May 2016.
[19] A Bandrowski, R Brinkman, M Brochhausen, MH Brush, B Bug, MC Chibucos, K Clancy, M
     Courtot, D Derom, M Dumontier, et al. The ontology for biomedical investigations.
     PloS one, 11(4):e0154556, 2016.
[20] Uberon Design patterns. Available from
     https://github.com/obophenotype/uberon/wiki/Manual#design-patterns. Accessed May 2016.
[21] Jonathan Bard, Seung Y Rhee, and Michael Ashburner. An ontology for cell types. Genome
     Biology, 6(2):R21, 2005. Accessed via PMC, 30 June 2016.
[22] Heiko Dietze, Tanya Z Berardini, Rebecca E Foulger, David P Hill, Jane Lomax, David
     Osumi-Sutherland, Paola Roncaglia, and Christopher J Mungall. Termgenie - a
     web-application for pattern-based ontology class generation. Journal of biomedical
     semantics, 5(1):1, 2014.
[23] Phillip Lord. The semantic web takes wing: Programming ontologies with tawny-owl.
     arXiv preprint arXiv:1303.0213, 2013.
[24] Mikel Egana, Alan Rector, Robert Stevens, and Erick Antezana. Applying ontology design
     patterns in bio-ontologies. In Knowledge Engineering: Practice and Patterns, pages
     7–16. Springer, 2008.
[25] Simon Jupp, Tony Burdett, Danielle Welter, Sirarat Sarntivijai, Helen Parkinson, and
     James Malone. Webulous and the webulous google add-on - a web service and application
     for ontology building from templates. Journal of Biomedical Semantics, 7(1):1, 2016.
[26] Wikipedia - Upper Ontology. Available from
     https://en.wikipedia.org/wiki/Upper_ontology#Available_ontologies. Accessed May 2016.
[27] P. Grenon, B. Smith, and L. Goldberg. Biodynamic ontology: applying bfo in the
     biomedical domain. Studies in health technology and informatics, 102:20–38, 2004.
[28] The Goldilocks principle. Available from
     https://en.wikipedia.org/wiki/Goldilocks_principle, Accessed June 2016.
[29] Phillip Lord. Ontogenesis. Available from http://ontogenesis.knowledgeblog.org/,
     Accessed May 2016.