<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Python Utility for Working with OBO Foundry Terms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jonathan P. BONA</string-name>
          <email>jpbona@uams.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>782 Little Rock</institution>
          ,
          <addr-line>AR 72205-7199</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Biomedical Informatics, University of Arkansas for Medical Sciences</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <kwd-group>
        <kwd>Python</kwd>
        <kwd>OBO Foundry</kwd>
        <kwd>Tools</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This poster describes a simple utility that facilitates working with Open Biomedical
Ontologies Foundry resources within programs implemented in the Python language.
This utility allows a programmer to import a representation of an OBO Foundry ontology
as a Python class, and then use that Python class within a program to refer to terms in the
ontology using their labels rather than managing strings representing term URIs.</p>
      <p>
        Term identifier conventions in the OBO Principles[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] require ontology terms to use
numeric local term identifiers, and forbid local identifiers that “consist of labels or
mnemonics meaningful to humans.” This requirement has the unfortunate side effect of
making it difficult for humans to work directly with the identifiers. In order to write by
hand (or in code) an RDF assertion that a certain individual is an instance of the
UBERON[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] term ‘lung’, one must know that the URI is
http://purl.obolibrary.org/obo/UBERON_0002048. A reusable and
human-readable way of associating that URI with its label within the code is preferable.
      </p>
      <p>
        In Python software that we have written to transform instance data into semantic
representations using OBO Foundry ontologies, including as part of the PRISM[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
project, we started managing the association between ontology labels and URIs with
ad hoc mappings involving only terms that were of immediate interest. For example, we
might maintain a Python dictionary object for all the UBERON terms used in our project
and then use the Python expression uberons['lung'] to retrieve the URI for that
term. Among the disadvantages of this approach is the need to look up individual terms
using external resources and then copy their labels and URIs into source code. This
manual step is clunky, error-prone, and difficult to reuse from one project to the next.
      </p>
      <p>The solution provided by this utility is simple but useful: rather than managing term
label/URI mappings for oneself in a body of Python code, we provide the ability to
import each ontology as a Python class that contains the labels and URIs for terms
defined in that ontology.</p>
      <p>A draft implementation is available in the public GitHub repository at
https://github.com/jonathanbona/obof-py.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>The capability described above is implemented by generating a Python class
for each ontology. For each term defined in an ontology, the ontology’s Python class has
a class attribute named after the term’s label, with underscores substituted for spaces
where applicable. The value of a term’s attribute is a string representation of its URI.</p>
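      <p>The class-generation step can be illustrated with a minimal sketch. It is not the
implementation in the repository: the helper names label_to_attribute and make_ontology_class
are hypothetical, and the handling of characters other than spaces, of leading digits, and of
Python keywords is an assumption added here so that every label yields a valid attribute name.</p>
      <preformat><![CDATA[
```python
import keyword
import re


def label_to_attribute(label):
    """Turn a term label into a Python identifier (hypothetical helper)."""
    # Spaces become underscores, as described in the text; treating other
    # non-identifier characters the same way is an assumption of this sketch.
    name = re.sub(r"\W", "_", label.strip())
    # Identifiers cannot be empty, start with a digit, or be a keyword.
    if not name or name[0].isdigit() or keyword.iskeyword(name):
        name = "_" + name
    return name


def make_ontology_class(class_name, label_to_uri):
    """Build a class whose attributes map term labels to URI strings."""
    attributes = {
        label_to_attribute(label): uri for label, uri in label_to_uri.items()
    }
    # type(name, bases, namespace) creates a new class at run time.
    return type(class_name, (), attributes)


UBERON = make_ontology_class(
    "UBERON",
    {
        "lung": "http://purl.obolibrary.org/obo/UBERON_0002048",
        "lobe of liver": "http://purl.obolibrary.org/obo/UBERON_0001113",
    },
)

print(UBERON.lung)
print(UBERON.lobe_of_liver)
```
]]></preformat>
      <p>Building the class from a plain label-to-URI dictionary keeps the generated class
independent of how the ontology file was parsed.</p>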
      <p>Following the running example, one uses the line <monospace>from obo import UBERON</monospace>
at the top of a Python source file that will use UBERON terms, and then uses the Python
class <monospace>UBERON</monospace> to get the URI for any term in the UBERON ontology. The following
lines show an interactive Python session getting the URIs for two terms.</p>
      <preformat>&gt;&gt;&gt; UBERON.lung
'http://purl.obolibrary.org/obo/UBERON_0002048'
&gt;&gt;&gt; UBERON.lobe_of_liver
'http://purl.obolibrary.org/obo/UBERON_0001113'</preformat>
      <p>Any Python development environment with code completion will assist in locating
the desired term. For instance, typing “UBERON.lung”&lt;TAB&gt; within python-mode in
the Emacs editor brings up a list of 17 term labels beginning with the substring “lung.”</p>
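      <p>Completion works because term labels are ordinary class attributes. The sketch below
mimics a prefix completion by inspecting the class directly. The two-term UBERON class is a
stand-in for the generated one, and terms_with_prefix is a hypothetical helper, not part of
the tool.</p>
      <preformat><![CDATA[
```python
class UBERON:
    # Two-term stand-in for the generated class; the real one holds every term.
    lung = "http://purl.obolibrary.org/obo/UBERON_0002048"
    lobe_of_liver = "http://purl.obolibrary.org/obo/UBERON_0001113"


def terms_with_prefix(ontology_class, prefix):
    """Return public attribute names that start with the given prefix."""
    return sorted(
        name
        for name in vars(ontology_class)
        # Skip Python's own bookkeeping attributes such as __module__.
        if name.startswith(prefix) and not name.startswith("_")
    )


print(terms_with_prefix(UBERON, "lung"))  # ['lung']
print(terms_with_prefix(UBERON, "l"))     # ['lobe_of_liver', 'lung']
```
]]></preformat>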
      <p>This tool includes the ability to download an OBO Foundry ontology from its URI
and generate the Python class representation of that ontology. Users are not expected to
download ontologies and convert them to Python classes themselves.
Rather, the information used by this tool (at present only URIs and their labels) can be
extracted and cached in a simpler format for distribution.</p>
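      <p>The extraction of URIs and labels might look like the following sketch, which assumes
the ontology is serialized as RDF/XML and uses only the Python standard library. The inline
fragment is hand-written to stand in for a downloaded OWL file, extract_labels is a
hypothetical helper, and the sketch does not address imported ontologies.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hand-written two-class fragment standing in for a downloaded OWL file.
SAMPLE_OWL = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/UBERON_0002048">
    <rdfs:label>lung</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/UBERON_0001113">
    <rdfs:label>lobe of liver</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""


def extract_labels(owl_xml):
    """Map the label of each labeled owl:Class in an RDF/XML string to its URI."""
    root = ET.fromstring(owl_xml.strip())
    label_to_uri = {}
    for cls in root.iter("{%s}Class" % OWL):
        uri = cls.get("{%s}about" % RDF)
        label = cls.findtext("{%s}label" % RDFS)
        # Anonymous classes and classes without a label are skipped.
        if uri and label:
            label_to_uri[label.strip()] = uri
    return label_to_uri


print(extract_labels(SAMPLE_OWL))
```
]]></preformat>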
      <p>UBERON, for example, contains over 15,000 classes, and its OWL file is over
65 MB in size. Rather than either (1) downloading this OWL file at the time the UBERON
Python class is imported, or (2) distributing entire OWL files for all OBO ontologies as
part of this tool, we can instead generate the Python classes and serialize those as
compressed pickle files to distribute with this package. These compressed files take up
very little space: 4 KB for the UBERON Python class.</p>
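      <p>One way to cache terms as a compressed pickle is sketched below. It is an
illustration under stated assumptions rather than the packaging used by the tool: because
Python's pickle module stores classes by reference rather than by value, the sketch pickles
the label-to-URI dictionary and rebuilds the class on load. The file name and the helper
names save_terms and load_ontology_class are hypothetical.</p>
      <preformat><![CDATA[
```python
import gzip
import os
import pickle
import tempfile

TERMS = {
    "lung": "http://purl.obolibrary.org/obo/UBERON_0002048",
    "lobe of liver": "http://purl.obolibrary.org/obo/UBERON_0001113",
}


def save_terms(path, label_to_uri):
    """Write the label-to-URI mapping as a gzip-compressed pickle."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(label_to_uri, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_ontology_class(path, class_name):
    """Read a cached mapping and rebuild the ontology class from it."""
    with gzip.open(path, "rb") as handle:
        label_to_uri = pickle.load(handle)
    attributes = {
        label.replace(" ", "_"): uri for label, uri in label_to_uri.items()
    }
    return type(class_name, (), attributes)


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "uberon.pkl.gz")  # hypothetical file name
    save_terms(cache_path, TERMS)
    UBERON = load_ontology_class(cache_path, "UBERON")

print(UBERON.lobe_of_liver)
```
]]></preformat>
      <p>Pickle files can execute arbitrary code when loaded, so cached files of this kind
should only be read from a trusted source such as the installed package itself.</p>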
    </sec>
    <sec id="sec-3">
      <title>3. Discussion</title>
      <p>We have built a simple tool that vastly simplifies working with terms in OBO
Foundry ontologies from within Python programs. This has been tested and used so far
with a handful of ontologies, but it is still under active development and has remaining
issues to be addressed. One is its handling of imports within ontology files. Another issue
will be the release schedule for versions of this tool: since it consists mainly of Python
classes built automatically from OBO Foundry ontologies, it will need to be updated
regularly as those ontologies are updated. We will target an automated monthly update.
Planned extensions to the basic functionality of this tool include the ability to search term
labels from within code, as well as possibly the option to search additional attributes.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>The OBO Technical Working Group</collab>
          . http://www.obofoundry.org/principles/fp-003-uris.html
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Mungall</surname>
            <given-names>CJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torniai</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkoutos</surname>
            <given-names>GV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            <given-names>SE</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haendel</surname>
            <given-names>MA</given-names>
          </string-name>
          .
          <article-title>Uberon, an integrative multi-species anatomy ontology</article-title>
          .
          <source>Genome Biology</source>
          .
          <year>2012</year>
          <month>Jan</month>
          ;
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>R5</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Sharma</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tarbox</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurc</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bona</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kathiravelu</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bremer</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saltz</surname>
            <given-names>JH</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prior</surname>
            <given-names>F</given-names>
          </string-name>
          .
          <article-title>PRISM: A Platform for Imaging in Precision Medicine</article-title>
          .
          <source>JCO Clinical Cancer Informatics</source>
          .
          <year>2020</year>
          <month>Jun</month>
          ;
          <volume>4</volume>
          :
          <fpage>491</fpage>
          -
          <lpage>499</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>