<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Falsification approach to create and check ontology definitions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Citlalli Mejía-Almonte</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Collado-Vides</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computational Genomics Program, Center for Genomic Sciences</institution>
          ,
          <addr-line>UNAM Cuernavaca</addr-line>
          ,
          <country country="MX">México</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>7</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>One of the most important features of the ontological representation of knowledge is the possibility of creating formal definitions that allow automatic reasoning. Reasoning in ontologies is based on symbolic logic representation. This requires that ontological definitions state either necessary conditions or necessary and sufficient conditions. Here we propose a manual approach to review the necessity and sufficiency of ontological definitions that can be used to analyze the most prominent concepts of a domain.</p>
      </abstract>
      <kwd-group>
        <kwd>falsification</kwd>
        <kwd>ontology definition</kwd>
        <kwd>necessary and sufficient conditions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Since the publication of the Gene Ontology, biomedical
ontologies have thrived. As a result, a growing number of
ontologies are being created to represent all aspects of the biological
world. Currently there are 182 ontologies in OntoBee [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
716 in BioPortal [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the OBO Foundry [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and the NCBO [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
ontology repositories, respectively. Some of these ontologies
are foundational: they are species-independent models
intended to be reused in, or extended by, species-specific
ontologies. Categorizing ontologies as species-dependent or
species-independent is not straightforward when
authors have not stated it in the scope description. Nevertheless, we
found 57 species-independent, 36 taxonomically restricted (at
higher taxonomic ranges), 19 whose scope does not include
biological entities, and 63 species-specific ontologies in
OntoBee. When authors did not specify a taxonomic range, we
classified an ontology by the following criteria:
species-independent if it includes classes representing
organisms of more than one kingdom, and species-specific if
it is human-centric.
      </p>
      <p>
        This large set of computational models can provide the
means for automatic reasoning to generate mechanistic
hypotheses for biomedical research [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However,
foundational, species-independent ontologies must have formal
definitions general enough to support pertinent inferences
throughout all kingdoms of life.
      </p>
      <p>Here we present a manual approach to check whether the necessity
and sufficiency claimed by ontological definitions hold against the
current state of affairs in the biological sciences. Applying it, we
found that if we consider the natural language definitions of
extant foundational ontologies as necessary and sufficient
conditions, some prokaryotic instances may be left out.</p>
      <p>
        Ontological primitive classes are described only by
necessary conditions, whereas defined classes are described by
necessary and sufficient conditions [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Necessary and
sufficient conditions are explained in terms of the conditional
logical relation. Let A be a class or concept and let P be some
property. The same conditional can be phrased in several ways [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]:
      </p>
      <p>A only if P; if A, then P; P is necessary for A; and A is
sufficient for P.</p>
      <p>Each of these statements means that all instances of A
satisfy property P; that is, for every object in the universe, if
it is an instance of A, then it satisfies P. When this logical
condition holds in both directions, the two are equivalent, that is:</p>
      <p>A is a necessary and sufficient condition for P, and P is a
necessary and sufficient condition for A.</p>
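      <p>In first-order notation, these relations can be restated as follows. This is only a sketch: the predicates A(x), read "x is an instance of A", and P(x), read "x has property P", are notation introduced here for illustration. Necessity of P for A, sufficiency of P for A, and the two together are, respectively:</p>
      <disp-formula><tex-math>\forall x\,\bigl(A(x) \rightarrow P(x)\bigr)</tex-math></disp-formula>
      <disp-formula><tex-math>\forall x\,\bigl(P(x) \rightarrow A(x)\bigr)</tex-math></disp-formula>
      <disp-formula><tex-math>\forall x\,\bigl(A(x) \leftrightarrow P(x)\bigr)</tex-math></disp-formula>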
      <p>
        Necessity of P is proved by demonstrating that all instances
of A have property P. However, such a demonstration is
epistemologically impossible in the experimental sciences, even
assuming an agent with complete knowledge of the current
state of affairs. Thus, we took a falsification approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>We can disprove sufficiency by finding some object that
has property P and does not belong to A.</p>
      <p>We can disprove necessity by finding some instance of A
that lacks property P.</p>
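      <p>The two falsification tests can be sketched in a few lines of Python. This is an illustration only: the function names, the toy universe of integers, and the two predicates are assumptions made for the example and are not part of the approach itself, which is carried out manually.</p>
      <preformat preformat-type="code">
```python
def falsify_necessity(objects, is_a, has_p):
    """Return an instance of A that lacks P, or None if none is found."""
    for obj in objects:
        if is_a(obj) and not has_p(obj):
            return obj
    return None


def falsify_sufficiency(objects, is_a, has_p):
    """Return an object that has P but is not an instance of A, or None."""
    for obj in objects:
        if has_p(obj) and not is_a(obj):
            return obj
    return None


# Toy universe (illustrative only): A = "is even", P = "is divisible by 4".
def is_even(n):
    return n % 2 == 0


def divisible_by_4(n):
    return n % 4 == 0


universe = range(1, 11)
necessity_counterexample = falsify_necessity(universe, is_even, divisible_by_4)
sufficiency_counterexample = falsify_sufficiency(universe, is_even, divisible_by_4)

print(necessity_counterexample)    # 2 is even but not divisible by 4
print(sufficiency_counterexample)  # None: no counterexample in this sample
```
      </preformat>
      <p>A result of None only means that no counterexample was found among the objects examined; it does not prove the condition.</p>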
      <p>Based on this, we propose the following workflow to
analyze the necessity and sufficiency of proposed definitions:</p>
      <list list-type="bullet">
        <list-item><p>Retrieve definitions from diverse sources such as the
literature and extant ontologies.</p></list-item>
        <list-item><p>Based on the retrieved definitions, generate a list of the
properties commonly used to define these concepts.</p></list-item>
        <list-item><p>Search for counterexamples to the definitions to discard
necessity or sufficiency of the defining properties.</p></list-item>
        <list-item><p>Keep those properties that were not falsified to generate a
new definition.</p></list-item>
      </list>
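      <p>As a rough illustration of how the bookkeeping for this workflow could be organized, consider the following Python sketch. The source names, property names, and the threshold of two definitions for a property to count as "commonly used" are all hypothetical placeholders; the search for counterexamples itself remains a manual, expert task and is represented here only by its recorded outcome.</p>
      <preformat preformat-type="code">
```python
from collections import Counter

# Step 1: definitions retrieved from several sources (hypothetical placeholders),
# each reduced to the set of properties it uses.
retrieved_definitions = {
    "source_1": {"property_a", "property_b"},
    "source_2": {"property_a", "property_c"},
    "source_3": {"property_a", "property_b", "property_d"},
}

# Step 2: list the properties commonly used across definitions.
# The threshold (used by at least 2 definitions) is an assumption of this sketch.
usage = Counter(
    prop for props in retrieved_definitions.values() for prop in props
)
common_properties = sorted(prop for prop, n in usage.items() if n >= 2)

# Step 3: counterexamples recorded by the curator after a manual search.
counterexamples = {
    "property_b": ["an instance of the class that lacks property_b"],
}

# Step 4: keep the properties that were not falsified.
surviving_properties = [
    prop for prop in common_properties if not counterexamples.get(prop)
]

print(common_properties)     # ['property_a', 'property_b']
print(surviving_properties)  # ['property_a']
```
      </preformat>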
    </sec>
    <sec id="sec-2">
      <title>III. RESULTS</title>
      <p>
        As an example, we apply this approach to the
definition of bacterial promoter in the Sequence Ontology (SO)
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The following are the two relevant definitions extracted
from this ontology in July 2018:
      </p>
      <p>Promoter: A regulatory_region composed of the TSS(s)
and binding sites for TF_complexes of the basal
transcription machinery.</p>
      <p>Bacterial RNA-polymerase promoter: A DNA
sequence to which bacterial RNA polymerase
binds, to begin transcription.</p>
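      <p>Bacterial RNA-polymerase promoter is a subclass of
promoter. Thus, the list of properties that define a Bacterial
RNA-polymerase promoter is:</p>
      <list list-type="bullet">
        <list-item><p>has part some TSS</p></list-item>
        <list-item><p>has part some basal TF binding sites</p></list-item>
        <list-item><p>initiates some transcription</p></list-item>
        <list-item><p>binds some RNA polymerase</p></list-item>
      </list>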
      <p>
        If we assume that basal transcription factor (TF), a
term most commonly used in the domain of eukaryotic gene
regulation, is equivalent to transcription factor in the sense
most commonly used in the domain of prokaryotic
gene regulation, then "has part some basal TF binding sites" is not a
necessary condition, since we can find counterexamples in
constitutive promoter sequences [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] that are transcribed without
the need for any transcription factor, and in promoters of
endosymbionts, whose reduced genomes have been found to have
lost most of the regulation by means of transcription factors
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. On the other hand, from the biological point of view, the
closest counterparts of those "basal TFs" would be sigma factors. In this
case, the definition is correct, but this reading has to be stated more
explicitly in the definition.
      </p>
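      <p>The necessity check for this example can be sketched in Python as follows. The four property strings are the ones listed above; the two promoter records are hypothetical encodings of the first reading (basal TFs taken as bacterial transcription factors) and are not curated data.</p>
      <preformat preformat-type="code">
```python
DEFINING_PROPERTIES = [
    "has part some TSS",
    "has part some basal TF binding sites",
    "initiates some transcription",
    "binds some RNA polymerase",
]

# Hypothetical records: each promoter is described by the properties it holds.
promoters = {
    "constitutive promoter": {
        "has part some TSS",
        "initiates some transcription",
        "binds some RNA polymerase",
    },
    "TF-regulated promoter": set(DEFINING_PROPERTIES),
}

# For every defining property, collect the promoters that lack it.
counterexamples = {
    prop: [name for name, held in promoters.items() if prop not in held]
    for prop in DEFINING_PROPERTIES
}

# A property with at least one counterexample is not a necessary condition.
not_necessary = [prop for prop, found in counterexamples.items() if found]

print(not_necessary)  # ['has part some basal TF binding sites']
```
      </preformat>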
      <p>A. Automatic logical consistency checks are not suitable to
detect this lack of generality</p>
      <p>
        We are aware that checking logical consistency is one of the main
applications of automatic reasoning [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. However, the
necessity of a restriction is more an issue of ontological
commitment [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] that would drop some class
instances, owing to the lack of generality of the definitions.
      </p>
      <p>
        That is, suppose that, under the first assumption (i.e., basal
transcription factors are bacterial transcription factors), we
reuse the current conceptualization of SO and then create an
instance or a subclass representing a specific promoter that lacks
the TF binding site. In that case, either no logical inconsistency
will arise, owing to the open world assumption [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], or the
reasoner will fail to infer the subsumption relation, and we will
lose track of this entity as a promoter.
      </p>
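      <p>The two outcomes can be illustrated with a toy open-world check written in Python. This is not how an OWL reasoner is implemented; the function, the verdict names, and the simplified two-property definition of promoter are assumptions made only to show why neither outcome flags the problem.</p>
      <preformat preformat-type="code">
```python
from enum import Enum


class Verdict(Enum):
    INCONSISTENT = "inconsistent"
    CONSISTENT_MEMBER = "asserted member, no inconsistency"
    INFERRED_MEMBER = "membership inferred"
    NOT_INFERRED = "membership not inferred"


def check(asserted_member, known_true, known_false, required):
    """Toy open-world check of one individual against a defined class.

    The set required holds the properties taken as jointly necessary and
    sufficient. A property in neither known_true nor known_false is
    unknown, not false.
    """
    if asserted_member:
        # Only an explicit denial of a required property clashes.
        if required.intersection(known_false):
            return Verdict.INCONSISTENT
        return Verdict.CONSISTENT_MEMBER
    # Membership is inferred only if every required property is asserted.
    if required.issubset(known_true):
        return Verdict.INFERRED_MEMBER
    return Verdict.NOT_INFERRED


# Simplified, hypothetical definition and individual.
PROMOTER = frozenset({"TSS", "basal TF binding site"})
observed = {"TSS"}  # no TF binding site is recorded for this promoter

asserted = check(True, observed, set(), PROMOTER)
unasserted = check(False, observed, set(), PROMOTER)

print(asserted.value)    # asserted member, no inconsistency
print(unasserted.value)  # membership not inferred
```
      </preformat>
      <p>In the first case the missing binding site is simply unknown, so nothing is reported; in the second the entity is never classified as a promoter.</p>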
      <p>We are currently applying this approach to generate an
ontology of prokaryotic gene regulation. In the process, we are
reviewing the applicability of the definitions in existing
ontologies. This step-by-step workflow can ease the
involvement of domain experts in the generation of
logically sound ontological definitions based on ontological realism.
However, we have not planned any training sessions to help
other groups check their ontological definitions.</p>
    </sec>
    <sec id="sec-3">
      <title>IV. LIMITATIONS</title>
      <p>
        This approach can be useful for applying the OBO principle of
maintenance [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, because it requires considerable human effort,
we believe it could be applied top-down, to check
the necessity and sufficiency of the most general or
prominent concepts of a domain.
      </p>
    </sec>
    <sec id="sec-4">
      <title>ACKNOWLEDGMENT</title>
      <p>C.M.A. is a doctoral student in the Programa de Doctorado
en Ciencias Biomédicas, UNAM, holds fellowship 576333
from CONACYT, and received financial aid from the Programa de
Apoyos para Estudios de Posgrado (PAEP) for this conference.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Xiang</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mungall</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruttenberg</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Ontobee: A Linked Data Server and Browser for Ontology Terms</article-title>
          .
          <source>Proceedings of the 2nd International Conference on Biomedical Ontologies (ICBO)</source>
          , July 28-30,
          <year>2011</year>
          , Buffalo, NY, USA. Pages
          <fpage>279</fpage>
          -
          <lpage>281</lpage>
          . URL: http://ceur-ws.org/Vol-833/paper48.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whetzel</surname>
            ,
            <given-names>P. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dorf</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffith</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>BioPortal: ontologies and integrated data resources at the click of a mouse</article-title>
          .
          <source>Nucleic acids research</source>
          ,
          <volume>37</volume>
          (
          <issue>suppl_2</issue>
          ),
          <fpage>W170</fpage>
          -
          <lpage>W173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashburner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosse</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bug</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ceusters</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Leontis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration</article-title>
          .
          <source>Nature biotechnology</source>
          ,
          <volume>25</volume>
          (
          <issue>11</issue>
          ),
          <fpage>1251</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whetzel</surname>
            ,
            <given-names>P. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chute</surname>
            ,
            <given-names>C. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Story</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          , ... &amp; NCBO team. (
          <year>2011</year>
          ).
          <article-title>The national center for biomedical ontology</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          ,
          <volume>19</volume>
          (
          <issue>2</issue>
          ),
          <fpage>190</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hunter</surname>
            ,
            <given-names>L. E.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Mechanistic hypothesis generation in molecular biology: A grand challenge for knowledge-based reasoning</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Horridge</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knublauch</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rector</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>A practical guide to building OWL ontologies using the Protégé-OWL plugin and CO-ODE tools edition 1.0</article-title>
          . University of Manchester.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>K</given-names>
          </string-name>
          .
          <source>Introduction to Mathematical Thinking</source>
          [Week 2: Equivalence]. MOOC offered by Stanford University through Coursera. Retrieved June 30th,
          <year>2018</year>
          , from https://www.coursera.org/learn/mathematical-thinking/lecture/A5msF/lecture-4-equivalence
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Popper</surname>
            ,
            <given-names>Karl.</given-names>
          </string-name>
          <article-title>The logic of scientific discovery</article-title>
          .
          <source>Routledge</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Mungall</surname>
            ,
            <given-names>Christopher J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Colin</given-names>
            <surname>Batchelor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Karen</given-names>
            <surname>Eilbeck</surname>
          </string-name>
          .
          <article-title>Evolution of the Sequence Ontology terms and relationships</article-title>
          .
          <source>Journal of biomedical informatics</source>
          <volume>44</volume>
          .
          <issue>1</issue>
          (
          <year>2011</year>
          ):
          <fpage>87</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>S-T.</given-names>
          </string-name>
          , et al.
          <article-title>Activities of constitutive promoters in Escherichia coli</article-title>
          .
          <source>Journal of molecular biology</source>
          <volume>292</volume>
          .
          <issue>1</issue>
          (
          <year>1999</year>
          ):
          <fpage>19</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Miravet-Verde</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lloréns-Rico</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Serrano</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Alternative transcriptional regulation in genome-reduced bacteria</article-title>
          .
          <source>Current opinion in microbiology</source>
          ,
          <volume>39</volume>
          ,
          <fpage>89</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Hunter</surname>
            ,
            <given-names>L. E.</given-names>
          </string-name>
          <article-title>Knowledge-based biomedical Data Science</article-title>
          .
          <source>Data Science, (Preprint)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Guarino</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oberle</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>What is an ontology?</article-title>
          . In Handbook on ontologies (pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          ). Springer, Berlin, Heidelberg
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>