=Paper= {{Paper |id=Vol-2285/ICBO_2018_paper_16 |storemode=property |title=A Falsification Approach to Create and Check Ontology Definitions |pdfUrl=https://ceur-ws.org/Vol-2285/ICBO_2018_paper_16.pdf |volume=Vol-2285 |authors=Citlalli Mejía,Julio Collado-Vides |dblpUrl=https://dblp.org/rec/conf/icbo/MejiaC18 }} ==A Falsification Approach to Create and Check Ontology Definitions== https://ceur-ws.org/Vol-2285/ICBO_2018_paper_16.pdf
Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA, August 7-10, 2018

A Falsification Approach to Create and Check Ontology Definitions

Citlalli Mejía-Almonte, Julio Collado-Vides
Computational Genomics Program
Center for Genomic Sciences, UNAM
Cuernavaca, México

    Abstract— One of the most important features of ontological representation of knowledge is the possibility of creating formal definitions that allow automatic reasoning. Reasoning in ontologies is based on symbolic logic representation. This requires that ontological definitions state either necessary conditions or necessary and sufficient conditions. Here we propose a manual approach to review the necessity and sufficiency of ontological definitions that can be used to analyze the most prominent concepts of a domain.

    Keywords—falsification; ontology definition; necessary and sufficient conditions

I. INTRODUCTION

    Since the publication of the Gene Ontology, biomedical ontologies have thrived. As a result, a growing number of ontologies are being created to represent all aspects of the biological world. Currently there are 182 ontologies in OntoBee [1] and 716 in BioPortal [2], the ontology repositories of the OBO Foundry [3] and the NCBO [4], respectively. Some of these ontologies are foundational, for they are species-independent models intended to be reused in or extended by species-specific ontologies. Although categorizing ontologies as species-dependent or species-independent is not straightforward when authors have not established it in the scope description, we found 57 species-independent ontologies, 36 taxonomically restricted ones (at higher taxonomic ranges), 19 whose scope does not include biological entities, and 63 species-specific ontologies in OntoBee. When authors did not specify the taxonomic range, this classification was based on the following criteria: species-independent if the ontology includes classes representing organisms of more than one kingdom, and species-specific if the ontology is human-centric.

    This large set of computational models can provide the means for automatic reasoning to generate mechanistic hypotheses for biomedical research [5]. However, foundational, species-independent ontologies must have formal definitions general enough to support pertinent inferences throughout all kingdoms of life.

    Here we present a manual approach to check whether the necessary and sufficient conditions stated in ontological definitions fit the current state of affairs in the biological sciences. With it we found that, if the natural language definitions of extant foundational ontologies are read as necessary and sufficient conditions, some prokaryotic instances may be left out.

II. METHODS

    Ontological primitive classes are described only by necessary conditions, whereas defined classes are described by necessary and sufficient conditions [6]. Necessary and sufficient conditions are explained in terms of the conditional logical relation. Let A be a class or concept and let P be some property. There are many equivalent ways to express this relation in natural language [7]:

•   A only if P; if A, then P; P is necessary for A; and A is sufficient for P.

    Any of these statements means that all instances of A satisfy property P, that is, for all objects of the universe, if an object is an instance of A then it satisfies P. When this logical condition holds in both directions, that is:

•   A is a necessary and sufficient condition for B and B is a necessary and sufficient condition for A

we say that A means B, or A is equivalent to B. This relation of equivalence is the one we look for to make ontological definitions.

    Necessity of P is proved by demonstrating that all instances of A have property P. However, demonstration of necessity is epistemologically impossible in experimental sciences, even assuming an agent with complete knowledge of the current state of affairs. Thus, we took a falsification approach [8].

•   We can disprove sufficiency by finding some object that has property P and does not belong to A.

•   We can disprove necessity by finding some instance of A that does not have property P.

    Based on this, we propose the following workflow to analyze the necessity and sufficiency of proposed definitions:

•   Retrieve definitions from diverse sources such as the literature and extant ontologies.

•   Based on the retrieved definitions, generate a list of the properties commonly used to define these concepts.

•   Search for counterexamples to the definitions in order to discard necessity or sufficiency of the defining properties.
•   Keep those properties that were not falsified to generate a new definition.

III. RESULTS

    As an example, we apply this approach to the definition of bacterial promoter in the Sequence Ontology (SO) [9]. The following are the two relevant definitions extracted from this ontology in July 2018:

•   Promoter: A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the basal transcription machinery

          o     Bacterial RNA-polymerase promoter: A DNA sequence to which bacterial RNA polymerase binds, to begin transcription.

    Bacterial RNA-polymerase promoter is a subclass of promoter. Thus, the list of properties that define a Bacterial RNA-polymerase promoter is:

    •     has part some TSS
    •     has part some basal TF binding sites
    •     initiates some transcription
    •     binds some RNA polymerase

    If we assume that basal transcription factor (TF), a term most commonly used in the domain of eukaryotic gene regulation, is equivalent to the most common sense in which the term transcription factor is used in the domain of prokaryotic gene regulation, then "has part basal TF binding site" is not a necessary condition. We can find counterexamples in constitutive promoter sequences [10], which drive transcription without the need of any transcription factor, and in promoters of endosymbionts, whose reduced genomes have been found to have lost most of the regulation by means of transcription factors [11]. On the other hand, from the biological point of view the closest to those "basal TFs" would be sigma factors. In this case, the definition is correct and this reading just has to be stated more explicitly in the definition.

A. Automatic logical consistency check is not suitable to detect this lack of generality

    We are aware that logical consistency checking is one of the main applications of automatic reasoning [12]. However, the necessity of a restriction is more an issue of ontological commitment [13], which would drop some class instances owing to the lack of generality of the definitions.

    That is, if, in the first assumption scenario (i.e., basal transcription factors are bacterial transcription factors), we reuse the current conceptualization of SO and then create an instance or a subclass representing a specific promoter lacking the TF binding site constraint, either no logical inconsistency will arise, owing to the open world assumption [6], or the reasoner will fail to infer the subsumption relation and we are going to lose track of this entity as a promoter.

    We are currently applying this approach to generate an ontology on prokaryotic gene regulation. In the process, we are reviewing the applicability of the definitions of the existing ontologies. This step-by-step workflow can ease the involvement of domain experts in the generation of logically sound ontological definitions based on ontological realism. However, we have not planned any training session to help other groups check their ontological definitions.

IV. LIMITATIONS

    This approach can be useful to apply the OBO principle of maintenance [3]. However, as it requires huge human effort, we believe it could be applied in a top-down approach to check the necessity and sufficiency of the most general or prominent concepts of a domain.

ACKNOWLEDGMENT

    C.M.A. is a doctoral student from Programa de Doctorado en Ciencias Biomédicas, UNAM, receives fellowship 576333 from CONACYT and received financial aid from Programa de Apoyos para Estudios de Posgrado (PAEP) for this conference.

REFERENCES

[1]  Xiang Z, Mungall C, Ruttenberg A, He Y. Ontobee: A Linked Data Server and Browser for Ontology Terms. Proceedings of the 2nd International Conference on Biomedical Ontologies (ICBO), July 28-30, 2011, Buffalo, NY, USA. Pages 279-281. URL: http://ceur-ws.org/Vol-833/paper48.pdf.
[2]  Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M., Griffith, N., ... & Musen, M. A. (2009). BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic acids research, 37(suppl_2), W170-W173.
[3]  Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., ... & Leontis, N. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature biotechnology, 25(11), 1251.
[4]  Musen, M. A., Noy, N. F., Shah, N. H., Whetzel, P. L., Chute, C. G., Story, M. A., ... & NCBO team. (2011). The national center for biomedical ontology. Journal of the American Medical Informatics Association, 19(2), 190-195.
[5]  Hunter, L. E. (2018). Mechanistic hypothesis generation in molecular biology: A grand challenge for knowledge-based reasoning.
[6]  Horridge, M., Knublauch, H., Rector, A., Stevens, R., & Wroe, C. (2004). A practical guide to building OWL ontologies using the Protégé-OWL plugin and CO-ODE tools edition 1.0. University of Manchester.
[7]  Devlin, K. Introduction to Mathematical Thinking. [Week 2: Equivalence] MOOC offered by Stanford University through Coursera. Retrieved June 30th, 2018 from https://www.coursera.org/learn/mathematical-thinking/lecture/A5msF/lecture-4-equivalence
[8]  Popper, Karl. The logic of scientific discovery. Routledge, 2005.
[9]  Mungall, Christopher J., Colin Batchelor, and Karen Eilbeck. "Evolution of the Sequence Ontology terms and relationships." Journal of biomedical informatics 44.1 (2011): 87-93.
[10] Liang, S-T., et al. "Activities of constitutive promoters in Escherichia coli." Journal of molecular biology 292.1 (1999): 19-37.
[11] Miravet-Verde, S., Lloréns-Rico, V., & Serrano, L. (2017). Alternative transcriptional regulation in genome-reduced bacteria. Current opinion in microbiology, 39, 89-95.
[12] Hunter, L. E. Knowledge-based biomedical Data Science. Data Science, (Preprint), 1-7.
[13] Guarino, N., Oberle, D., & Staab, S. (2009). What is an ontology?. In Handbook on ontologies (pp. 1-17). Springer, Berlin, Heidelberg
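
    The falsification checks of Section II, applied to the promoter example of Section III, can be illustrated with a minimal Python sketch. It is not part of the manual approach described in this paper and is only one possible way to record the outcome of a counterexample search. The property names are taken from the list in Section III; the class Candidate, the function names, the candidate objects and their membership judgements are hypothetical, and the non-promoter entry is invented solely to exercise the sufficiency check. The sketch follows the first assumption scenario, in which basal TFs are read as bacterial transcription factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """An object reviewed by a domain expert (all values are illustrative)."""
    name: str
    properties: frozenset
    is_instance: bool  # expert judgement: does the object belong to class A?


def falsify_necessity(prop, candidates):
    """Return instances of A that lack prop (counterexamples to necessity)."""
    return [c.name for c in candidates
            if c.is_instance and prop not in c.properties]


def falsify_sufficiency(prop, candidates):
    """Return objects outside A that have prop (counterexamples to sufficiency)."""
    return [c.name for c in candidates
            if not c.is_instance and prop in c.properties]


def surviving_properties(props, candidates):
    """Keep the properties whose necessity was not falsified."""
    return [p for p in props if not falsify_necessity(p, candidates)]


# Defining properties of "Bacterial RNA-polymerase promoter" (Section III).
DEFINING_PROPERTIES = [
    "has part some TSS",
    "has part some basal TF binding sites",
    "initiates some transcription",
    "binds some RNA polymerase",
]

# Hypothetical review records; the property sets are assumptions.
CANDIDATES = [
    Candidate("regulated promoter", frozenset(DEFINING_PROPERTIES), True),
    Candidate(
        "constitutive promoter",
        frozenset({
            "has part some TSS",
            "initiates some transcription",
            "binds some RNA polymerase",
        }),
        True,
    ),
    # Invented placeholder, used only to show the sufficiency check.
    Candidate(
        "hypothetical non-promoter sequence",
        frozenset({"binds some RNA polymerase"}),
        False,
    ),
]

NECESSITY_COUNTEREXAMPLES = falsify_necessity(
    "has part some basal TF binding sites", CANDIDATES)
SUFFICIENCY_COUNTEREXAMPLES = falsify_sufficiency(
    "binds some RNA polymerase", CANDIDATES)
KEPT = surviving_properties(DEFINING_PROPERTIES, CANDIDATES)
```

    With these illustrative records, the constitutive promoter falsifies the necessity of "has part some basal TF binding sites", so that property is dropped and the remaining three are kept as candidates for a new definition. Deciding which objects count as instances stays a manual, expert task; the code only organizes the bookkeeping.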



