Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA
ICBO 2018, August 7-10, 2018

A Falsification approach to create and check ontology definitions

Citlalli Mejía-Almonte, Julio Collado-Vides
Computational Genomics Program
Center for Genomic Sciences, UNAM
Cuernavaca, México

Abstract— One of the most important features of ontological representation of knowledge is the possibility of creating formal definitions that allow automatic reasoning. Reasoning in ontologies is based on symbolic logic representation. This requires that ontological definitions state either necessary conditions or necessary and sufficient conditions. Here we propose a manual approach to review the necessity and sufficiency of ontological definitions that can be used to analyze the most prominent concepts of a domain.

Keywords—falsification; ontology definition; necessary and sufficient conditions

I. INTRODUCTION

Since the publication of the Gene Ontology, biomedical ontologies have thrived. As a result, a growing number of ontologies are created to represent all aspects of the biological world. Currently there are 182 ontologies in OntoBee [1] and 716 in BioPortal [2], the ontology repositories of the OBO Foundry [3] and the NCBO [4], respectively. Some of these ontologies are foundational, for they are species-independent models intended to be reused in or extended by species-specific ontologies. Although categorizing ontologies as species-dependent or species-independent is not straightforward when authors have not established it in the scope description, we found 57 species-independent, 36 taxonomically restricted (at higher taxonomic ranges), 19 whose scope does not include biological entities, and 63 species-specific ontologies in OntoBee. When authors did not specify a taxonomic range, this classification was based on the following criteria: species-independent if the ontology includes classes representing organisms of more than one kingdom, and species-specific if the ontology is human-centric.

This large set of computational models can provide the means for automatic reasoning to generate mechanistic hypotheses for biomedical research [5]. However, foundational, species-independent ontologies must have formal definitions general enough to support pertinent inferences throughout all kingdoms of life.

Here we present a manual approach to check the necessity and sufficiency of ontological definitions against the current state of affairs in biological sciences. This allowed us to find out that if we consider the natural language definitions of extant foundational ontologies as necessary and sufficient conditions, some prokaryotic instances may be left out.

II. METHODS

Ontological primitive classes are described only by necessary conditions, whereas defined classes are described by necessary and sufficient conditions [6]. Necessary and sufficient conditions are explained in terms of the conditional logical relation. Let A be a class or concept and let P be some property. There are many ways to express this relation in natural language [7]:

• A only if P; if A, then P; P is necessary for A; and A is sufficient for P.

Any of these statements means that all instances of A satisfy property P, or that, for all objects of the universe, if an object is an instance of A then it satisfies P. When this logical condition holds in both directions, that is:

• A is a necessary and sufficient condition for P, and P is a necessary and sufficient condition for A

we say that A means P, or that A is equivalent to P. This relation of equivalence is the one we look for to make ontological definitions.

Necessity of P is proved by demonstrating that all instances of A have property P. However, demonstration of necessity is epistemologically impossible in experimental sciences, even assuming an agent with complete knowledge of the current state of affairs. Thus, we took a falsification approach [8].

• We can disprove sufficiency by finding some object that has property P and does not belong to A.

• We can disprove necessity by finding some instance of A that does not have property P.

Based on this, we propose the following workflow to analyze the necessity and sufficiency of proposed definitions:

• Retrieve definitions from diverse sources such as the literature and extant ontologies.

• Based on the retrieved definitions, generate a list of the properties commonly used to define these concepts.

• Search for counterexamples that falsify the necessity or sufficiency of the defining properties.

• Keep those properties that were not falsified to generate a new definition.

III. RESULTS

As an example, we apply this approach to the definition of bacterial promoter in the Sequence Ontology (SO) [9]. The following are the two relevant definitions extracted from this ontology in July 2018:

• Promoter: A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the basal transcription machinery

   o Bacterial RNA-polymerase promoter: A DNA sequence to which bacterial RNA polymerase binds, to begin transcription.

Bacterial RNA-polymerase promoter is a subclass of promoter. Thus, the list of properties that define a Bacterial RNA-polymerase promoter is:

• has part some TSS

• has part some basal TF binding sites

• initiates some transcription

• binds some RNA polymerase

If we assume that basal transcription factor (TF), a term most commonly used in the domain of eukaryotic gene regulation, is equivalent to the most common sense in which the term transcription factor is used in the domain of prokaryotic gene regulation, then "has part basal TF binding site" is not a necessary condition. We can find counterexamples in constitutive promoter sequences [10], which are transcribed without the need of any transcription factor, and in promoters of endosymbionts, whose reduced genomes have been found to have lost most of the regulation by means of transcription factors [11]. On the other hand, from the biological point of view, the closest bacterial counterpart to those "basal TFs" would be sigma factors. In this case, the definition is correct and the term just has to be specified more explicitly in it.

Automatic logical consistency checking is not suited to detecting this lack of generality

We are aware that logical consistency checking is one of the main applications of automatic reasoning [12]. However, the necessity of a restriction is more an issue of ontological commitment [13]: a definition that lacks generality would drop some class instances.

That is, suppose that, in the first assumption scenario (i.e., basal transcription factors are bacterial transcription factors), we reuse the current conceptualization of SO and then create an instance or a subclass representing a specific promoter that lacks the TF binding site constraint. Either no logical inconsistency will arise, owing to the open world assumption [6], or the reasoner will fail to infer the subsumption relation and we are going to lose track of this entity as a promoter.

We are currently applying this approach to generate an ontology on prokaryotic gene regulation. In the process, we are reviewing the applicability of the definitions of existing ontologies. This step-by-step workflow can ease the involvement of domain experts in the generation of logically sound ontological definitions based on ontological realism. However, we have not planned any training session to help other groups check their ontological definitions.

IV. LIMITATIONS

This approach can be useful to apply the OBO principle of maintenance [3]. However, as it requires huge human effort, we believe it could be applied in a top-down manner to check the necessity and sufficiency of the most general or prominent concepts of a domain.

ACKNOWLEDGMENT

C.M.A. is a doctoral student from Programa de Doctorado en Ciencias Biomédicas, UNAM, receives fellowship 576333 from CONACYT and received financial aid from Programa de Apoyos para Estudios de Posgrado (PAEP) for this conference.

REFERENCES

[1] Xiang Z, Mungall C, Ruttenberg A, He Y. Ontobee: A Linked Data Server and Browser for Ontology Terms. Proceedings of the 2nd International Conference on Biomedical Ontologies (ICBO), July 28-30, 2011, Buffalo, NY, USA. Pages 279-281. URL: http://ceur-ws.org/Vol-833/paper48.pdf.
[2] Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M., Griffith, N., & Musen, M. A. (2009). BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic acids research, 37(suppl_2), W170-W173.
[3] Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., ... & Leontis, N. (2007). The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature biotechnology, 25(11), 1251.
[4] Musen, M. A., Noy, N. F., Shah, N. H., Whetzel, P. L., Chute, C. G., Story, M. A., ... & NCBO team. (2011). The national center for biomedical ontology. Journal of the American Medical Informatics Association, 19(2), 190-195.
[5] Hunter, L. E. (2018). Mechanistic hypothesis generation in molecular biology: A grand challenge for knowledge-based reasoning.
[6] Horridge, M., Knublauch, H., Rector, A., Stevens, R., & Wroe, C. (2004). A practical guide to building OWL ontologies using the Protégé-OWL plugin and CO-ODE tools edition 1.0. University of Manchester.
[7] Devlin, K. Introduction to Mathematical Thinking. [Week 2: Equivalence] MOOC offered by Stanford University through Coursera. Retrieved June 30th, 2018 from https://www.coursera.org/learn/mathematical-thinking/lecture/A5msF/lecture-4-equivalence
[8] Popper, Karl. The logic of scientific discovery. Routledge, 2005.
[9] Mungall, Christopher J., Colin Batchelor, and Karen Eilbeck. "Evolution of the Sequence Ontology terms and relationships." Journal of biomedical informatics 44.1 (2011): 87-93.
[10] Liang, S-T., et al. "Activities of constitutive promoters in Escherichia coli." Journal of molecular biology 292.1 (1999): 19-37.
[11] Miravet-Verde, S., Lloréns-Rico, V., & Serrano, L. (2017). Alternative transcriptional regulation in genome-reduced bacteria. Current opinion in microbiology, 39, 89-95.
[12] Hunter, L. E. Knowledge-based biomedical Data Science. Data Science, (Preprint), 1-7.
[13] Guarino, N., Oberle, D., & Staab, S. (2009). What is an ontology?. In Handbook on ontologies (pp. 1-17). Springer, Berlin, Heidelberg.
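
Illustrative sketch of the falsification bookkeeping

The approach proposed in this paper is manual: counterexamples come from the literature and from expert judgment. The bookkeeping behind the two falsification tests in Methods (an instance of A lacking P disproves necessity; an object with P outside A disproves sufficiency) can nevertheless be written down in a few lines. The following Python sketch is only an illustration under stated assumptions. The Entity record, the function names, the property strings and the three toy entries are hypothetical and are not curated data or part of any ontology tool; they loosely mirror the bacterial promoter example from Results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical record: a named object, its asserted classes and its properties."""
    name: str
    classes: frozenset
    properties: frozenset


def necessity_counterexamples(cls, prop, universe):
    """Instances of `cls` that lack `prop`; any hit falsifies necessity of `prop`."""
    return [e.name for e in universe
            if cls in e.classes and prop not in e.properties]


def sufficiency_counterexamples(cls, prop, universe):
    """Objects that have `prop` but are not in `cls`; any hit falsifies sufficiency."""
    return [e.name for e in universe
            if prop in e.properties and cls not in e.classes]


def surviving_properties(cls, candidates, universe):
    """Candidate properties whose necessity was not falsified by any known instance."""
    return [p for p in candidates
            if not necessity_counterexamples(cls, p, universe)]


# Toy universe (assumption, for illustration only; not experimental data).
PROMOTER = "bacterial_promoter"
CANDIDATES = [
    "has_part_TSS",
    "has_part_basal_TF_binding_site",
    "initiates_transcription",
    "binds_RNA_polymerase",
]
UNIVERSE = [
    Entity("regulated_promoter", frozenset({PROMOTER}), frozenset(CANDIDATES)),
    Entity("constitutive_promoter", frozenset({PROMOTER}),
           frozenset({"has_part_TSS", "initiates_transcription",
                      "binds_RNA_polymerase"})),
    Entity("toy_non_promoter_region", frozenset(),
           frozenset({"has_part_basal_TF_binding_site"})),
]

if __name__ == "__main__":
    for prop in CANDIDATES:
        print(prop,
              "| necessity falsified by:",
              necessity_counterexamples(PROMOTER, prop, UNIVERSE),
              "| sufficiency falsified by:",
              sufficiency_counterexamples(PROMOTER, prop, UNIVERSE))
    print("kept:", surviving_properties(PROMOTER, CANDIDATES, UNIVERSE))
```

In this toy setting the constitutive promoter is the only entry that falsifies the necessity of the TF binding site property, so the other three properties are kept. An empty list of counterexamples means only that the property has not been falsified by the entries examined so far, not that its necessity or sufficiency has been demonstrated.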