Towards an ontology for automatic scientific
                    discovery

                      Tezira Wanyana¹ and Deshendran Moodley¹,²
                  ¹ University of Cape Town, Cape Town, South Africa
           ² Center for Artificial Intelligence Research (CAIR), South Africa
                             {twanyana, deshen}@cs.uct.ac.za


        Abstract. While some attempts have been made to automate the scien-
        tific discovery process in specific domains, these approaches have limited
        support for formal representation and reasoning about observations and
        phenomena. This research aims to create a generic formal ontology to
        support an intelligent agent for observation-induced knowledge
        discovery.

        Keywords: Agents · ontologies · Automatic Hypothesis Generation.


    Introduction: One of the goals of intelligent agents is to learn and adapt
to a dynamic environment. An agent typically takes in observations from its en-
vironment, identifies anomalous observations, i.e. unexpected observations, and
determines whether the anomaly is indicative of a new phenomenon or a change
in the environment. If this is the case, the agent’s goal is to generate and evaluate
a hypothesis as an attempt to explain the underlying causal mechanism for this
phenomenon. A first step towards designing such agents is to settle on a formal
language or ontology for representing and reasoning about hypotheses. In this
research, we explore the requirements for such an ontology.
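
The observe, detect and explain cycle described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions, not a
method proposed in this paper: the z-score test in `is_anomalous`, the threshold
of 3.0, and the names `agent_step`, `generate_hypotheses` and `evaluate` are all
hypothetical. The sketch also skips the step of deciding whether an anomaly
indicates a new phenomenon or a change in the environment.

```python
from statistics import mean, stdev


def is_anomalous(history, observation, z_threshold=3.0):
    """Flag an observation that lies far from the mean of past observations.

    Assumption: numeric observations and a simple z-score test.
    """
    if len(history) < 2:
        return False  # too little history to judge
    spread = stdev(history)
    if spread == 0:
        return observation != mean(history)
    return abs(observation - mean(history)) / spread > z_threshold


def agent_step(history, observation, generate_hypotheses, evaluate):
    """Run one pass of the observe / detect / explain cycle.

    Returns None for an expected observation (which is added to the history),
    otherwise a list of (hypothesis, score) pairs sorted best first.
    """
    if not is_anomalous(history, observation):
        history.append(observation)
        return None
    candidates = generate_hypotheses(observation)  # hypothetical callback
    scored = [(h, evaluate(h, observation)) for h in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The two callbacks are left abstract on purpose: how hypotheses are generated
and scored is exactly what a formal ontology is meant to make explicit.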

  Existing Approaches: Some attempts have been made to formalize the
representation of hypotheses using ontologies, e.g. the Robot Scientist [3] uses
LABORS (LABoratory Ontology for Robot Scientists) and the DISK system [2]
uses the DISK ontology. The HELO ontology is used in [4] to link research
statements to associated probabilities. Other hypothesis representation models
are analysed in [1]. In that analysis, the DISK ontology is the only one that
caters for most of the aspects considered; the exception is hypothesis
classification, i.e. whether a taxonomy of hypothesis statements is supported.
Neither the DISK ontology nor the other ontologies are based on
phenomena-triggered hypothesis generation, and hence they do not represent some
of the key elements of hypothesis generation and evaluation, for example the
phenomenon that triggered the hypothesis and its detection mechanism. However,
some of the elements presented and lessons learned will be used to design a
formal representation for hypothesis generation and evaluation.
    Supported by Center for Artificial Intelligence Research (CAIR) and
    Hasso-Plattner-Institut (HPI)


Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0)

       Table 1. Summary of the core elements represented in previous ontologies

    Element                     LABORS               DISK                  HELO
    --------------------------  -------------------  --------------------  -----------------
    Phenomena detection         No, hypotheses are   No, initial           No
    mechanism                   from background      hypothesis is
                                knowledge            provided by the user

    Triggering phenomenon       No                   Yes, in form of       No
                                                     evidence for revised
                                                     hypotheses

    Hypothesis Statement        Predicates           RDF Triples           Predicates
    Representation

    Hypothesis Qualifier        No                   Yes (confidence       Yes (Probability)
                                                     level)

    Hypothesis appraisal        No                   No                    No
    mechanism and
    unsuccessful hypotheses


  A Hypothesis Ontology; Core Requirements: Hypotheses and their semantic
meaning have to be consistently and precisely represented to aid reusability
and reproducibility [1]. We suggest the following top-level elements as
the core requirements for the representation: 1) The Hypothesis Statement: an
assertion of the explanation of the underlying causal mechanism of the
phenomenon. 2) The Hypothesis Qualifier: the probability value that represents
the agent’s belief of the extent to which the hypothesis explains the observed
phenomenon. 3) The Triggering Phenomenon: the phenomenon for which the
hypothesis was generated. 4) The Provenance Record: this consists of the
phenomenon detection mechanism, the qualifier threshold used in hypothesis
selection and the hypothesis appraisal mechanism used in selecting the most
plausible hypotheses. 5) Unsuccessful Hypotheses: these are the competing
alternatives that were not selected. Table 1 shows some of the required
elements and which hypothesis representation ontologies cater for them.
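
To make the five elements concrete, the following Python sketch models them as
plain data classes. It is not the ontology itself, only an illustration: the
class and field names (`Hypothesis`, `ProvenanceRecord`, `HypothesisRecord`,
`build_record`) are hypothetical, and treating "qualifier at or above the
threshold" as the selection rule is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    statement: str    # element 1: the hypothesis statement
    qualifier: float  # element 2: probability that it explains the phenomenon

    def __post_init__(self):
        if not 0.0 <= self.qualifier <= 1.0:
            raise ValueError("qualifier must be a probability in [0, 1]")


@dataclass
class ProvenanceRecord:
    # element 4: how the phenomenon was detected and how hypotheses were chosen
    detection_mechanism: str
    qualifier_threshold: float
    appraisal_mechanism: str


@dataclass
class HypothesisRecord:
    triggering_phenomenon: str  # element 3
    provenance: ProvenanceRecord
    selected: List[Hypothesis] = field(default_factory=list)
    unsuccessful: List[Hypothesis] = field(default_factory=list)  # element 5


def build_record(phenomenon, candidates, provenance):
    """Split candidates into selected and unsuccessful hypotheses.

    Assumption: a candidate is selected when its qualifier is at or above
    the threshold stored in the provenance record.
    """
    record = HypothesisRecord(phenomenon, provenance)
    for hypothesis in candidates:
        if hypothesis.qualifier >= provenance.qualifier_threshold:
            record.selected.append(hypothesis)
        else:
            record.unsuccessful.append(hypothesis)
    return record
```

Keeping the unsuccessful hypotheses in the same record as the selected ones
means the competing alternatives stay available together with the threshold
and appraisal mechanism that ruled them out.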

 Conclusion: We have presented some of the core elements of a generic formal
ontology for automatically generating hypotheses to explain new phenomena in
some environment.


References
1. Garijo, D., Gil, Y., Ratnakar, V.: The DISK hypothesis ontology: Capturing
   hypothesis evolution for automated discovery. In: K-CAP Workshops. pp. 40–46 (2017)
2. Gil, Y., Garijo, D., Ratnakar, V., Mayani, R., Adusumilli, R., Boyce, H., Mallick, P.:
   Automated hypothesis testing with large scientific data repositories. In: Proceedings
   of the Fourth Annual Conference on Advances in Cognitive Systems (ACS). pp. 1–6
   (2016)
3. King, R.D., Rowland, J., Aubrey, W., Liakata, M., Markham, M., Soldatova, L.N.,
   Whelan, K.E., Clare, A., Young, M., Sparkes, A., et al.: The Robot Scientist Adam.
   Computer 42(8), 46–54 (2009)
4. Soldatova, L.N., Rzhetsky, A., De Grave, K., King, R.D.: Representation of
   probabilistic scientific knowledge. In: Journal of Biomedical Semantics. vol. 4, p. S7.
   BioMed Central (2013)