K-CAP’17 SciKnow, December 2017, Austin, TX, USA                                                                             Garijo et al


                                       The DISK Hypothesis Ontology:
            Capturing Hypothesis Evolution for Automated Discovery
                                           Daniel Garijo, Yolanda Gil and Varun Ratnakar
                  Information Sciences Institute, University of Southern California, Marina del Rey, CA, U.S.A
                                                  {dgarijo, gil, varunr}@isi.edu


ABSTRACT

Automated discovery systems can formulate and revise hypotheses by gathering and analyzing data. In order to generate new hypotheses and provide explanations of their new findings, these systems need a language to represent hypotheses, their revisions, and their provenance. This paper describes the DISK hypothesis ontology, which fulfills these requirements. The paper then presents a survey of existing models for representing hypotheses along with their features and tradeoffs. We compare these hypothesis models in the context of automated discovery and hypothesis evolution.

CCS CONCEPTS
• Information systems → Artificial intelligence; Knowledge representation and reasoning

KEYWORDS
Hypothesis representation, hypothesis evolution, nanopublications, micropublications, automated discovery, ontologies.

K-CAP2017 Workshops and Tutorials Proceedings,
© Copyright held by the owner/author(s).

1 INTRODUCTION
Formal representations of scientific hypotheses would be useful in many contexts. For instance, to keep up with the latest updates in a research area, scientists need to quickly understand the contributions of an article and how it was derived from others. However, the vast number of new scientific publications makes this task increasingly complex. If scientists represented hypotheses formally in publications, related literature could be easily searched for hypotheses of interest. Alternatively, machine reading systems could extract hypotheses from the text of articles and generate these formal representations.
    Formal representations of hypotheses may also be used to improve reproducibility. Community initiatives on reproducibility promote registering hypotheses and methods before conducting the research [Munafo et al 2017]. Hypotheses are typically stated in textual form. Text can express arbitrarily complex statements, but it can also be imprecise and ambiguous. Creating machine readable representations of research hypotheses would facilitate the organization and management of the literature. To date, there is no standard way of capturing the contents and context of a hypothesis to understand its evolution.
    Another important use of formal hypothesis representations is to enable automated discovery systems to do hypothesis testing and revision. Autonomous discovery systems generate hypotheses based on the analysis of relevant data [Pankratius et al 2016; King 2017; Gil et al 2017].
    In this paper, we focus on hypothesis representations that capture hypothesis evolution in automated discovery systems. We discuss the requirements that we identified through our work on the DISK discovery system [Gil et al 2017]. We propose an ontology for hypothesis representation and compare it to existing models for representing hypotheses.
    The rest of the paper is organized as follows. Section 2 describes the DISK automated discovery system and introduces its hypothesis ontology. Section 3 introduces an evaluation framework for existing models and gives an overview of them. Section 4 discusses the different alternatives for hypothesis representation, and Section 5 concludes the paper.

2 REPRESENTING HYPOTHESES IN THE DISK AUTOMATED DISCOVERY SYSTEM
Our goal is to allow automated discovery systems to test hypotheses provided by users. These systems should also revise those hypotheses autonomously, based on the results of computational experiments.
    In prior work, we introduced an approach that captures scientists’ strategies for pursuing hypotheses as lines of inquiry. These lines of inquiry specify the data to be retrieved, the experimental workflows to run, and how to combine the results to generate a revised confidence level and, in some cases, a revised hypothesis [Gil et al 2016]. This approach was implemented in the DISK framework (Automated DIscovery of Scientific Knowledge) and demonstrated for cancer multi-omics [Gil et al 2017]. DISK is given a hypothesis statement, such as whether a protein is associated with a type of cancer. It returns either a confidence level for that hypothesis or a revised hypothesis that refers to a mutation of the protein or a more specific type of cancer. As new data become available, DISK re-runs the analysis and continuously revises the original hypothesis. DISK tracks the provenance of revised hypotheses in terms of the original hypotheses and the data analyses that were carried out.


Figure 1. Representing hypotheses in the DISK automated discovery system using the DISK hypothesis ontology. The initial
hypothesis statement HS1 is provided by the user. It is then tested through data analysis, which provides evidence HE2 for the
hypothesis, a new hypothesis statement HS2, and a qualification HQ2 with a confidence level L1. The revised hypothesis HG2 is a
revision of HG1, indicated by a link.
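As a rough illustration of Figure 1, the revision could be serialized in TriG along the following lines. This is a minimal sketch under stated assumptions. Each hypothesis statement is a named graph holding one triple. The terms ex:hasStatement, ex:hasQualifier, ex:hasEvidence, ex:associatedWith, ex:ColonCancerSubtypeA, ex:ConfidenceReport and ex:hasConfidenceLevel are hypothetical placeholders, not the published DISK ontology vocabulary. Only the PROV terms (prov:wasRevisionOf, prov:Activity) come from an actual standard.

```trig
@prefix ex:   <http://example.org#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# Hypothesis statements: one named graph per statement (ex: terms are hypothetical)
ex:HS1 { ex:EGFR ex:associatedWith ex:ColonCancer . }
ex:HS2 { ex:EGFR ex:associatedWith ex:ColonCancerSubtypeA . }

# Original hypothesis HG1: statement plus evidence
ex:HG1 a ex:Hypothesis ;
    ex:hasStatement ex:HS1 ;
    ex:hasEvidence  ex:HE1 .

# Revised hypothesis HG2: statement, qualifier, evidence and history
ex:HG2 a ex:Hypothesis ;
    ex:hasStatement ex:HS2 ;
    ex:hasQualifier ex:HQ2 ;
    ex:hasEvidence  ex:HE2 ;
    prov:wasRevisionOf ex:HG1 .          # history link shown in Figure 1

# Qualifier carrying the confidence level L1 (its value is not given in the text)
ex:HQ2 a ex:ConfidenceReport ;
    ex:hasConfidenceLevel ex:L1 .

# Evidence: the data analyses, modeled as PROV activities
ex:HE1 a prov:Activity .
ex:HE2 a prov:Activity .
```

Using a PROV relation for the history link means generic PROV tooling could follow the revision chain. Naming each statement graph is what allows a single triple to be referenced from its hypothesis.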


Figure 2. Representing hypothesis evolution in the DISK automated discovery system using the DISK hypothesis ontology. In this
example, additional data of two different types becomes available, causing the system to trigger two separate analyses whose results
are hard to combine. A revised hypothesis statement HS3 is added with a new confidence level L2 (included as part of HQ3) backed
by one of the analyses as evidence HE3. The other analysis HE4 qualifies HS3 with HQ4.
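The situation in Figure 2 might be sketched as follows. Here, two analyses yield qualifiers for the same statement that cannot easily be combined. As before, the ex: terms are hypothetical placeholders. The identifier HG3 for the hypothesis that carries HS3 is our assumption, and so is its revision link to HG2. The caption itself names only HS3, HQ3, HQ4, HE3, HE4 and L2.

```trig
@prefix ex:   <http://example.org#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# HG3 is a hypothetical identifier for the hypothesis holding statement HS3
ex:HG3 a ex:Hypothesis ;
    ex:hasStatement ex:HS3 ;
    ex:hasQualifier ex:HQ3 , ex:HQ4 ;    # two qualifiers kept side by side, not merged
    prov:wasRevisionOf ex:HG2 .          # assumed link back to the earlier revision

# First analysis: confidence level L2, backed by evidence HE3
ex:HQ3 a ex:ConfidenceReport ;
    ex:hasConfidenceLevel ex:L2 ;
    prov:wasGeneratedBy ex:HE3 .

# Second analysis HE4 qualifies the same statement independently
ex:HQ4 a ex:ConfidenceReport ;
    prov:wasGeneratedBy ex:HE4 .

ex:HE3 a prov:Activity .
ex:HE4 a prov:Activity .
```

The evidence is attached to each qualifier rather than to the hypothesis. This keeps the two confidence assessments, and their provenance, separable.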

    DISK uses a representation of hypotheses designed to track their evolution. In DISK, a hypothesis consists of:
   1. A hypothesis statement, which is a set of structured assertions about entities in the domain. For example, that the protein EGFR is associated with colon cancer.
   2. A hypothesis qualifier, which represents the estimated veracity of the hypothesis based on the data and the analyses done so far. A typical qualifier is a numeric confidence level. For example, for the hypothesis statement above we could have a confidence level given by a p-value of 0.07.
   3. Hypothesis evidence, which is a record of the analyses that were carried out to test a hypothesis statement. For example, the evidence for a given hypothesis may include an analysis of mass spectrometry data for 25 patients with colon cancer and 25 healthy controls. That analysis would be followed by clustering, cluster metrics and binary hypothesis testing.
   4. A hypothesis history, which points to prior hypotheses that were revised to generate the current one. In our example, a hypothesis such as the association of protein EGFR with colon cancer subtype A would link back to the original hypothesis statement that protein EGFR is associated with colon cancer.
    DISK represents hypothesis statements as a graph, where the nodes are the entities in the hypotheses and the links are their relationships. In our work, a hypothesis statement is represented in RDF as a simple triple, and the triple is linked to its qualifier, evidence, and history. All those assertions are also made in RDF. The hypothesis evidence and hypothesis history both represent different aspects of provenance for the hypothesis. This provenance is captured using the PROV provenance standard [Lebo et al 2013].
    Figure 1 illustrates this representation using the running example with protein EGFR. The original hypothesis HG1 has its own statement HS1 and evidence HE1. The revised hypothesis HG2 includes its statement HS2, its confidence level L1 (part of the qualifier HQ2), its evidence HE2, and a link to the original hypothesis HG1. A feature of this representation is the ability to model different confidence levels associated with a hypothesis statement. This often happens when evidence is obtained from analyzing different types of data and it is unclear how to combine the resulting confidence levels. Figure 2 shows an example. HS3 is qualified with two qualifiers (HQ3 and HQ4), which have different supporting evidence (HE3 and HE4), each resulting from a different data source.
    The DISK hypothesis ontology is available in OWL and documented in [Garijo et al 2017]. A major focus of the DISK hypothesis ontology is capturing hypothesis evolution. The rest of this paper compares this ontology to other representations of scientific hypotheses in the literature.

3 A SURVEY OF HYPOTHESIS REPRESENTATIONS

In this section we present a survey of existing models of scientific hypotheses and assess their features to support automated discovery.

3.1 Comparing hypothesis models

In our analysis, we consider the following key aspects, based on the representation presented in Section 2:

   1. Statement: Does the model have a representation for statements in a hypothesis?
   2. Qualifier: Does the model have a means to qualify a hypothesis with a confidence level?
   3. Evidence: Does the model describe the supporting evidence for a hypothesis?
   4. History: Does the model represent the relationship between hypothesis revisions?

In addition, the following aspects are desirable for flexibility and extensibility:

   5. Classification: Does the vocabulary support a taxonomy of hypothesis statements?
   6. Standards: Is the model defined using standards, or does it use proprietary or idiosyncratic formats?

3.2 Models for representing hypotheses

This section introduces different approaches to represent hypotheses at different levels of granularity. We group them according to the level of detail at which they describe hypotheses: coarse-grained and fine-grained representations.

3.2.1 Coarse-grained hypothesis models

We group under this section those vocabularies that include main concepts to identify hypotheses, but do not include the means to qualify them or describe them at a statement level. For example, popular vocabularies like the Semantic Web for Earth and Environmental Terminology ontology (SWEET)1 [Raskin and Pan 2005] contain modules for defining hypotheses as “Experimental Activities”. Likewise, the Ontology for Biomedical Investigations (OBI)2 [Bandrowski et al 2016] and the Ontology for Clinical Research (OCRe)3 [Sim et al 2014] have concepts to refer to a hypothesis in the context of a biological experiment.
    Other vocabularies include terms to further describe hypotheses. The EXPO Ontology aims to define a model for representing scientific experiments, "including generic knowledge about scientific experimental design, methodology and results representation" [Soldatova and King 2006]. The EXPO Ontology extends common upper level ontologies in order to bridge the gap between domain specific experiment formalization and upper level ontologies. EXPO aims at describing scientific papers, and has a specific part designed for the description of hypotheses. The focus of EXPO is on how the hypothesis is defined in a research paper (the "part of" relationship between the scientific experiment and the hypothesis). It does not identify the statements contained in the hypothesis itself. However, different classes of hypothesis are identified in the ontology (i.e., null hypothesis, research hypothesis and scientific hypothesis).
    Finally, the Linked Science Vocabulary4 proposes a lightweight model to express that some research supports a hypothesis. A hypothesis is represented to make predictions about facts, but it is not described at a statement level.

3.2.2 Fine-grained hypothesis models

We group in this section those approaches that provide the means to represent in detail the statements belonging to a hypothesis, along with their metadata.
    LABORS [Soldatova and Rzhetsky 2011] is designed to support investigations run by an automated system in the area of Systems Biology and Functional Genomics. LABORS uses EXPO as an upper level ontology. It splits the representation of hypotheses into textual and logical representations, using concepts from OBI and other upper level ontologies. It also allows aggregating hypotheses with multiple statements in hypothesis sets, using a Datalog representation for each hypothesis statement.
    The nanopublication model5 [Groth et al 2010] aims to represent “the smallest unit of publishable information”, i.e., every assertion that is part of a hypothesis graph. Nanopublications are composed of three main graphs:
   1. An assertion graph containing the assertion or multiple assertions that are part of the nanopublication.
   2. A provenance graph with the statements that describe the provenance of the assertion graph (e.g., the assertion graph came from a publication, a scientific experiment, etc.).
   3. A publication info graph, which contains the metadata about the nanopublication itself (e.g., who created the nanopublication, the date when it was created, etc.).
Each of the graphs is represented using a named graph,6 so that it can be described with metadata from any of the other graphs. An example can be seen in the snippet below. It represents a hypothesis like HG1 in Figure 1 with its assertion (sub:hypothesisAssertion), provenance (sub:provenance) and publication info (sub:pubInfo) graphs.

@prefix sub:  <http://example.org/hypothesis#> .
@prefix np:   <http://www.nanopub.org/nschema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org#> .

sub:defaultGraph {  # head graph linking the three parts of the nanopublication
  sub:n1 np:hasAssertion sub:hypothesisAssertion ;
         np:hasProvenance sub:provenance ;
         np:hasPublicationInfo sub:pubInfo ;
         a np:Nanopublication, ex:Hypothesis .
}

sub:hypothesisAssertion {  # statements contained in the hypothesis graph
  ex:EGFR ex:associatedWith ex:ColonCancer .
}

sub:provenance {  # provenance of the assertion graph
  sub:hypothesisAssertion prov:generatedAtTime "2012-02-03T14:38:00Z"^^xsd:dateTime ;
      ex:hasConfidenceReport ex:conf1 ;
      prov:wasAttributedTo ex:experimentScientist .
  ex:conf1 a ex:ConfidenceReport ;
      ex:hasConfidenceLevel "0.6" ;
      prov:wasGeneratedBy ex:execution1 .
}

sub:pubInfo {  # publication info: who created the nanopublication, and when
  sub:n1 prov:generatedAtTime "2016-03-26T12:45:00Z"^^xsd:dateTime ;
      prov:wasAttributedTo ex:user1 .
}

    The ovopublication model proposes a simple approach designed to capture the provenance of assertions [Callahan and Dumontier 2013]. When contrasted with nanopublications, "the ovopub is simpler as it consists of only a single named graph with key provenance information directly contained in and associated with the ovopub graph" [Callahan and Dumontier 2013]. Ovopublications mix the notion of named graphs with reification to refer to the different components and relationships of the ovopublication itself. The Ovopub model is integrated as part of the Semanticscience Integrated Ontology (SIO)7, which also provides the means to describe hypotheses as literals.
    The Semantic Web Applications in Neuromedicine (SWAN) ontology8 [Ciccarese et al 2008] aims to represent the scientific discourse of biomedical papers in general and neuromedicine papers in particular. The model is composed of several modules. These represent discourse elements and their relationships, different types of agents, the roles, provenance and versioning of a given statement, and bibliographic references. SWAN was designed to describe statements in papers, along with the evidence supporting them. If we consider a hypothesis as a text statement, the following example illustrates the SWAN model:

@prefix swande:  <http://purl.org/swan/1.2/discourse-elements/> .
@prefix swanco:  <http://purl.org/swan/1.2/swan-commons/> .
@prefix swanqs:  <http://purl.org/swan/1.2/qualifiers/> .
@prefix swandr:  <http://purl.org/swan/1.2/discourse-relationships/> .
@prefix swanpav: <http://purl.org/swan/1.2/pav/> .
@prefix swanci:  <http://purl.org/swan/1.2/citations/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:      <http://example.org#> .

ex:hypothesis a swande:ResearchStatement ;
    swande:title "EGFR is associated with colon cancer subtype A"@en ;
    swanco:researchStatementQualifiedAs <http://swan.mindinformatics.org/ontologies/1.2/rsqualifiers/hypothesis> ;
    swanci:derivedFrom ex:execution1 ;
    ex:hasConfidenceReport ex:c1 ;
    swanpav:authoredBy ex:experimentScientist ;
    swanpav:createdOn "2012-02-03T14:38:00Z"^^xsd:dateTime .

    In the example, a hypothesis is extracted from a research article. The hypothesis is represented as a statement, which can be further described with SWAN. The provenance of the hypothesis is also represented, through the agents who created the hypothesis statement.

1 http://sweet.jpl.nasa.gov/2.3/reprSciModel.owl
2 http://purl.obolibrary.org/obo/OBI_0001908
3 http://purl.org/net/OCRe/OCRe.owl#OCRE400032
4 http://linkedscience.org/lsc/ns/
5 http://www.nanopub.org/nschema#
6 https://www.w3.org/TR/rdf11-concepts/
7 http://semanticscience.org/ontology/sio.owl
8 https://www.w3.org/TR/hcls-swan/
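The single-graph idea quoted above for ovopublications can be contrasted with the three-graph nanopublication example. This is a hedged illustration only. It reuses PROV and the ex: placeholder terms of the nanopublication example rather than the actual Ovopub/SIO vocabulary. It also omits the reification that ovopublications additionally use.

```trig
@prefix ex:   <http://example.org#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# A single named graph holds the assertion AND the key provenance of that same graph
ex:ovopub1 {
    ex:EGFR ex:associatedWith ex:ColonCancer .
    ex:ovopub1 prov:wasAttributedTo ex:experimentScientist ;
        prov:generatedAtTime "2012-02-03T14:38:00Z"^^xsd:dateTime .
}
```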


     Figure 3: The example from Figure 1 adapted to the micropublication model, following [Clark et al 2014]. The namespaces indicate
     the ontology used: mp for micropublications, prov for the PROV ontology, and ext for the extension that would need to be added.

    Finally, micropublications9 [Clark et al 2014] are derived from the SWAN model and can be considered a refinement of the nanopublication model. Micropublications propose a semantic model of scientific argumentation and evidence that supports natural language statements, data and materials specifications, discussion, etc. Figure 3 shows an illustrative example. In it, a micropublication uses a mechanism similar to an assertion graph to represent the claim that a protein is associated with a subtype of colon cancer, along with its supporting evidence. The micropublication model uses the Web Annotation Ontology10 to associate a micropublication and its contents with text from articles.

4 DISCUSSION

Table 1 summarizes the different candidate models for hypothesis representation in automated discovery systems, according to the features described in Section 3.1. Most models lack support for qualifying a given hypothesis with confidence levels. To overcome this issue, we may follow an approach similar to Figure 1. We would extend the target model with a class (ConfidenceReport) and two properties (hasConfidenceReport and hasConfidenceLevel) linking them together. One reason not to link the confidence level directly to a hypothesis is that the same hypothesis may be evaluated at different points in time. Each evaluation then produces its own confidence level with its own provenance information, and each is included in a separate confidence report.
    The upper half of Table 1 corresponds to the models for coarse-grained hypothesis representation. These models include a main concept to refer to a hypothesis, but lack the means to describe hypothesis statements. Therefore, they do not meet most of the requirements that DISK has for representing hypothesis statements, qualifiers, history and evidence. However, the LinkedScience, OBI and EXPO vocabularies define different types of hypotheses, and may be potential candidates for reuse if we need to define a hypothesis taxonomy.
    The lower half of Table 1 corresponds to fine-grained models to describe hypotheses. These models either define classes and properties to qualify hypothesis statements with provenance metadata, or relate the different parts of a hypothesis together. Among these, the nanopublication and micropublication models are the most flexible approaches, compliant with most of the requirements of the DISK model (in the last row). LABORS uses a Datalog representation for describing hypothesis statements and is domain specific. The ovopublication model is a simplification of the nanopublication model to include provenance of assertions or collections of assertions. Although it could be used for hypothesis representation, we consider that the model would need to be thoroughly extended. Similarly, the SWAN model is extended in the micropublication approach to represent argumentation of facts in publications. Therefore, the nanopublication and micropublication models provide a richer initial framework.
    A major difference between micropublications and nanopublications is the scope of the domain. For instance, micropublications were explicitly designed to model facts and argumentation of text statements. An automated discovery system may only aim to represent single assertions of hypotheses and their evolution. In that case, an argumentation framework such as the one proposed in the micropublication model is not necessary. In contrast, if the provenance trace includes all evidence to support a particular claim made in a hypothesis, then micropublications are an appropriate model to use.
    Another aspect to consider is the support from the communities that are using these models. The nanopublication model has been discussed for some time, and has available tooling, documentation and examples.11 The micropublication model has been documented in detail with examples [Clark et al 2014], but it has not yet reached the level of adoption and tooling that nanopublications have.

9 http://purl.org/mp
10 https://www.w3.org/ns/oa
11 http://nanopub.org/
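The extension suggested at the start of this section could be declared in OWL roughly as follows. That extension consists of a ConfidenceReport class with hasConfidenceReport and hasConfidenceLevel properties. This is a sketch, not part of any of the surveyed vocabularies. The ex: namespace is a placeholder. Making ConfidenceReport a subclass of prov:Entity is our assumption, so that each report can carry its own PROV metadata. ex:execution2 is a hypothetical second run; the "0.6" level and ex:execution1 mirror the nanopublication example in Section 3.

```trig
@prefix ex:   <http://example.org#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# Schema: one class and two properties added to the target model
ex:ConfidenceReport a owl:Class ;
    rdfs:subClassOf prov:Entity .          # assumption: reports carry PROV metadata

ex:hasConfidenceReport a owl:ObjectProperty ;
    rdfs:range ex:ConfidenceReport .       # no domain, so any hypothesis class can use it

ex:hasConfidenceLevel a owl:DatatypeProperty ;
    rdfs:domain ex:ConfidenceReport .      # value datatype deliberately left open

# Usage: one hypothesis evaluated by two runs, each with its own report
ex:hypothesis ex:hasConfidenceReport ex:conf1 , ex:conf2 .

ex:conf1 a ex:ConfidenceReport ;
    ex:hasConfidenceLevel "0.6" ;
    prov:wasGeneratedBy ex:execution1 .

ex:conf2 a ex:ConfidenceReport ;
    prov:wasGeneratedBy ex:execution2 .    # hypothetical later run; level omitted
```

Leaving the domain of hasConfidenceReport unrestricted lets the same two properties be attached to nanopublications, micropublications or any other hypothesis class without changing those models.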




Table 1: Overview of models for hypothesis representation.

Hypothesis model                               Statement        Qualifier  Evidence  History  Classification  Use of standards
SWEET [Raskin and Pan 2005]                    No               No         No        No       No              Yes (OWL)
OBI [Bandrowski et al 2016]                    No               No         No        No       Yes             Yes (OWL)
EXPO [Soldatova and King 2006]                 No               No         No        No       Yes             Yes (OWL)
OCRe [Sim et al 2014]                          No               No         No        No       No              Yes (OWL)
Linked Science Vocabulary                      No               No         Partly    No       No              Yes (OWL)
LABORS [Soldatova and Rzhetsky 2011]           No               No         Yes       No       Yes             Yes (OWL)
Nanopublications [Groth et al 2010]            Text/structured  No         Yes       Yes      No              Yes (OWL), named graphs
Ovopublications [Callahan and Dumontier 2013]  Text/structured  No         No        Yes      No              Yes (OWL), named graphs
SWAN [Ciccarese et al 2008]                    Text             No         Yes       Yes      No              Yes (OWL)
Micropublications [Clark et al 2014]           Text             Yes        Yes       No       No              Yes (OWL), named graphs
DISK [Garijo et al 2017]                       Structured       Yes        Yes       Yes      No              Yes (OWL), named graphs

    Finally, both the nanopublication and micropublication
models share an important limitation for representing
hypotheses: they were designed to describe simple facts, i.e.,
single statements or a single collection of statements as part of
their claim. In the nanopublication model this is reflected by
having a unique assertion graph per nanopublication, containing
one or more statements. To describe a hypothesis composed of
multiple statements, each with a confidence level assigned
independently by different experiments, we would have to extend
the nanopublication model. One possibility would be to create a
new class (a hypothesis composition concept such as the
“hypotheses-set” in LABORS) that aggregates each of its
statements as an individual nanopublication. Likewise, each
micropublication contains a main claim graph and its support. A
mechanism for extending and aggregating micropublications
would also be needed to represent hypotheses with multiple
statements. Note that in both models such an extension is only
necessary if we want to keep the provenance of each statement of
the hypothesis. Otherwise, all the statements can be included in
the assertion graph in the case of nanopublications or in the
claim graph in the case of micropublications.

5 CONCLUSIONS AND FUTURE WORK

In this paper we introduced the DISK hypothesis ontology for
representing hypothesis evolution, which was developed for the
DISK automated discovery system. We also presented a survey
of existing vocabularies for representing hypotheses and assessed
their suitability in the context of automated knowledge discovery.
Future work includes extending the DISK ontology to align with
these models.

ACKNOWLEDGMENTS

We gratefully acknowledge support from the Defense Advanced
Research Projects Agency through the SIMPLEX program with
award W911NF-15-1-0555, and from the National Institutes of
Health under award 1R01GM117097. We also thank our
collaborators in the DISK project, especially Parag Mallick,
Ravali Adusumilli, and Hunter Boyce, for their useful feedback
on this work.

REFERENCES

[Callahan and Dumontier 2013] Alison Callahan and Michel
  Dumontier. Ovopub: Modular data publication with minimal
  provenance. arXiv preprint arXiv:1305.6800, 2013.
[Brandrowski et al 2016] Bandrowski A, Brinkman R,
  Brochhausen M, Brush MH, Bug B, et al. (2016) The Ontology
  for Biomedical Investigations. PLOS ONE 11(4): e0154556.
  https://doi.org/10.1371/journal.pone.0154556
[Clark et al 2014] Tim Clark, Paolo N. Ciccarese and Carole A.
  Goble. Micropublications: a semantic model for claims,
  evidence, arguments and annotations in biomedical
  communications. Journal of Biomedical Semantics 2014, 5:28.
[Ciccarese et al 2008] Ciccarese P, Wu E, Kinoshita J, et al. The
  SWAN Scientific Discourse Ontology. Journal of Biomedical
  Informatics. 2008;41(5):739-751.
  doi:10.1016/j.jbi.2008.04.010.
[Garijo et al 2017] The DISK Hypothesis Ontology. Version
  1.0.0. Available from http://disk-project.org/ontology/disk#
[Gil et al 2016] Gil, Y.; Garijo, D.; Ratnakar, V.; Mayani, R.;
  Adusumilli, R.; and Boyce, H. Automated Hypothesis Testing
  with Large Scientific Data Repositories. In Proceedings of the
  Fourth Annual Conference on Advances in Cognitive Systems
  (ACS), pages 1-6, 2016.
[Gil et al 2017] Gil, Y.; Garijo, D.; Ratnakar, V.; Mayani, R.;
  Adusumilli, R.; Boyce, H.; Srivastava, A.; and Mallick, P.
  Towards Continuous Scientific Data Analysis and Hypothesis
  Evolution. In Proceedings of the Thirty-First AAAI
  Conference on Artificial Intelligence (AAAI-17), 2017.
[Groth et al 2010] Groth, Paul; Gibson, Andrew; Velterop, Jan.
  The anatomy of a nanopublication. Information Services and
  Use, 30, 1-2: 52-56, 2010.
[King 2017] Ross King. The Adam and Eve Robot Scientists for
  the Automated Discovery of Scientific Knowledge. Bulletin of
  the American Physical Society, 2017.
[Lebo et al 2013] Lebo, T., McGuinness, D., Belhajjame, K.,
  Cheney, J., Corsar, D., Garijo, D., Soiland-Reyes, S., Zednik,
  S., and Zhao, J. (2013). The PROV ontology, W3C
  recommendation. Technical report, World Wide Web
  Consortium (W3C), 30th April 2013.
[Munafo et al 2017] Marcus R. Munafò, Brian A. Nosek, Dorothy
  V. M. Bishop, Katherine S. Button, Christopher D. Chambers,
  Nathalie Percie du Sert, Uri Simonsohn, Eric-Jan
  Wagenmakers, Jennifer J. Ware & John P. A. Ioannidis. A
  manifesto for reproducible science. Nature Human Behaviour
  1, Article number: 0021 (2017). doi:10.1038/s41562-016-0021
[Pankratius et al 2016] V. Pankratius, J. Li, M. Gowanlock, D.
  Blair, C. Rude, T. Herring, F. Lind, P. Erickson, C. Lonsdale.
  Computer-Aided Discovery: Towards Scientific Insight
  Generation with Machine Support. IEEE Intelligent Systems
  31(4), pp. 3-10, Jul/Aug 2016.
[Raskin and Pan 2005] Robert G. Raskin and Michael J. Pan.
  Knowledge representation in the semantic web for Earth and
  environmental terminology (SWEET). Computers &
  Geosciences 31(9):1119-1125, November 2005.
  doi:10.1016/j.cageo.2004.12.004.
[Sim et al 2014] Sim I, Tu SW, Carini S, et al. The Ontology of
  Clinical Research (OCRe): An Informatics Foundation for the
  Science of Clinical Research. Journal of Biomedical
  Informatics. 2014;52:78-91. doi:10.1016/j.jbi.2013.11.002.
[Soldatova and King 2006] Soldatova, LN & King, RD. (2006)
  An Ontology of Scientific Experiments. Journal of the Royal
  Society Interface, 3(11):795-803.
  doi:10.1098/rsif.2006.0134.
[Soldatova and Rzhetsky 2011] Soldatova, LN and Rzhetsky, A.
  Representation of research hypotheses. Journal of Biomedical
  Semantics 2011, 2(Suppl 2):S9.
  https://doi.org/10.1186/2041-1480-2-S2-S9