<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>May</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Benchmark datasets for biomedical knowledge graphs with negative statements</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rita T. Sousa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Silva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Catia Pesquita</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LASIGE, Faculdade de Ciências da Universidade de Lisboa</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>29</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Knowledge graphs represent facts about real-world entities. Most of these facts are defined as positive statements. The negative statements are scarce but highly relevant under the open-world assumption. Furthermore, they have been demonstrated to improve the performance of several applications, namely in the biomedical domain. However, no benchmark dataset supports the evaluation of the methods that consider these negative statements.</p>
      </abstract>
      <kwd-group>
        <kwd>Biomedical Knowledge Graphs</kwd>
        <kwd>Biomedical Ontologies</kwd>
        <kwd>Gene Ontology</kwd>
        <kwd>Human Phenotype Ontology</kwd>
        <kwd>Negative Statements</kwd>
        <kwd>Protein-Protein Interaction Prediction</kwd>
        <kwd>Gene-Disease Association Prediction</kwd>
        <kwd>Disease Prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge Graphs (KGs) have been used to represent knowledge about real-world entities
and their relationships. Most KGs use ontologies as a backbone to describe entities through
ontology-based annotation, which associates an entity with a class. These annotations are
commonly represented as positive statements establishing that an ontology class describes an
entity. For example, in the biomedical domain, positive statements express that a protein P1
performs a given function as defined in the Gene Ontology (GO) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Negative
statements are extremely rare but can be used to declare that a given protein P2
does not perform a given function (Figure 1).
      </p>
      <p>SeWebMeDa-2023: 6th International Workshop on Semantic Web solutions for large-scale biomedical data analytics.</p>
      <p>
        The lack of negative statements is a significant issue because KGs operate under the
open-world assumption: it is unclear whether the absence of a positive statement reflects
a lack of knowledge or the actual absence of the relationship. Moreover, the importance of
negative statements for producing more accurate representations of entities in a KG [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] and for improving performance in different applications [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] is increasingly recognized in the biomedical domain.
      </p>
      <p>
        While there have been attempts to enhance current KGs with interesting negative statements,
to the best of our knowledge, no benchmark datasets have been established to evaluate learning
tasks over those KGs. With this in mind, we enrich existing biomedical KGs with negative
statements and propose a collection of datasets for different biomedical relation prediction
tasks. The biomedical domain was selected because biomedical KGs are usually back-boned by
biomedical ontologies that can express negation. Additionally, negative statements have been
considered relevant for different biomedical applications [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Our datasets are grouped according
to the task: protein-protein interaction (PPI) prediction, gene-disease association (GDA)
prediction and disease prediction. Regarding the KGs, we enrich two successful biomedical ontologies:
the GO, which covers distinct semantic aspects of gene products’ function, and the Human Phenotype
Ontology (HP), which describes the universe of concepts related to phenotypic abnormalities
found in human hereditary diseases.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Several approaches to enriching existing KGs with interesting negative statements have been
proposed. Arnaout et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a method to enrich Wikidata by including interesting
negative statements, which led to improvements in tasks involving entity summarization and
decision-making.
      </p>
      <p>
        In the biomedical domain, several approaches tackle the lack of negative statements in
biomedical ontologies, such as GO. The number of functions that a protein does not have is
larger than the number of functions it has. Therefore, the number of negative statements
describing proteins in the GO should be several orders of magnitude greater than the number
of positive statements. Youngs et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] designed two algorithms to predict negative statements
for GO and populate the NoGO database, one based on empirical conditional probability and
the other on topic modeling applied to genes and annotations. Fu et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduced NegGOA,
a new method to enrich the GO with relevant negative statements indicating that a protein
does not perform a given function. This method exploits the GO by using hierarchical semantic
similarity between GO terms. The enriched GO was used for protein function prediction.
Later, Vesztrocy et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] presented a benchmark based on a balanced test set of positive and
negative statements. The negative statements are generated from expert-curated annotations
of protein families on phylogenetic trees. The results of this work demonstrated that negative
statements improve protein function prediction. Regarding the HP, although the importance of
negative statements in gene-phenotype prediction is recognized, the enrichment with negative
statements has yet to be investigated [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Building the Datasets</title>
      <p>We present a collection of datasets that work over two enriched KGs for three relation
prediction tasks: PPI prediction, GDA prediction, and disease prediction. Each benchmark dataset
comprises several pairs of biomedical entities (or instances) that can be of the same type
(protein-protein) or distinct types (gene-disease and disease-patient), each with its respective label (1 for the
positive pairs and 0 for the negative pairs). Tables 1 and 2 show the KGs’ and datasets’
statistics for each task. Since for GDA prediction and disease prediction the target relation
holds between two types of instances (genes and diseases for GDA prediction, and diseases
and patients for disease prediction), the instance numbers in Table 2 appear separately.
Moreover, in the case of PPI prediction, we exclusively employ the GO KG, which has been subjected
to a negative statement enrichment approach. For GDA prediction
and disease prediction, however, we rely on the HP KG, which lacks a negative statement enrichment
approach, resulting in a significant imbalance between the number of positive and negative
statements.</p>
      <p>To build these datasets, we adopt three main steps. The first step consists of enriching the
KGs. Each KG is constructed using the owlready2 package<sup>1</sup>, which parses the ontology file
in OWL format and processes the annotation file. The annotation file contains the positive and
negative statements used to describe entities. We follow the guidelines established by the W3C<sup>2</sup>
to define the negative statements as negative object property assertions<sup>3</sup>. To do so, we use
metamodeling and represent each ontology class as both a class and an individual, which
translates into using the same IRI for both. Then, we use a negative object property assertion to state
that the individual representing a biomedical entity is not connected by the object property
expression to the individual representing an ontology class, as depicted in Figure 2. The second
step consists of extracting pairs of entities from bioinformatics databases. The third step involves
selecting the pairs containing KG entities that are well described with positive and negative
statements.</p>
      <p><sup>1</sup> https://owlready2.readthedocs.io/en/v0.37/</p>
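      <p>As a minimal sketch of the first step, the snippet below writes out the four RDF triples that the W3C mapping to RDF graphs uses for a negative object property assertion. It uses plain Python strings rather than owlready2, and the function name, the blank-node label, and the example IRIs (including the property IRI) are illustrative assumptions, not the identifiers used in the released KGs.</p>
      <preformat><![CDATA[
```python
# RDF vocabulary IRIs from the OWL 2 mapping to RDF graphs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL = "http://www.w3.org/2002/07/owl#"


def negative_assertion_triples(entity_iri, property_iri, class_iri, node_id="n0"):
    """Return N-Triples lines stating that entity is NOT linked to class by property.

    class_iri is used as an individual here (metamodeling): the same IRI that
    names the ontology class also names the target individual.
    """
    blank = f"_:{node_id}"
    rows = [
        (blank, RDF_TYPE, OWL + "NegativePropertyAssertion"),
        (blank, OWL + "sourceIndividual", entity_iri),
        (blank, OWL + "assertionProperty", property_iri),
        (blank, OWL + "targetIndividual", class_iri),
    ]

    def term(value):
        # Blank nodes are written as-is; IRIs are wrapped in angle brackets.
        return value if value.startswith("_:") else f"<{value}>"

    return [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in rows]


# Hypothetical IRIs, for illustration only.
triples = negative_assertion_triples(
    "http://example.org/protein/P2",
    "http://example.org/hasFunction",
    "http://purl.obolibrary.org/obo/GO_0005515",
)
print("\n".join(triples))
```
]]></preformat>
      <p>Because the class IRI appears in the position of an individual, the same IRI plays both roles, which is the metamodeling choice described above.</p>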
      <p>The following subsections describe in more detail the KGs and the characteristics of each task.</p>
      <sec id="sec-3-1">
        <title>3.1. Biomedical Knowledge Graphs</title>
        <p>Two KGs back-boned by biomedical ontologies are used: the GO KG and the HP KG. Table 1
shows the statistics for each ontology.</p>
        <p>
          The GO is used to describe gene products (proteins or genes) according to the molecular
functions they perform, the biological processes they are involved in, and the cellular
components where they act. The GO KG is built by integrating three sources: the GO<sup>4</sup> itself, the GO
Annotation data<sup>5</sup> [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and negative GO associations produced in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]<sup>6</sup>.
        </p>
        <p><sup>2</sup> https://www.w3.org/TR/owl2-mapping-to-rdf/</p>
        <p><sup>3</sup> https://www.w3.org/TR/owl2-syntax/#Negative_Object_Property_Assertions</p>
        <p><sup>4</sup> The GO was downloaded in September 2021. It is available at http://release.geneontology.org/2021-09-01/ontology/index.html</p>
        <p>A GO annotation links a specific gene product with a particular GO class. The majority of
GO annotation data corresponds to positive statements. However, the GO annotation has the
qualifier ‘NOT’ for a few cases, meaning that a gene product has been proven not to carry out a
specific function. The annotations that possess this qualifier were added as negative statements.
In addition to these negative statements, the GO KG was also enriched with negative statements
derived from expert-curated annotations of protein families on phylogenetic trees. The idea is
that, if no evidence exists to suggest otherwise, gene function is maintained over time through
evolution. Therefore, after expert curators have annotated ancestral states in gene phylogenies
with GO classes, they check if the annotations are propagated down the phylogeny. When there
is evidence that the function is absent in a specific sub-tree, a negative statement is added to
that protein. These enriched negative statements were filtered so there were no contradictions
with the GO annotation data.</p>
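        <p>The handling of the ‘NOT’ qualifier and the contradiction filter can be sketched as follows. This is a simplified illustration that assumes tab-separated annotation rows in the GAF layout (entity identifier in column 2, qualifier in column 4, GO class in column 5); the function names and toy rows are hypothetical, and the filter only checks exact entity-class clashes, ignoring the GO hierarchy.</p>
        <preformat><![CDATA[
```python
def split_annotations(gaf_lines):
    """Split annotation rows into positive and negative (entity, GO class) pairs."""
    positives, negatives = set(), set()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):  # skip blanks and header comments
            continue
        cols = line.rstrip("\n").split("\t")
        entity, qualifier, go_class = cols[1], cols[3], cols[4]
        # The qualifier column may hold several '|'-separated values.
        if "NOT" in qualifier.split("|"):
            negatives.add((entity, go_class))
        else:
            positives.add((entity, go_class))
    return positives, negatives


def drop_contradictions(extra_negatives, positives):
    """Keep only enriched negatives that do not clash with a positive annotation.

    Simplification: only exact (entity, class) clashes are checked.
    """
    return {pair for pair in extra_negatives if pair not in positives}


# Toy rows with made-up identifiers, truncated to the first five columns.
toy_gaf = [
    "!gaf-version: 2.2",
    "UniProtKB\tP1\tGENE1\tenables\tGO:0000001",
    "UniProtKB\tP2\tGENE2\tNOT|enables\tGO:0000001",
]
positives, negatives = split_annotations(toy_gaf)
extra = {("P1", "GO:0000001"), ("P2", "GO:0000002")}
kept = drop_contradictions(extra, positives)
print(sorted(positives), sorted(negatives), sorted(kept))
```
]]></preformat>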
        <p>HP characterizes phenotypic abnormalities discovered in human hereditary diseases according
to five semantic aspects: phenotypic abnormalities, mode of inheritance, clinical course, clinical
modifier and frequency. HP annotations can link diseases, patients or genes to HP classes via
positive and negative statements. The construction of the HP KG<sup>7</sup> is similar to that of the GO KG.
An HP annotation that includes ‘NOT’ indicates that a disease does not cause the given
phenotype, so such annotations are included as negative statements.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Protein-Protein Interaction Prediction Dataset</title>
        <p>
          Predicting PPIs is a fundamental task in molecular biology for understanding biological systems.
Given the high cost of experimentally determining PPIs, many computational approaches for
PPI prediction based on available functional information described by the GO [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] have been
proposed to find protein pairs likely to interact and thus provide a selection of good candidates
for experimental analysis. Therefore, the GO KG is used to describe the proteins of the dataset.
        </p>
        <p>The positive examples are extracted from the STRING [10] database. Protein
pairs were selected according to the following criteria: (i) the interaction must be curated or
experimentally determined rather than computationally determined; (ii) the interaction must
have a confidence score above 0.950 to ensure high confidence; (iii) each protein must have at
least one positive statement for a GO class and one negative statement for another GO class.
The negative examples are generated by random negative sampling over the set of proteins of
the positive examples.</p>
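        <p>The selection criteria (ii) and (iii) and the negative sampling step can be sketched as below. The data structures, function names and toy values are assumptions made for illustration; in particular, drawing as many negative pairs as positive pairs is an assumption, and criterion (i) is presumed to have been applied when the interaction list was exported.</p>
        <preformat><![CDATA[
```python
import random


def well_described(protein, positives, negatives):
    """True if the protein has a positive and a negative statement on different classes."""
    pos = positives.get(protein, set())
    neg = negatives.get(protein, set())
    return any(p != n for p in pos for n in neg)


def select_positive_pairs(interactions, positives, negatives, min_score=0.95):
    """Keep high-confidence interactions between well-described proteins."""
    return [
        (a, b)
        for a, b, score in interactions
        if score > min_score
        and well_described(a, positives, negatives)
        and well_described(b, positives, negatives)
    ]


def sample_negative_pairs(positive_pairs, seed=0):
    """Randomly draw pairs of the same proteins that are not positive examples."""
    rng = random.Random(seed)
    proteins = sorted({p for pair in positive_pairs for p in pair})
    known = {frozenset(pair) for pair in positive_pairs}
    # Enumerating all candidate pairs is quadratic; acceptable for a sketch.
    candidates = [
        (a, b)
        for i, a in enumerate(proteins)
        for b in proteins[i + 1:]
        if frozenset((a, b)) not in known
    ]
    return rng.sample(candidates, min(len(positive_pairs), len(candidates)))


# Toy data with made-up identifiers and scores.
interactions = [("A", "B", 0.99), ("B", "C", 0.97), ("C", "D", 0.96), ("A", "E", 0.99)]
positives = {p: {"GO:1"} for p in "ABCDE"}
negatives = {p: {"GO:2"} for p in "ABCD"}  # E has no negative statement
positive_pairs = select_positive_pairs(interactions, positives, negatives)
negative_pairs = sample_negative_pairs(positive_pairs)
print(positive_pairs, negative_pairs)
```
]]></preformat>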
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Gene-Disease Association Prediction Dataset</title>
        <p>Knowing which genes are associated with a specific disease is crucial to understanding the
disease mechanisms and recognising potential biomarkers or therapeutic targets. However, once
again, validating these associations in the wet lab is expensive and time-consuming. This has
prompted the development of computational methods to identify the most promising associations
to be further validated.</p>
        <p><sup>5</sup> The GO positive annotations were downloaded in January 2021. They are available at http://release.geneontology.org/2021-01-01/annotations/index.html.</p>
        <p><sup>6</sup> The negative annotations were downloaded from https://lab.dessimoz.org/20_not</p>
        <p><sup>7</sup> The HP was downloaded in October 2022, while the HP annotations were downloaded in November 2021. A link to these versions is no longer available.</p>
        <p>Both KGs are used for the GDA prediction dataset: the GO KG describes the genes, and
the HP KG describes the diseases. The target relations to predict are extracted from DisGeNET [11].
Adapting the approach described in [12], we considered the following criteria to select
gene-disease pairs: (i) each gene must have at least one positive statement for a GO class and one
negative statement for another GO class; (ii) each disease must have at least one positive
statement for an HP class and one negative statement for another HP class. We sampled random
negative examples over the same genes and diseases to create a balanced dataset.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Disease Prediction Datasets</title>
        <p>Since human diseases are a complex phenomenon, disease prediction is an essential but still
complicated task that must be executed accurately and efficiently. Therefore, using computational
methods to help physicians prioritize diseases is highly advantageous.</p>
        <p>The dataset to predict whether a synthetic patient has been diagnosed with a specific disease is
generated by adapting the methodology proposed in [13]. Thirty-three Mendelian diseases
for which the authors of [13] knew the penetrance of each phenotype are selected. Penetrance indicates the
likelihood that a patient suffering from a specific disease will exhibit a particular phenotype.
For each of these 33 diseases, 20 synthetic patients diagnosed with that disease are created. The
patients’ positive annotations are determined by the disease’s penetrance and the patient’s gender.
The gender is defined randomly with an equal likelihood for both genders. For example,
‘Aarskog-Scott syndrome’ is annotated with the phenotype ‘Ptosis’ with a penetrance of 0.5061,
meaning that approximately half of the synthetic patients diagnosed with that disease will have
a positive statement for this phenotype. Negated phenotypes do not have an associated
penetrance, so synthetic patients inherit all the negative phenotypes related to the disease. For
example, since the disease ‘Aarskog-Scott syndrome’ is annotated with ‘NOT Decreased Fertility’,
each patient will have a negative statement for this phenotype. Furthermore, 1000 diseases were
randomly chosen to add complexity to the task. These diseases are annotated with positive and
negative statements.</p>
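        <p>The patient generation procedure can be sketched as follows. This is an illustrative simplification under stated assumptions: the function name and data layout are hypothetical, and the gender is drawn but not used to restrict phenotypes, since the text does not specify how gender affects the annotations.</p>
        <preformat><![CDATA[
```python
import random


def make_patients(disease, positive_phenotypes, negative_phenotypes,
                  n_patients=20, seed=0):
    """Create synthetic patients for one disease.

    positive_phenotypes maps phenotype -> penetrance in [0, 1];
    negative_phenotypes is the set of 'NOT' phenotypes of the disease.
    """
    rng = random.Random(seed)
    patients = []
    for i in range(n_patients):
        patients.append({
            "id": f"{disease}_patient_{i}",
            "gender": rng.choice(["female", "male"]),  # equal likelihood
            # Each phenotype is kept with probability equal to its penetrance.
            "positive": {
                p for p, pen in positive_phenotypes.items() if rng.random() < pen
            },
            # Negated phenotypes have no penetrance, so they are always inherited.
            "negative": set(negative_phenotypes),
        })
    return patients


patients = make_patients(
    "Aarskog-Scott syndrome",
    {"Ptosis": 0.5061},
    {"Decreased fertility"},
)
with_ptosis = sum("Ptosis" in p["positive"] for p in patients)
print(f"{with_ptosis} of {len(patients)} synthetic patients have Ptosis")
```
]]></preformat>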
        <p>Random annotations can also be added to patients to emulate a more realistic situation in which
a patient is associated with phenotypes unrelated to the patient’s disease. In addition to the
disease prediction dataset without noise, we present three versions with random annotations. The number of
random annotations is defined by a proportion Noi (Noi = [0, 0.1, 0.2, 0.4]) of a given
patient’s total number of annotations. For example, if Noi = 0.5, the number of random annotations
added equals half of the patient’s total number of annotations. Table 3 shows the number of positive and negative statements for each
noise version.</p>
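        <p>A minimal sketch of the noise step is shown below. The function name, the rounding rule, and the choice to draw only phenotypes the patient does not already have are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
import random


def add_random_annotations(patient_phenotypes, all_phenotypes, noi, seed=0):
    """Return the patient's phenotypes plus round(noi * total) random extra ones."""
    rng = random.Random(seed)
    current = set(patient_phenotypes)
    n_extra = round(noi * len(current))
    pool = sorted(set(all_phenotypes) - current)  # only phenotypes the patient lacks
    return current | set(rng.sample(pool, min(n_extra, len(pool))))


# Toy phenotype identifiers, for illustration only.
vocabulary = [f"HP:{i}" for i in range(20)]
patient = {"HP:0", "HP:1", "HP:2", "HP:3"}
noisy = add_random_annotations(patient, vocabulary, noi=0.5)
print(len(patient), len(noisy))
```
]]></preformat>
        <p>With four original annotations and Noi = 0.5, two random annotations are added, giving six in total.</p>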
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Validation of the Datasets</title>
      <p>KG embedding methods have been successfully employed in several biomedical
applications [14]. Since these methods map KGs into low-dimensional spaces, they have emerged
as a popular way to generate features for machine learning tasks. Therefore, we use two KG
embedding methods to evaluate our datasets: RDF2Vec [15] and OWL2Vec* [16]. RDF2Vec
is a path-based method that generates random walks over the KG; these walks constitute the corpus of
word sequences given as input to a neural language model. OWL2Vec* was designed to learn
ontology embeddings, and it also employs direct walks on the graph to learn the graph structure.
The representations of the two biomedical entities in a pair are then combined
using the binary Hadamard operator to represent the pair.</p>
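      <p>The Hadamard operator is the element-wise product of the two entity vectors. A minimal sketch, using made-up three-dimensional embeddings and a hypothetical helper name:</p>
      <preformat><![CDATA[
```python
def hadamard(u, v):
    """Element-wise product of two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return [a * b for a, b in zip(u, v)]


pair_vector = hadamard([0.5, -1.0, 2.0], [4.0, 0.5, 0.0])
print(pair_vector)  # [2.0, -0.5, 0.0]
```
]]></preformat>
      <p>The product is symmetric, so the pair representation does not depend on the order of the two entities.</p>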
      <p>The pair representations are then fed into a Random Forest algorithm for training using Monte
Carlo cross-validation (MCCV) [17]. MCCV is a variation of traditional <italic>k</italic>-fold cross-validation
in which the data is randomly divided into training and testing sets <italic>M</italic> times (with <italic>β</italic> being the proportion of the
dataset to include in the test split). Our experiments use MCCV with <italic>M</italic> = 30 and
<italic>β</italic> = 0.3 for PPI and GDA prediction. Given the large number of pairs for disease prediction, we
use MCCV with <italic>M</italic> = 5 and <italic>β</italic> = 0.3.</p>
      <p>Each embedding method is run with two different KGs, one with only positive statements
and the other with both positive and negative statements. Table 4 reports each task’s median
recall, precision and weighted average F-measure.</p>
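      <p>The evaluation protocol, repeated random train/test splits scored with a weighted average F-measure, can be sketched in plain Python as below. The function names are hypothetical, the classifier itself is omitted, and the weighting by class support is the usual definition of a weighted average rather than something stated in the text.</p>
      <preformat><![CDATA[
```python
import random


def mccv_splits(n_samples, n_splits, test_size, seed=0):
    """Yield (train_idx, test_idx) for repeated random train/test splits."""
    rng = random.Random(seed)
    n_test = round(test_size * n_samples)
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


def weighted_f_measure(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = 0.0
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += f1 * (tp + fn)  # tp + fn is the number of true samples of this class
    return total / len(y_true)


splits = list(mccv_splits(n_samples=10, n_splits=30, test_size=0.3))
score = weighted_f_measure([1, 1, 0, 0], [1, 0, 0, 0])
print(len(splits), round(score, 4))
```
]]></preformat>
      <p>In a full pipeline, a classifier would be trained on each training split and scored on the matching test split, and the median over the splits would be reported.</p>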
      <p>Figure 3 compares the impact of using only positive statements versus both positive and
negative statements on our datasets. The bars represent the difference in performance for
precision, recall and weighted average F-measure, with upward bars indicating improved
performance with both positive and negative statements and downward bars indicating decreased
performance.</p>
      <p>The experiments show that the added information given by negative statements generally
improves the performance of RDF2Vec. However, for OWL2Vec*, the performance only improves
for PPI prediction.</p>
      <p>Figure 3: (a) PPI prediction; (b) GDA prediction; (c) Disease prediction.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Using the Benchmark</title>
      <p>All datasets are available on Zenodo under a CC BY 4.0 license. For each dataset, we provide
access to two types of files: (1) one TSV file containing pairs of entities and information about
whether a relationship exists between them or not; (2) OWL files containing the KG used to
describe the biomedical entities that appear in the TSV file. Together, these files can be used to
perform relation prediction tasks since the TSV file provides the specific entities and relations
that need to be predicted, while the OWL file provides the necessary background knowledge
for generating the features.</p>
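      <p>Loading the pairs file could look like the sketch below. The three-column layout (first entity, second entity, label) and the absence of a header row are assumptions, since the exact column order is not described here; the in-memory toy content stands in for a real file.</p>
      <preformat><![CDATA[
```python
import csv
import io


def read_pairs(handle):
    """Read (entity_1, entity_2, label) rows from a tab-separated file."""
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        first, second, label = row
        pairs.append((first, second, int(label)))
    return pairs


# Toy content standing in for a dataset file; identifiers are made up.
toy_tsv = io.StringIO("P1\tP2\t1\nP1\tP3\t0\n")
pairs = read_pairs(toy_tsv)
print(pairs)
```
]]></preformat>
      <p>The entity identifiers read this way are the keys used to look up each entity’s description in the accompanying OWL file.</p>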
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>Benchmark datasets are essential for evaluating and comparing the performance of different
approaches that work over KGs. This paper presents a collection of datasets for three relation
prediction tasks in the biomedical domain: PPI prediction, GDA prediction, and disease
prediction. The biomedical domain was chosen because it has already been demonstrated that the inability of
approaches to take negative statements into consideration is a limitation for several biomedical
applications. Although the datasets are domain-specific, they can also be used to evaluate
approaches developed outside the biomedical domain.</p>
      <p>The datasets are validated using two popular KG embedding methods to generate features
that are then given as input for a classifier. The results highlight the importance of incorporating
negative statements into KGs to create more accurate representations of KG entities.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>C. P., S. S., and R. T. S. are funded by the FCT through the LASIGE Research Unit (ref. UIDB/00408/2020
and ref. UIDP/00408/2020) and the FCT PhD grant (ref. SFRH/BD/145377/2019). This work was also
partially supported by the KATY project, which has received funding from the European Union’s
Horizon 2020 research and innovation programme under grant agreement No 101017453, and
by HfPT: Health from Portugal under the Portuguese Plano de Recuperação e Resiliência. The
authors thank Lina Aveiro for the preliminary results of this work.</p>
      <p>[10] D. Szklarczyk, A. L. Gable, K. C. Nastou, D. Lyon, R. Kirsch, S. Pyysalo, N. T. Doncheva,
M. Legeay, T. Fang, P. Bork, L. J. Jensen, C. von Mering, The STRING database in 2021:
customizable protein–protein networks, and functional characterization of user-uploaded
gene/measurement sets, Nucleic Acids Research 49 (2020) D605–D612.</p>
      <p>[11] J. Piñero, J. M. Ramírez-Anguita, J. Saüch-Pitarch, F. Ronzano, E. Centeno, F. Sanz, L. I.
Furlong, The DisGeNET knowledge platform for disease genomics: 2019 update, Nucleic
Acids Research 48 (2019) D845–D855.</p>
      <p>[12] S. Nunes, R. T. Sousa, C. Pesquita, Predicting gene-disease associations with knowledge
graph embeddings over multiple ontologies, in: ISMB Annual Meeting - Bio-Ontologies,
2021.</p>
      <p>[13] A. J. Masino, E. T. Dechene, M. C. Dulik, A. Wilkens, N. B. Spinner, I. D. Krantz, J. W.
Pennington, P. N. Robinson, P. S. White, Clinical phenotype-based gene prioritization:
an initial study using semantic similarity and the human phenotype ontology, BMC
Bioinformatics 15 (2014) 1–11.</p>
      <p>[14] Q. Wang, Z. Mao, B. Wang, L. Guo, Knowledge graph embedding: A survey of approaches
and applications, IEEE Transactions on Knowledge and Data Engineering 29 (2017)
2724–2743.</p>
      <p>[15] P. Ristoski, H. Paulheim, RDF2Vec: RDF graph embeddings for data mining, in:
International Semantic Web Conference, Springer, 2016, pp. 498–514.</p>
      <p>[16] J. Chen, P. Hu, E. Jimenez-Ruiz, O. M. Holter, D. Antonyrajah, I. Horrocks, OWL2Vec*:
Embedding of OWL ontologies, Machine Learning (2021) 1–33.</p>
      <p>[17] Q.-S. Xu, Y.-Z. Liang, Monte Carlo cross validation, Chemometrics and Intelligent
Laboratory Systems 56 (2001) 1–11.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>GO Consortium</collab>
          ,
          <article-title>The Gene Ontology Resource: 20 years and still GOing strong</article-title>
          ,
          <source>Nucleic Acids Research</source>
          <volume>47</volume>
          (
          <year>2018</year>
          )
          <fpage>D330</fpage>
          -
          <lpage>D338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Computational methods for prediction of human protein-phenotype associations: A review</article-title>
          ,
          <source>Phenomics</source>
          <volume>1</volume>
          (
          <year>2021</year>
          )
          <fpage>171</fpage>
          -
          <lpage>185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gaudet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dessimoz</surname>
          </string-name>
          ,
          <article-title>Gene Ontology: pitfalls, biases, and remedies</article-title>
          ,
          <source>in: The Gene Ontology Handbook</source>
          , Humana Press, New York, NY,
          <year>2017</year>
          , pp.
          <fpage>189</fpage>
          -
          <lpage>205</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>NegGOA: negative GO annotations selection using ontology structure</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>32</volume>
          (
          <year>2016</year>
          )
          <fpage>2996</fpage>
          -
          <lpage>3004</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Warwick Vesztrocy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dessimoz</surname>
          </string-name>
          ,
          <article-title>Benchmarking Gene Ontology function predictions using negative annotations</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>36</volume>
          (
          <year>2020</year>
          )
          <fpage>i210</fpage>
          -
          <lpage>i218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kulmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. Z.</given-names>
            <surname>Smaili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoehndorf</surname>
          </string-name>
          ,
          <article-title>Semantic similarity and machine learning with ontologies</article-title>
          ,
          <source>Briefings in Bioinformatics</source>
          <volume>22</volume>
          (
          <year>2021</year>
          )
          <elocation-id>bbaa199</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Arnaout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>Negative statements considered useful</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>71</volume>
          (
          <year>2021</year>
          )
          <fpage>100661</fpage>
          . Publisher: Elsevier.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Youngs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Penfold-Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bonneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shasha</surname>
          </string-name>
          ,
          <article-title>Negative example selection for protein function prediction: The NoGO database</article-title>
          ,
          <source>PLOS Computational Biology</source>
          <volume>10</volume>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <collab>GO Consortium</collab>
          ,
          <article-title>The Gene Ontology resource: enriching a GOld mine</article-title>
          ,
          <source>Nucleic Acids Research</source>
          <volume>49</volume>
          (
          <year>2021</year>
          )
          <fpage>D325</fpage>
          -
          <lpage>D334</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>