<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cooperation of bio-ontologies for the classification of genetic intellectual disabilities: a diseasome approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabin Personeni</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marie-Dominique Devignes</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malika Smal-Tabbone</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philippe Jonveaux</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Celine Bonnet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrien Coulet</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Genetics, Nancy University Hospital, Inserm U954, University of Lorraine</institution>
          ,
          <addr-line>Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Stanford Center for Biomedical Informatics Research, Stanford University</institution>
          ,
          <addr-line>Stanford, California</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Université de Lorraine</institution>
          ,
          <addr-line>CNRS, Inria, LORIA, Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Bio-ontologies are widely used to annotate and characterize biological objects or situations, enabling the use of shared or similar features in classification tasks. It may be beneficial to make two or more bio-ontologies cooperate to build more complete descriptions, and therefore more accurate classifications, of biological objects. This hypothesis is evaluated here for the classification of a heterogeneous set of 374 Genetic Intellectual Disabilities (GIDs), using a diseasome approach. These GIDs are annotated with classes of the Human Phenotype Ontology (HPO) and their causal genes with the three aspects of the Gene Ontology (GO). We test two semantic similarity measures, and different combinations of ontologies, to connect semantically similar diseases. We then evaluate how well these ontologies, and their combinations, are exploited by the similarity measures to classify GIDs in accordance with an expert classification. Results show that combining the three aspects of GO achieves very good overall performance, and that, for each GID class, a particular combination of 2 or 3 GO aspects and occasionally HPO yields the best performance. These results illustrate how bio-ontologies can cooperate in a classification by refining the characterization of biological objects.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic similarity</kwd>
        <kwd>Disabilities</kwd>
        <kwd>Diseasome</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Bio-ontologies, such as the Gene Ontology (GO) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or the Human Phenotype Ontology (HPO) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], are used to annotate biological objects such as gene products or diseases, enabling their semantic comparison. In particular, a wide collection of semantic similarity measures exists to quantify the similarity of objects with regard to their annotations [
        <xref ref-type="bibr" rid="ref19 ref3">19, 3</xref>
        ]. In this article, we investigate how several bio-ontologies can be used jointly to improve the classification of a heterogeneous set of Genetic Intellectual Disabilities (GIDs).
      </p>
      <p>
        Numerous studies report on the hypothesis that analyzing disease networks, here named diseasomes, may be a means to discover new knowledge on the mechanisms or treatments of diseases [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Various methods for building a diseasome have been described in the literature. For instance, two diseases can be associated if they share a causal gene [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or a phenotype [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], or if they are linked through a chain of protein-protein interactions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Hoehndorf et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed a diseasome that associates diseases with respect to their phenotypic similarity. They assembled a dataset, extracted from the literature, of about 6,000 diseases annotated with their associated phenotypes, using classes of the Monarch Disease Ontology (MonDO) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The similarity of diseases with regard to their annotations was subsequently computed with the SimGIC function [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We propose here to extend this approach by jointly using annotations taken from several ontologies, and by assessing their respective contributions to a disease classification task.
      </p>
      <p>The hypothesis that several bio-ontologies can cooperate to refine a diseasome is evaluated here within the task of classifying GIDs. The classification of GIDs is of particular interest, and challenging for experts, because these diseases are very heterogeneous both in terms of causal genes and clinical outcomes. We focused on a set of 374 GIDs for which causal genes are known and used for genetic diagnosis. Together with experts, we manually classified these diseases into five groups, on the basis of the biological mechanism disturbed in the disease: regulation, regulation of genetic expression, metabolic, synaptic and neurogenesis. We detail in this article a diseasome approach based on the semantic similarity of GIDs at both the phenotypic and genetic levels, and study how well it can match an expert GID classification.</p>
    </sec>
    <sec id="sec-2">
      <title>Material and Methods</title>
      <sec id="sec-2-1">
        <title>Data and Ontologies</title>
        <p>
          A dataset of 374 GIDs was built for this study on the basis of a list of 312 genes associated with GIDs, derived from the work of Gilissen et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], who initially compiled two lists of GID genes: a list of 528 "known" GID genes and a list of 628 "candidate" genes, based on the number of reported patients in which a mutation or variant of the gene is observed. The 312 genes retained here (230 "known" genes and 82 "candidates") are those that are associated with a genetic disease in the OMIM database (Online Mendelian Inheritance in Man, http://omim.org) and used for diagnosis in the Genetics Laboratory of Nancy Hospital. Four distinct ontologies were used in this study: the Human Phenotype Ontology (HPO) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and the three aspects of the Gene Ontology (GO) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], named here BP for Biological Process, CC for Cellular Component and MF for Molecular Function. These three aspects of GO are organized into independent hierarchies of classes related by the subsumption relation, and are considered here as separate ontologies. HPO annotations of GIDs were collected from the HPO database (http://hpo.jax.org). BP, CC and MF annotations were collected from the GOA database [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] at the European Bioinformatics Institute (https://www.ebi.ac.uk/GOA) for all UniProtKB proteins encoded by the genes associated with GIDs, and transferred to the corresponding GIDs. The average number of HPO classes per GID was 22.4 ± 17.5, whereas the average numbers of BP, CC and MF classes per GID were 14.8 ± 16.4, 5.8 ± 4.3 and 5.2 ± 4, respectively. All GIDs could be associated with at least one HPO class and one BP class. Only 27 GIDs were found lacking one or the other aspect of GO annotation, mostly CC.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Expert classification of GIDs</title>
        <p>
          The diversity and heterogeneity of GIDs render their classification difficult. Our manual classification is an attempt to integrate state-of-the-art knowledge about GIDs [
          <xref ref-type="bibr" rid="ref13 ref14 ref15 ref5">13, 5, 15, 14</xref>
          ] into the definition of five classes. The "Metabolic" class represents diseases affecting the synthesis or degradation of metabolites, leading to metabolite deficiency or accumulation with deleterious consequences. The "Synaptic" class represents diseases affecting the structure and function of synapses. The "Neurogenesis" class represents diseases affecting neuronal migration or the proper development of the central nervous system. The "Regulation of genetic expression" class represents diseases in which genetic expression (chromatin structure, transcription and its regulation, translation and post-translational modifications) is affected. The "Regulation" class is for all other diseases in which the control of biological processes other than genetic expression is affected (for instance, transport of proteins or energetic balance of the cell). Our dataset of 374 GIDs with their 312 responsible genes was manually distributed into these five classes by expert inspection of their OMIM entries (both disease and gene entries). The resulting classification likely relies on several subjective and arbitrary decisions, but it appeared sufficient for the methodology used in this study. Table 1 quantitatively describes the composition of each class of GIDs.
        </p>
        <p>The broad definition of each class makes it possible to assign the same disease and gene to two different classes. This is the case for 3, 5 and 10 GIDs of the Metabolic, Synaptic and Neurogenesis classes, respectively, which are also classified in the Regulation class. One additional GID from the Neurogenesis class is also classified as Regulation of genetic expression. This GID (OMIM #613454: Rett syndrome, congenital form) illustrates the difficulty of classifying GIDs: it is described as a severe neurodevelopmental disorder and is therefore classified in the Neurogenesis class, whereas its responsible gene is FOXG1, which codes for a repressor of the forkhead transcription factor family, pointing to the Regulation of genetic expression class.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Semantic similarity measures</title>
        <p>
          Semantic similarity measures quantify the proximity of two ontology classes, or of two objects each described by a set of classes from an ontology. Such similarity measures may be used to build a diseasome, using disease annotations linked to an ontology [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. We applied such a similarity-based approach to build a diseasome of GIDs, on the basis of both their phenotypic and causal gene product annotations. These annotations are expressed as classes from four ontologies or ontology fragments: HPO and the three aspects of the Gene Ontology considered separately (BP, CC and MF).
        </p>
        <p>
          We aim to assess the contribution of several ontologies to a diseasome, but we also use two semantic similarity measures, to compare how different measures behave with ontology combinations. First, we use a node-based semantic similarity measure, SimGIC [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], which computes the ratio of common classes among the ontology classes of two diseases, weighted by the information content of each class, and taking into account all ancestors of each disease's annotations. The information content of a class, with respect to a dataset of annotations, is computed as IC(x) = -log<sub>2</sub>(P(x)), where P(x) is the probability that an object is annotated with the class x. Higher values of IC denote higher specificity of the class. SimGIC was first introduced to compute the similarity of genes annotated with GO classes, and has been successfully used with MonDO to build a diseasome based on phenotypic similarity [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Second, we use the edge-based similarity measure IntelliGO [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which compares two biological objects by first computing distances in the hierarchy between pairs of classes annotating the objects, and then aggregating these pairwise similarities into a single similarity score for the two objects. As with SimGIC, this aggregation step takes into account the information content of the compared classes to weight their contribution to the similarity.
        </p>
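        <p>As an illustration of the node-based measure described above, the following Python sketch computes the information content of each class and a SimGIC-style ratio on a toy hierarchy. It is not the implementation used in this study: the hierarchy, the disease names and the function names are hypothetical, and the probability P(x) is simply estimated as the fraction of toy diseases annotated with x once ancestors are included.</p>
        <preformat><![CDATA[
```python
import math

# Toy is-a hierarchy (hypothetical): each class maps to its direct parents.
PARENTS = {"root": [], "b": ["root"], "c": ["b"], "d": ["root"]}


def with_ancestors(classes, parents):
    """Return the given classes together with all of their ancestors."""
    closed, stack = set(), list(classes)
    while stack:
        c = stack.pop()
        if c not in closed:
            closed.add(c)
            stack.extend(parents[c])
    return closed


def information_content(annotations):
    """IC(x) = -log2(P(x)); P(x) is the fraction of objects annotated with x."""
    counts = {}
    for classes in annotations.values():
        for c in classes:
            counts[c] = counts.get(c, 0) + 1
    n = len(annotations)
    return {c: -math.log2(k / n) for c, k in counts.items()}


def sim_gic(a, b, ic):
    """IC-weighted ratio of the shared classes to all classes of two objects."""
    union_ic = sum(ic[c] for c in a.union(b))
    if union_ic == 0:
        return 0.0
    return sum(ic[c] for c in a.intersection(b)) / union_ic


# Direct annotations of three toy diseases, closed under the ancestor relation.
DIRECT = {"d1": {"c"}, "d2": {"b"}, "d3": {"d"}}
CLOSED = {d: with_ancestors(cs, PARENTS) for d, cs in DIRECT.items()}
IC = information_content(CLOSED)
```
]]></preformat>
        <p>In this toy example the root class annotates every disease and therefore has an information content of zero, so two diseases that share only the root (here d2 and d3) obtain a similarity of zero.</p>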
        <p>To assess the contribution of each ontology, we build several similarity functions, using every possible combination of ontologies among BP, CC, MF and HPO, combined with both IntelliGO and SimGIC. For this purpose, and as a first approach, we simply average the similarities computed separately with each ontology. We thus test 15 possible combinations of one or more ontologies, each combined with either IntelliGO or SimGIC, resulting in 30 similarity functions.</p>
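        <p>The enumeration and averaging described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and each per-ontology similarity is assumed to be available as a function of two diseases.</p>
        <preformat><![CDATA[
```python
from itertools import combinations

ONTOLOGIES = ("BP", "CC", "MF", "HPO")


def ontology_combinations(ontologies=ONTOLOGIES):
    """List every non-empty subset of the given ontologies."""
    return [
        combo
        for size in range(1, len(ontologies) + 1)
        for combo in combinations(ontologies, size)
    ]


def combined_similarity(per_ontology_sim, combo):
    """Build a similarity function that averages the similarities of `combo`.

    `per_ontology_sim` maps an ontology name to a function sim(d1, d2)
    (a hypothetical interface chosen for this sketch).
    """
    def sim(d1, d2):
        return sum(per_ontology_sim[o](d1, d2) for o in combo) / len(combo)
    return sim
```
]]></preformat>
        <p>With four ontologies, the enumeration yields 2^4 - 1 = 15 non-empty subsets; pairing each with one of the two measures gives the 30 similarity functions.</p>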
      </sec>
      <sec id="sec-2-4">
        <title>Evaluation of similarity functions with respect to a reference classification</title>
        <p>
          Hoehndorf et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] described a methodology to evaluate the accuracy of a similarity function with respect to a classification of diseases. This methodology aims to verify that the similarity function gives higher similarity scores to pairs of diseases that belong to the same class of diseases.
        </p>
        <p>This evaluation is based on a Receiver Operating Characteristic (ROC) analysis, which quantifies the accuracy of a binary classification model at varying degrees of sensitivity. In particular, a ranking of disease pairs, based on their similarity, serves as a classification model whose sensitivity can be adjusted by defining a similarity threshold above which pairs of diseases are labeled as positive by the model. The ROC curve represents the true positive rate as a function of the false positive rate. The ROC Area Under the Curve, or ROCAUC, can be computed from such a curve, and represents the probability that a random pair of diseases from the positive class has a higher similarity than a random pair of diseases from the negative class. We denote by ROCAUC(R, P) the function that computes the ROCAUC given a ranking R and the set of positive elements P, which describes which elements of R are to be considered positive for the purpose of the evaluation.</p>
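        <p>The threshold sweep behind a ROC curve can be sketched in Python as follows. This is an illustrative sketch rather than the evaluation code of this study: the function names are hypothetical, all scores are assumed to be distinct (ties would need to be grouped), and at least one positive and one negative element are assumed to be present.</p>
        <preformat><![CDATA[
```python
def roc_points(scores, labels):
    """Return the (false positive rate, true positive rate) points of a ROC curve.

    The threshold is lowered one element at a time, from the highest score
    to the lowest, and the rates are recorded after each step.
    """
    n_pos = sum(1 for is_pos in labels if is_pos)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Integrate the curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```
]]></preformat>
        <p>For instance, with scores 0.9, 0.8, 0.7 and 0.6 and labels positive, negative, positive, negative, three of the four positive-negative pairs are ordered correctly, and the area under the curve is 0.75.</p>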
        <p>The ROCAUC-based evaluation can be conducted either for each single class of diseases or for the entire classification of diseases. This evaluation can also be performed with a classification of diseases in which a disease can belong to more than one class.</p>
        <preformat>Data: the set of diseases D, a similarity between diseases sim : D × D → ℝ<sup>+</sup>
Result: average ROCAUC for all diseases
begin
    ROCAUC<sub>avg</sub> = 0
    foreach disease d ∈ D do
        ranking ← x ∈ (D \ {d}) ranked in descending order of sim(d, x)
        pos ← {x ∈ D | d and x share a disease class}
        ROCAUC<sub>avg</sub> ← ROCAUC<sub>avg</sub> + ROCAUC(ranking, pos)
    end
    return ROCAUC<sub>avg</sub> / |D|
end</preformat>
        <p>Algorithm 1: Evaluation algorithm for a similarity function sim on a classification task with several overlapping classes of diseases.</p>
        <preformat>Data: the set of diseases D, a disease class C, a similarity between diseases sim : D × D → ℝ<sup>+</sup>
Result: average ROCAUC for diseases of C
begin
    ROCAUC<sub>avg</sub> = 0
    pos ← {x ∈ D | x has class C}
    foreach disease d in class C do
        ranking ← x ∈ (D \ {d}) ranked in descending order of sim(d, x)
        ROCAUC<sub>avg</sub> ← ROCAUC<sub>avg</sub> + ROCAUC(ranking, pos)
    end
    return ROCAUC<sub>avg</sub> / |D<sub>C</sub>|
end</preformat>
        <p>Algorithm 2: Evaluation algorithm for a similarity function sim on a classification task with respect to a single class of diseases C.</p>
        <p>Algorithm 1 describes how the evaluation is performed globally on all disease classes, for an arbitrary similarity function sim. For each disease d of the dataset, we rank the other diseases using sim, from the most similar to the least similar. Since we want high-ranking diseases to share a disease class with d, we take the positive class to be the set of diseases sharing a GID class with d. The ranking is then evaluated by computing the ROCAUC for the prediction on that positive class. The ROCAUCs obtained for each disease in the dataset are then averaged to obtain a global evaluation score.</p>
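        <p>A possible Python reading of Algorithm 1 is sketched below. The names and data structures are hypothetical, the ROCAUC helper uses the pairwise-ordering interpretation of the area under the curve, and the sketch assumes that every ranking contains at least one positive and one negative disease (it raises an error otherwise, a case the algorithm does not specify).</p>
        <preformat><![CDATA[
```python
def rocauc(ranking, positives):
    """Fraction of (positive, negative) pairs in which the positive ranks first."""
    n_pos = sum(1 for x in ranking if x in positives)
    n_neg = len(ranking) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ranking needs at least one positive and one negative")
    wins, negatives_left = 0, n_neg
    for x in ranking:
        if x in positives:
            wins += negatives_left  # this positive outranks the remaining negatives
        else:
            negatives_left -= 1
    return wins / (n_pos * n_neg)


def average_rocauc_all_classes(diseases, classes_of, sim):
    """Algorithm 1: average, over all diseases d, of the ROCAUC of d's ranking.

    `classes_of` maps a disease to the set of classes it belongs to
    (a disease may belong to several classes).
    """
    total = 0.0
    for d in diseases:
        others = [x for x in diseases if x != d]
        ranking = sorted(others, key=lambda x: sim(d, x), reverse=True)
        pos = {x for x in others if classes_of[d].intersection(classes_of[x])}
        total += rocauc(ranking, pos)
    return total / len(diseases)
```
]]></preformat>
        <p>A similarity function that always ranks same-class diseases first obtains a score of 1, and a random ranking is expected to score around 0.5.</p>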
        <p>Algorithm 2 describes how a similarity function sim is evaluated with respect to a single disease class, denoted C. For each disease d of the GID class C, we rank the other diseases using sim, and we define the positive class to be the set of diseases that belong to C. Again, the ranking is then evaluated by computing the ROCAUC for the prediction on that positive class. The ROCAUCs obtained for each disease of the class are then averaged to obtain a score representing how well the similarity function reflects this class.</p>
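        <p>Algorithm 2 can be sketched in Python in the same spirit. Again, the names are hypothetical and the sketch is only an illustration: it assumes that class C contains at least two diseases and does not cover the whole dataset, so that every ranking contains both positive and negative diseases.</p>
        <preformat><![CDATA[
```python
def rocauc(ranking, positives):
    """Fraction of (positive, negative) pairs in which the positive ranks first."""
    n_pos = sum(1 for x in ranking if x in positives)
    n_neg = len(ranking) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ranking needs at least one positive and one negative")
    wins, negatives_left = 0, n_neg
    for x in ranking:
        if x in positives:
            wins += negatives_left
        else:
            negatives_left -= 1
    return wins / (n_pos * n_neg)


def average_rocauc_for_class(diseases, members, sim):
    """Algorithm 2: average ROCAUC over the diseases of a single class.

    `members` is the set of diseases that belong to class C; it is used both
    as the set of query diseases and as the positive set.
    """
    total = 0.0
    for d in members:
        others = [x for x in diseases if x != d]
        ranking = sorted(others, key=lambda x: sim(d, x), reverse=True)
        total += rocauc(ranking, members)
    return total / len(members)
```
]]></preformat>
        <p>The only differences from the global evaluation are that the positive set is fixed to the members of C and that the average is taken over the diseases of C only.</p>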
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>We apply the methodology described previously to our set of 374 GIDs. This set of diseases has phenotypic annotations, expressed as HPO classes, and genetic annotations of their causal genes, expressed as GO classes and split into the three aspects of GO, considered here as three independent ontologies. Indeed, as their class hierarchies are separate, semantic similarity measures such as IntelliGO and SimGIC cannot compare classes from these different aspects.</p>
      <p>We computed all pairwise similarities on our set of GIDs, for all 30 possible
similarity functions resulting from the combinations of the four ontologies (BP,
CC, MF and HPO) and the two semantic similarity measures (IntelliGO and
SimGIC). Each of these similarity functions was then evaluated by computing
the average ROCAUC for each of the 5 GID classes, as described in Algorithm
2, and for all classes considered together, as described in Algorithm 1.</p>
      <p>Tables 2 and 3 present a selection of the results of these evaluations for different similarity functions based on different combinations of ontologies, computed using IntelliGO and SimGIC respectively. Results are given for 6 classification tasks: one task for each GID class evaluated separately, as described in Algorithm 2, and a sixth task evaluating the performance of the similarity function on all 5 GID classes, as described in Algorithm 1. For each task, we tested every possible combination of ontologies from BP, CC, MF and HPO, but both tables report only the results for combinations of interest: single ontologies, all GO aspects, all ontologies, and all combinations that provide the best performance on any task for either SimGIC or IntelliGO.</p>
      <p>Unsurprisingly, the results are highly variable depending on which ontologies are considered and which similarity measure is used, and they also vary across the different classification tasks. In particular, we observe that HPO does not contribute positively to the performance when combined with other ontologies, with the exception of the classification task of the Neurogenesis class. We also note that similarity functions using only HPO perform poorly compared with those using a single GO aspect in most cases, although such functions perform better than a random classifier. This suggests that, for classes other than Neurogenesis, HPO adds more noise than information relative to GO. However, combining several GO aspects produces a marked increase in performance, which is notably visible for the Regulation of Genetic Expression class: IntelliGO performance increases from 0.740 in the best case with only one GO aspect to 0.803 with all three of them, and SimGIC performance increases from 0.905 to 0.936.</p>
      <p>Although combining all three aspects of GO provides the best overall performance, we nonetheless observe that this is not necessarily the best combination for the classification task on each individual GID class:</p>
      <list list-type="bullet">
        <list-item>
          <p>The Regulation class is poorly predicted by most similarity functions. In particular, similarity functions based on HPO alone do not perform better than a random classification.</p>
        </list-item>
        <list-item>
          <p>The Regulation of Genetic Expression class is best predicted using SimGIC (0.936) with the three aspects of GO considered jointly. However, in this case, considering HPO on top of GO does not offer any increase in performance. Similarly, IntelliGO performs best on this classification task when considering the three aspects of GO, but shows a decrease in performance when also considering HPO.</p>
        </list-item>
        <list-item>
          <p>The Metabolic class is best predicted using IntelliGO (0.862) when using all three aspects of GO. However, we observe that SimGIC (0.785) obtains its [...]</p>
        </list-item>
      </list>
      <p>We study in this article how different ontologies can cooperate to improve how well a diseasome reflects the expert knowledge of a classification of GIDs. We evaluate here two semantic similarity measures, IntelliGO and SimGIC, on a classification task with 5 GID classes, using different combinations of phenotypic and genetic ontologies. The results show that phenotypic annotations from HPO are not sufficient to reflect our expert classification, while genetic annotations from each aspect of GO offer better performance. Furthermore, combining several aspects of GO further improves performance and, for the Neurogenesis GID class, combining a GO aspect with HPO offers the best performance.</p>
      <p>
        This illustrates that the cooperation of several ontologies is suitable for such a classification task and can improve a diseasome approach. However, the relevance of each ontology greatly depends on the individual disease class, and considering too many ontologies may have a negative effect. This constitutes a limitation of semantic similarity measures: they are sensitive to the quality of annotations, as well as to annotations that are irrelevant to the class to predict. The results obtained show that, overall, considering only the 3 aspects of GO often yields slightly better classification performance than considering both HPO and GO. Here, the contribution of HPO to the classification performance may be limited by the GID dataset we used: these diseases may be too difficult to distinguish based on their phenotypes alone, whereas other studies show that HPO is suitable for classifying more phenotypically heterogeneous sets of diseases [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. It seems that, in such cases, deciding which ontologies to consider requires an iterative empirical approach that considers all possible combinations of ontologies. In fact, because performance does not necessarily increase with the number of ontologies, proposing a non-exhaustive strategy to select the best combination of ontologies for a particular task is not trivial.
      </p>
      <p>
        Furthermore, it may be necessary to develop more sophisticated methods for aggregating similarities based on different ontologies. Here, we used an unweighted average of similarities, each computed with a different ontology. Weighting the contribution of each ontology to the aggregated similarity could be done in several ways, for instance by considering the number of annotations in each ontology for the compared diseases, or by empirically determining an appropriate weighting scheme, rather than simply including or excluding ontologies. Moreover, as we observe that different settings, and therefore different diseasome models, are optimal only for certain classes, ways to aggregate different models could be explored, such as boosting algorithms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or bagging predictors [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
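      <p>As a purely illustrative sketch of such a weighted aggregation, the Python function below replaces the unweighted average with a weighted one. It was not evaluated in this study; the function name and the example weights are hypothetical, and how the weights should be chosen (annotation counts, empirical tuning or otherwise) is left open.</p>
      <preformat><![CDATA[
```python
def weighted_similarity(similarities, weights):
    """Weighted average of per-ontology similarities.

    `similarities` maps an ontology name to the similarity of one disease
    pair under that ontology; `weights` maps an ontology name to a
    non-negative weight (hypothetical values in this sketch).
    """
    total_weight = sum(weights[o] for o in similarities)
    if total_weight == 0:
        raise ValueError("at least one ontology needs a positive weight")
    weighted_sum = sum(weights[o] * s for o, s in similarities.items())
    return weighted_sum / total_weight
```
]]></preformat>
      <p>With equal weights this reduces to the unweighted average used here, and a weight of zero is equivalent to excluding the ontology from the combination.</p>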
      <p>Diseasomes based on semantic similarity are able to reflect an expert classification of diseases, as illustrated by the different classification experiments presented in this article. Such a diseasome can be used to classify new diseases by simple propagation of the classes of neighboring diseases. Here, the cooperation of several biomedical ontologies was shown to be relevant in many cases; however, selecting the right ontologies to consider for a particular task requires some trial and error. We note that both semantic similarity measures, IntelliGO and SimGIC, have varying performance on the different classification tasks presented in this article: some GID classes seem to be better predicted by one of these two measures. However, these differences do not allow us to conclude that one of these measures performs strictly better than the other, as their overall performances are very similar. In summary, semantic similarity measures with various combinations of ontologies make it possible to propose a diseasome as a model synthesizing descriptions of GIDs with regard to several ontologies, in good agreement with an expert classification of such diseases. Such cooperation of bio-ontologies could also be explored with other machine learning methods.</p>
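      <p>The propagation of neighboring classes mentioned above could, for instance, take the form of a nearest-neighbor vote, as in the following Python sketch. This is an assumption about one possible scheme, not a method evaluated in this article: the function name, the number of neighbors k and the majority vote are all hypothetical choices.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def propagate_classes(new_disease, known_classes, sim, k=5):
    """Assign to a new disease the most frequent class(es) of its neighbors.

    `known_classes` maps each already classified disease to its set of
    classes (assumed non-empty); `sim(d1, d2)` is any disease similarity.
    """
    neighbors = sorted(
        known_classes, key=lambda d: sim(new_disease, d), reverse=True
    )[:k]
    votes = Counter(c for d in neighbors for c in known_classes[d])
    best = max(votes.values())
    # Ties are kept, so the result may contain several classes.
    return {c for c, v in votes.items() if v == best}
```
]]></preformat>
      <p>Returning a set of classes rather than a single label is consistent with a classification in which a disease may belong to more than one class.</p>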
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ashburner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ball</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blake</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Botstein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butler</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cherry</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolinski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwight</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eppig</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          , et al.:
          <article-title>Gene Ontology: tool for the unification of biology</article-title>
          .
          <source>Nature genetics 25(1)</source>
          ,
          <volume>25</volume>
          –
          <fpage>29</fpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barabasi</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulbahce</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loscalzo</surname>
          </string-name>
          , J.:
          <article-title>Network medicine: a network-based approach to human disease</article-title>
          .
          <source>Nature reviews genetics 12(1)</source>
          ,
          <volume>56</volume>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Benabderrahmane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smail-Tabbone</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poch</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devignes</surname>
          </string-name>
          , M.D.:
          <article-title>Intelligo: a new vector-based semantic similarity measure including annotation origin</article-title>
          .
          <source>BMC bioinformatics 11(1)</source>
          ,
          <volume>1</volume>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Bagging predictors</article-title>
          .
          <source>Machine learning 24(2)</source>
          ,
          <volume>123</volume>
          –
          <fpage>140</fpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chelly</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khelfaoui</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Francis</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cherif</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bienvenu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Genetics and pathophysiology of mental retardation</article-title>
          .
          <source>European Journal of Human Genetics</source>
          <volume>14</volume>
          (
          <issue>6</issue>
          ),
          <volume>701</volume>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Freund</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schapire</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          , et al.:
          <article-title>Experiments with a new boosting algorithm</article-title>
          .
          <source>In: ICML</source>
          . vol.
          <volume>96</volume>
          , pp.
          <fpage>148</fpage>
          –
          <lpage>156</lpage>
          .
          <publisher-name>Citeseer</publisher-name>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gilissen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hehir-Kwa</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
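          <string-name>
            <surname>Thung</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van de Vorst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,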
          <string-name>
            <surname>van Bon</surname>
            ,
            <given-names>B.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willemsen</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwint</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janssen</surname>
            ,
            <given-names>I.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoischen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schenck</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Genome sequencing identifies major causes of severe intellectual disability</article-title>
          .
          <source>Nature</source>
          <volume>511</volume>
          (
          <issue>7509</issue>
          ),
          <fpage>344</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Goh</surname>
            ,
            <given-names>K.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cusick</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valle</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Childs</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barabási</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          :
          <article-title>The human disease network</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>104</volume>
          (
          <issue>21</issue>
          ),
          <fpage>8685</fpage>
          –
          <lpage>8690</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Guney</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menche</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barabási</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          :
          <article-title>Network-based in silico drug efficacy screening</article-title>
          .
          <source>Nature Communications</source>
          <volume>7</volume>
          ,
          <elocation-id>10331</elocation-id>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hidalgo</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blumm</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barabási</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christakis</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>A dynamic network approach for the study of human phenotypes</article-title>
          .
          <source>PLoS Computational Biology</source>
          <volume>5</volume>
          (
          <issue>4</issue>
          ),
          <elocation-id>e1000353</elocation-id>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hoehndorf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schofield</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkoutos</surname>
            ,
            <given-names>G.V.</given-names>
          </string-name>
          :
          <article-title>Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases</article-title>
          .
          <source>Scientific Reports</source>
          <volume>5</volume>
          ,
          <elocation-id>10888</elocation-id>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Huntley</surname>
            ,
            <given-names>R.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sawford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mutowo-Meullenet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shypitsyna</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonilla</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Donovan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The GOA database: Gene Ontology annotation updates for 2015</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>43</volume>
          (
          <issue>D1</issue>
          ),
          <fpage>D1057</fpage>
          –
          <lpage>D1063</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Inlow</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Restifo</surname>
            ,
            <given-names>L.L.</given-names>
          </string-name>
          :
          <article-title>Molecular and comparative genetics of mental retardation</article-title>
          .
          <source>Genetics</source>
          <volume>166</volume>
          (
          <issue>2</issue>
          ),
          <volume>835</volume>
          {
          <fpage>881</fpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>van Karnebeek</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stockler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Treatable inborn errors of metabolism causing intellectual disability: a systematic literature review</article-title>
          .
          <source>Molecular Genetics and Metabolism</source>
          <volume>105</volume>
          (
          <issue>3</issue>
          ),
          <fpage>368</fpage>
          –
          <lpage>381</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kaufman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ayub</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          :
          <article-title>The genetic basis of non-syndromic intellectual disability: a review</article-title>
          .
          <source>Journal of Neurodevelopmental Disorders</source>
          <volume>2</volume>
          (
          <issue>4</issue>
          ),
          <fpage>182</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Kohler,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Doelken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.C.</given-names>
            ,
            <surname>Mungall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.J.</given-names>
            ,
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Firth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.V.</given-names>
            ,
            <surname>Bailleul-Forestier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Black</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.C.</given-names>
            ,
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.L.</given-names>
            ,
            <surname>Brudno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Campbell</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          , et al.:
          <article-title>The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>42</volume>
          (
          <issue>D1</issue>
          ),
          <fpage>D966</fpage>
          –
          <lpage>D974</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. Kohler,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Vasilevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.A.</given-names>
            ,
            <surname>Engelstad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>McMurry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Ayme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Baynam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Bello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.M.</given-names>
            ,
            <surname>Boerkoel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.F.</given-names>
            ,
            <surname>Boycott</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.M.</surname>
          </string-name>
          , et al.:
          <article-title>The Human Phenotype Ontology in 2017</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>45</volume>
          (
          <issue>D1</issue>
          ),
          <fpage>D865</fpage>
          –
          <lpage>D876</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Mungall</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McMurry</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Köhler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balhoff</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borromeo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brush</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carbon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conlin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dunn</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Engelstad</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Foster</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gourdine</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacobsen</surname>
            ,
            <given-names>J.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keith</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laraway</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>NguyenXuan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shefchek</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasilevsky</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Washington</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hochheiser</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groza</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smedley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haendel</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>45</volume>
          (
          <issue>D1</issue>
          ),
          <fpage>D712</fpage>
          –
          <lpage>D722</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1093/nar/gkw1128
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Pesquita</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faria</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastos</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Falcão</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Couto</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Evaluating GO-based semantic similarity measures</article-title>
          .
          <source>In: Proc. 10th Annual Bio-Ontologies Meeting</source>
          . vol.
          <volume>37</volume>
          , p.
          <fpage>38</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>