
Cooperation of bio-ontologies for the classification of genetic intellectual disabilities: a diseasome approach

Gabin Personeni

Marie-Dominique Devignes

Malika Smaïl-Tabbone

Philippe Jonveaux

Céline Bonnet

Adrien Coulet

0 Department of Genetics, Nancy University Hospital, Inserm U954, University of Lorraine, Nancy, France
1 Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
2 Université de Lorraine, CNRS, Inria, LORIA, Nancy, France

Bio-ontologies are widely used to annotate and characterize biological objects or situations, enabling the use of shared or similar features in classification tasks. It may appear beneficial to make two or more bio-ontologies cooperate to build more complete descriptions, and therefore more accurate classifications, of biological objects. This hypothesis is evaluated here for the classification of a heterogeneous set of 374 Genetic Intellectual Disabilities (GIDs), using a diseasome approach. These GIDs are annotated with classes of the Human Phenotype Ontology (HPO), and their causal genes with classes from the three aspects of the Gene Ontology (GO). We test two semantic similarity measures, and different combinations of ontologies, to connect semantically similar diseases. We then evaluate how well these ontologies, and their combinations, are exploited by the similarity measures to classify GIDs in accordance with an expert classification. Results show that combining the three aspects of GO achieves very good overall performance, and that, for each GID class, a particular combination of 2 or 3 GO aspects, occasionally together with HPO, yields the best performance. These results illustrate how bio-ontologies can cooperate in a classification task by refining the characterization of biological objects.

Keywords: Semantic similarity, Genetic Intellectual Disabilities, Diseasome, Bio-ontologies

Bio-ontologies, such as the Gene Ontology (GO) [1] or the Human Phenotype Ontology (HPO) [16], are used to annotate biological objects such as gene products or diseases, enabling their semantic comparison. In particular, a wide collection of semantic similarity measures exists to quantify the similarity of objects with regard to their annotations [19, 3]. In this article, we investigate how several bio-ontologies can be used conjointly, that is, made to cooperate, to improve the classification of a heterogeneous set of Genetic Intellectual Disabilities (GIDs).

Numerous studies investigate the hypothesis that analyzing disease networks, here named diseasomes, may be a means to discover new knowledge on the mechanisms or treatments of diseases [2]. Various methods for building diseasomes have been described in the literature. For instance, two diseases can be associated if they share a causal gene [8] or a phenotype [10], or if they are linked through a chain of protein-protein interactions [9]. Hoehndorf et al. [11] proposed a diseasome that associates diseases with respect to their phenotypic similarity. They assembled a dataset, extracted from the literature, of about 6,000 diseases annotated with their associated phenotypes, using classes of the Monarch Disease Ontology (MonDO) [18]. The similarity of diseases with regard to their annotations was subsequently computed with the SimGIC function [19]. We propose here to extend this approach by conjointly using annotations taken from several ontologies, and by assessing their respective contributions to a disease classification task.
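As an illustration of the simplest criterion cited above, linking two diseases that share a causal gene [8], the following Python sketch builds the edge set of such a diseasome. It is not the method of [8] nor the one used in this study, and the disease and gene identifiers are hypothetical placeholders.

```python
from itertools import combinations


def gene_sharing_diseasome(disease_genes):
    """Return the edges of a diseasome in which two diseases are linked
    when they share at least one causal gene.

    disease_genes: dict mapping a disease identifier to its set of causal genes.
    """
    edges = set()
    # Compare every unordered pair of diseases once (sorted for a stable order).
    for d1, d2 in combinations(sorted(disease_genes), 2):
        if disease_genes[d1] & disease_genes[d2]:
            edges.add((d1, d2))
    return edges


if __name__ == "__main__":
    # Hypothetical toy data: DIS_1 and DIS_2 share GENE_A; DIS_3 shares nothing.
    toy = {"DIS_1": {"GENE_A"}, "DIS_2": {"GENE_A", "GENE_B"}, "DIS_3": {"GENE_C"}}
    print(gene_sharing_diseasome(toy))  # {('DIS_1', 'DIS_2')}
```

The same pairwise pattern applies to shared phenotypes [10]; similarity-based diseasomes such as [11] replace the binary "shares a gene" test with a graded similarity score.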

The hypothesis that several bio-ontologies can cooperate to refine a diseasome is evaluated here within the task of classifying GIDs. The classification of GIDs is of particular interest, and challenging for experts, because these diseases are very heterogeneous in terms of both causal genes and clinical outcomes. We focused on a set of 374 GIDs whose causal genes are known and used for genetic diagnosis. Together with experts, we manually classified these diseases into five groups on the basis of the biological mechanisms disturbed in the disease: regulation, regulation of genetic expression, metabolic, synaptic, and neurogenesis. In this article, we detail a diseasome approach based on the semantic similarity of GIDs at both the phenotypic and genetic levels, and study how well it matches an expert GID classification.

2 Material and Methods

2.1 Data and Ontologies

A dataset of 374 GIDs was built for this study on the basis of a list of 312 genes associated with GIDs, derived from the work of Gilissen et al. [7], who initially compiled two lists of GID genes: a list of 528 "known" GID genes and a list of 628 "candidate" genes, based on the number of reported patients in whom a mutation or variant of the gene is observed. The 312 genes retained here (230 "known" and 82 "candidate" genes) are those that are associated with a genetic disease in the OMIM database (Online Mendelian Inheritance in Man, http://omim.org) and used for diagnosis in the Genetics Laboratory of Nancy Hospital. Four distinct ontologies were used in this study: the Human Phenotype Ontology (HPO) [17] and the three aspects of the Gene Ontology (GO) [1], named here BP for Biological Process, CC for Cellular Component and MF for Molecular Function. These three aspects of GO are organized into independent hierarchies of classes related by the subsumption relation, and are considered here as separate ontologies. HPO annotations of GIDs were collected from the HPO database (http://hpo.jax.org). BP, CC and MF annotations were collected from the GOA database [12] at the European Bioinformatics Institute (https://www.ebi.ac.uk/GOA) for all UniProtKB proteins encoded by the genes associated with GIDs, and transferred to the corresponding GIDs. The average number of HPO classes per GID was 22.4 ± 17.5, whereas the average numbers of BP, CC and MF classes per GID were 14.8 ± 16.4, 5.8 ± 4.3 and 5.2 ± 4, respectively. All GIDs could be associated with at least one HPO class and one BP class. Only 27 GIDs lacked one or another aspect of GO annotation, mostly CC.

2.2 Expert classification of GIDs

The diversity and heterogeneity of GIDs make their classification difficult. Our manual classification is an attempt to integrate state-of-the-art knowledge about GIDs [13, 5, 15, 14] into the definition of five classes. The "Metabolic" class represents diseases affecting the synthesis or degradation of metabolites, leading to metabolite deficiency or accumulation with deleterious consequences. The "Synaptic" class represents diseases affecting the structure and function of synapses. The "Neurogenesis" class represents diseases affecting neuronal migration or the proper development of the central nervous system. The "Regulation of genetic expression" class represents diseases in which genetic expression (chromatin structure, transcription and its regulation, translation and post-translational modifications) is affected. The "Regulation" class gathers all other diseases in which the control of biological processes other than genetic expression is affected (for instance, protein transport or the energy balance of the cell). Our dataset of 374 GIDs with their 312 responsible genes was manually distributed into these five classes by expert inspection of their OMIM entries (for both diseases and genes). The resulting classification likely relies on some subjective and arbitrary decisions, but it appeared sufficient for the methodology used in this study. Table 1 quantitatively describes the composition of each class of GIDs.

The broad definition of each class makes it possible to assign the same disease and gene to two different classes. This is the case for 3, 5 and 10 GIDs of the Metabolic, Synaptic and Neurogenesis classes, respectively, which are also classified in the Regulation class. One additional GID from the Neurogenesis class is also classified as Regulation of genetic expression. This GID (OMIM #613454: Rett syndrome, congenital form) illustrates the difficulty of classifying GIDs: it is described as a severe neurodevelopmental disorder and is therefore classified in the Neurogenesis class, whereas its responsible gene, FOXG1, codes for a transcriptional repressor of the forkhead transcription factor family, pointing to the Regulation of genetic expression class.

2.3 Semantic similarity measures

Semantic similarity measures quantify the proximity of two ontology classes, or of two objects each described by a set of classes from an ontology. Such measures may be used to build a diseasome from disease annotations linked to an ontology [11]. We applied this similarity-based approach to build a diseasome of GIDs on the basis of both their phenotypic annotations and the annotations of their causal gene products. These annotations are expressed as classes from four ontologies or ontology fragments: HPO and the three aspects of the Gene Ontology considered separately (BP, CC and MF).

In addition to assessing the contribution of several ontologies to a diseasome, we use two semantic similarity measures to compare how different measures behave with ontology combinations. First, we use a node-based semantic similarity measure, SimGIC [19], which computes the ratio between the classes shared by two diseases and all the classes annotating either of them, weighted by the information content of each class and including all ancestors of each disease's annotations. The information content of a class x, with respect to a dataset of annotations, is computed as $\mathrm{IC}(x) = -\log_2 P(x)$, where $P(x)$ is the probability that an object is annotated with the class x. Higher IC values denote more specific classes. SimGIC was first introduced to compute the similarity of genes annotated with GO classes, and has been successfully used with MonDO to build a diseasome based on phenotypic similarity [11]. Second, we use the edge-based similarity measure IntelliGO [3], which compares two biological objects by first computing distances in the hierarchy between pairs of classes annotating the objects, and then aggregating these pairwise similarities into a single object-level similarity score. Like SimGIC, this aggregation step takes into account the information content of the compared classes to weight their contribution to the similarity.
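The following Python sketch illustrates the two ingredients of SimGIC described above, under stated assumptions: the ontology is given as a hypothetical `parents` mapping from each class to its direct superclasses; IC is estimated from the annotation dataset itself, an object counting as annotated with x if it carries x or any descendant of x; and SimGIC is taken as the IC-weighted Jaccard index $\sum_{x \in A \cap B} \mathrm{IC}(x) / \sum_{x \in A \cup B} \mathrm{IC}(x)$, where A and B are the ancestor-closed annotation sets. It does not reproduce IntelliGO [3], whose class-pair and aggregation formulas are not detailed here.

```python
import math


def with_ancestors(classes, parents):
    """Return the given classes together with all their ancestors.

    parents: dict mapping a class to an iterable of its direct superclasses
    (classes without an entry are treated as roots).
    """
    closed, stack = set(), list(classes)
    while stack:
        c = stack.pop()
        if c not in closed:
            closed.add(c)
            stack.extend(parents.get(c, ()))
    return closed


def information_content(annotations, parents):
    """IC(x) = -log2 P(x), with P(x) the fraction of annotated objects whose
    ancestor-closed annotation set contains x."""
    counts = {}
    for classes in annotations.values():
        for c in with_ancestors(classes, parents):
            counts[c] = counts.get(c, 0) + 1
    n = len(annotations)
    return {c: -math.log2(k / n) for c, k in counts.items()}


def sim_gic(a, b, parents, ic):
    """SimGIC: sum of IC over shared classes divided by sum of IC over all
    classes, both computed on the ancestor-closed annotation sets."""
    A, B = with_ancestors(a, parents), with_ancestors(b, parents)
    denominator = sum(ic.get(c, 0.0) for c in A | B)
    if denominator == 0.0:
        return 0.0  # only uninformative (root-like) classes: no evidence of similarity
    return sum(ic.get(c, 0.0) for c in A & B) / denominator
```

For example, in a toy hierarchy where A is the root, B and C are children of A and D is a child of B, a disease annotated with D and one annotated with C share only A; if every object is annotated (at least indirectly) with A, its IC is 0 and their SimGIC is 0.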

To assess the contribution of each ontology, we build several similarity functions using every possible combination of ontologies among BP, CC, MF and HPO, combined with either IntelliGO or SimGIC. As a first approach, we simply average the similarities computed separately with each ontology. We thus test the 15 possible combinations of one or more ontologies (the 2^4 − 1 non-empty subsets of the four ontologies), each combined with IntelliGO or SimGIC, resulting in 30 similarity functions.

2.4 Evaluation of similarity functions with respect to a reference classification

Hoehndorf et al. [11] described a methodology to evaluate the accuracy of a similarity function with respect to a classification of diseases. This methodology verifies that the similarity function gives higher similarity scores to pairs of diseases that belong to the same class.

This evaluation is based on a Receiver Operating Characteristic (ROC) analysis, which quantifies the accuracy of a binary classification model at varying degrees of sensitivity. In particular, a ranking of disease pairs based on their similarity serves as a classification model whose sensitivity can be adjusted by defining a similarity threshold above which pairs of diseases are labeled as positive by the model. The ROC curve represents the true positive rate as a function of the false positive rate. The area under the ROC curve, or ROCAUC, can be computed from such a curve, and represents the probability that a random pair of diseases from the positive class has a higher similarity than a random pair of diseases from the negative class. We denote by ROCAUC(R, P) the function that computes the ROCAUC given a ranking R and the set of positive elements P, which describes which elements of R are to be considered positive for the purpose of the evaluation.
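A minimal Python sketch of ROCAUC(R, P), computed through its probabilistic interpretation rather than by integrating the ROC curve: it counts, over all (positive, negative) pairs of elements of R, the fraction in which the positive element is ranked first. It assumes R is a strict order with no tied similarities; a common convention with ties is to count tied pairs as 1/2, which this sketch does not do.

```python
def roc_auc(ranking, positives):
    """ROCAUC(R, P) of a ranking R (most similar first) for a set P of positives.

    Computed as the fraction of (positive, negative) pairs of elements of R in
    which the positive element is ranked above the negative one. Elements of P
    that do not occur in R are ignored. Ties in similarity are not handled: the
    ranking is taken as a strict order.
    """
    n_pos = n_neg = 0
    correctly_ordered = 0
    for x in ranking:  # walk from the top of the ranking
        if x in positives:
            n_pos += 1
        else:
            n_neg += 1
            # every positive seen so far is ranked above this negative
            correctly_ordered += n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROCAUC is undefined without both positives and negatives")
    return correctly_ordered / (n_pos * n_neg)


print(roc_auc(["b", "c", "d", "e"], {"b", "d"}))  # 0.75
```

Here b and d are the positives; of the four (positive, negative) pairs, only (d, c) is misordered, hence 0.75.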

The ROCAUC-based evaluation can be conducted either for each single class of diseases or for the entire classification of diseases. This evaluation can also be performed with a classification of diseases in which a disease can belong to more than one class.

Data: the set of diseases D, a similarity between diseases sim : D × D → ℝ+
Result: average ROCAUC for all diseases
begin
    ROCAUC_avg ← 0
    foreach disease d ∈ D do
        ranking ← x ∈ (D \ {d}) ranked in descending order of sim(d, x)
        pos ← {x ∈ D | d and x share a disease class}
        ROCAUC_avg ← ROCAUC_avg + ROCAUC(ranking, pos)
    end
    return ROCAUC_avg / |D|
end
Algorithm 1: Evaluation algorithm for a similarity function sim on a classification task with several overlapping classes of diseases.

Data: the set of diseases D, a disease class C, a similarity between diseases sim : D × D → ℝ+
Result: average ROCAUC for the diseases of C
begin
    ROCAUC_avg ← 0
    pos ← {x ∈ D | x has class C}
    foreach disease d in class C do
        ranking ← x ∈ (D \ {d}) ranked in descending order of sim(d, x)
        ROCAUC_avg ← ROCAUC_avg + ROCAUC(ranking, pos)
    end
    return ROCAUC_avg / |D_C|    (D_C: the set of diseases of class C)
end
Algorithm 2: Evaluation algorithm for a similarity function sim on a classification task with respect to a single class of diseases C.

Algorithm 1 describes how the evaluation is performed globally on all disease classes, for an arbitrary similarity function sim. For each disease d of the dataset, we rank the other diseases using sim, from the most similar to the least similar. Since we want high-ranking diseases to share a disease class with d, we consider the positive class to be the set of diseases sharing a GID class with d. The ranking is then evaluated by computing the ROCAUC for the prediction of that positive class. The ROCAUCs of all diseases in the dataset are then averaged to obtain a global evaluation score.
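Algorithm 1, and the per-class variant of Algorithm 2, can be sketched in Python as follows. This is an illustration, not the authors' code: `sim` is any function of two diseases, the hypothetical `classes_of` mapping gives each disease its set of GID classes (so overlapping classes are supported), and the ROCAUC helper repeats the pair-counting definition with ties ignored. A disease whose positive set covers none or all of the other diseases would make its ROCAUC undefined; the sketch raises an error in that case rather than guessing how the original evaluation handled it.

```python
def roc_auc(ranking, positives):
    """Fraction of (positive, negative) pairs where the positive is ranked first."""
    n_pos = n_neg = ordered = 0
    for x in ranking:
        if x in positives:
            n_pos += 1
        else:
            n_neg += 1
            ordered += n_pos  # every positive seen so far outranks this negative
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROCAUC undefined: need both positives and negatives")
    return ordered / (n_pos * n_neg)


def rank_others(d, diseases, sim):
    """All diseases except d, in descending order of sim(d, x)."""
    return sorted((x for x in diseases if x != d), key=lambda x: sim(d, x), reverse=True)


def evaluate_all_classes(diseases, classes_of, sim):
    """Algorithm 1: positives for d are the diseases sharing at least one class with d."""
    total = 0.0
    for d in diseases:
        pos = {x for x in diseases if classes_of[d] & classes_of[x]}
        total += roc_auc(rank_others(d, diseases, sim), pos)
    return total / len(diseases)


def evaluate_one_class(diseases, classes_of, sim, c):
    """Algorithm 2: positives are the members of class c; only members of c are queried."""
    members = [d for d in diseases if c in classes_of[d]]
    pos = set(members)
    total = sum(roc_auc(rank_others(d, diseases, sim), pos) for d in members)
    return total / len(members)
```

For instance, with four diseases placed on a line at hypothetical coordinates 0, 1, 10 and 11, the first two in class X and the last two in class Y, and sim defined as the negative distance, both functions return 1.0; defining sim as the distance itself instead inverts every ranking and Algorithm 1 returns 0.0.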

Algorithm 2 describes how a similarity function sim is evaluated with respect to a single disease class, denoted C. For each disease d of the GID class C, we compute a ranking of the other diseases using sim, and we define the positive class to be the set of diseases that belong to C. Again, the ranking is evaluated by computing the ROCAUC for the prediction of that positive class. The ROCAUCs of all diseases of the class are then averaged to obtain a score representing how well the similarity function reflects this class.

3 Results

We apply the methodology described previously to our set of 374 GIDs. These diseases have phenotypic annotations expressed as HPO classes, and genetic annotations of their causal genes expressed as GO classes, split into the three aspects of GO that are considered here as three independent ontologies. Indeed, as their class hierarchies are separate, semantic similarity measures such as IntelliGO and SimGIC cannot compare classes from these different aspects.

We computed all pairwise similarities on our set of GIDs, for all 30 possible similarity functions resulting from the combinations of the four ontologies (BP, CC, MF and HPO) and the two semantic similarity measures (IntelliGO and SimGIC). Each of these similarity functions was then evaluated by computing the average ROCAUC for each of the 5 GID classes, as described in Algorithm 2, and for all classes considered together, as described in Algorithm 1.

Tables 2 and 3 present a selection of the results of these evaluations for similarity functions based on different combinations of ontologies, computed using IntelliGO and SimGIC, respectively. Results are given for 6 classification tasks: one task for each GID class evaluated separately, as described in Algorithm 2, and a sixth task evaluating the performance of the similarity function on all 5 GID classes, as described in Algorithm 1. For each task, we tested every possible combination of ontologies from BP, CC, MF and HPO, but both tables report only the results for combinations of interest: single ontologies, all GO aspects, all ontologies, and all combinations that provide the best performance on any task for either SimGIC or IntelliGO.

Unsurprisingly, the results are highly variable depending on which ontologies are considered and which similarity measure is used, and they also vary across classification tasks. In particular, we observe that HPO does not contribute positively to performance when combined with other ontologies, except for the classification task of the Neurogenesis class. We also note that, in most cases, similarity functions using only HPO perform poorly compared with those using a single GO aspect, although they still perform better than a random classifier. This suggests that, for classes other than Neurogenesis, HPO brings little more than noise relative to GO. However, combining several GO aspects produces a substantial increase in performance, which is notably visible for the Regulation of Genetic Expression class: IntelliGO performance increases from 0.740 in the best case with a single GO aspect to 0.803 with all three aspects, and SimGIC performance increases from 0.905 to 0.936.

Although combining all three aspects of GO provides the best overall performance, we observe that this is not necessarily the best combination for the classification task on each individual GID class:

- The Regulation class is poorly predicted by most similarity functions. In particular, similarity functions based on HPO alone do not perform better than a random classification.
- The Regulation of Genetic Expression class is best predicted using SimGIC (0.936) with the three aspects of GO considered conjointly. However, in this case, adding HPO on top of GO does not increase performance. Similarly, IntelliGO performs best on this classification task when considering the three aspects of GO, and its performance decreases when HPO is also considered.
- The Metabolic class is best predicted using IntelliGO (0.862) with all three aspects of GO. However, we observe that SimGIC (0.785) obtains its …

We study in this article how different ontologies can cooperate to improve how well a diseasome reflects expert knowledge of a GID classification. We evaluate two semantic similarity measures, IntelliGO and SimGIC, on a classification task with 5 GID classes, using different combinations of phenotypic and genetic ontologies. The results show that phenotypic annotations from HPO are not sufficient to reflect our expert classification, whereas genetic annotations from each aspect of GO offer better performance. Furthermore, combining several aspects of GO further improves performance, and, for the Neurogenesis GID class, combining a GO aspect with HPO offers the best performance.

This illustrates that the cooperation of several ontologies is suitable for such a classification task and can improve a diseasome approach. However, the relevance of each ontology greatly depends on individual disease classes, and considering too many ontologies may have a negative effect. This constitutes a limitation of semantic similarity measures: they are sensitive to the quality of annotations, as well as to annotations irrelevant to the class to predict. Overall, the results show that considering only the three aspects of GO often yields slightly better classification performance than considering both HPO and GO. Here, the contribution of HPO to classification performance may be limited by the GID dataset we used, as these diseases may be too difficult to distinguish based on their phenotypes alone; other studies show that HPO is suitable for classifying more phenotypically heterogeneous sets of diseases [11]. In such cases, deciding which ontologies to consider seems to require an iterative empirical approach that considers all possible combinations of ontologies. In fact, because performance does not necessarily increase with the number of ontologies, designing a non-exhaustive strategy to select the best combination of ontologies for a particular task is not trivial.
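The exhaustive strategy discussed above can be sketched in Python: enumerate every non-empty subset of the ontologies (2^4 − 1 = 15 for BP, CC, MF and HPO) and keep the one with the best evaluation score. Here `score` is a hypothetical caller-supplied function, for instance one returning the average ROCAUC of Algorithm 1 or 2 for the similarity function built on that combination.

```python
from itertools import combinations


def all_combinations(ontologies):
    """Every non-empty subset of the ontologies, as tuples in input order."""
    return [combo
            for k in range(1, len(ontologies) + 1)
            for combo in combinations(ontologies, k)]


def best_combination(ontologies, score):
    """Exhaustive search: evaluate every combination and return the best-scoring one
    (the first one encountered in case of ties).

    The cost grows as 2**n - 1 evaluations for n ontologies.
    """
    return max(all_combinations(ontologies), key=score)


if __name__ == "__main__":
    print(len(all_combinations(("BP", "CC", "MF", "HPO"))))  # 15
```

Because performance is not monotonic in the number of ontologies, no subset can be pruned in advance in this sketch, which is what makes a non-exhaustive alternative hard to design.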

Furthermore, it may be necessary to develop more sophisticated methods for aggregating similarities based on different ontologies. Here, we used an unweighted average of similarities, each computed with a different ontology. The contribution of each ontology to the aggregated similarity could be weighted in several ways, for instance by considering the number of annotations that the compared diseases have in each ontology, or by empirically determining an appropriate weighting scheme, rather than simply including or excluding ontologies. Moreover, since we observe that different settings, and therefore different diseasome models, are optimal only for certain classes, ways to aggregate different models could be explored, such as boosting algorithms [6] or bagging predictors [4].
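The unweighted average used in this study and one of the weighting options mentioned above can be contrasted in a short Python sketch. The weighted variant is hypothetical: it weights each ontology by the total number of annotations that the two compared diseases have in it, which is only one possible reading of "the number of annotations in each ontology"; ontologies in which neither disease is annotated then receive weight 0.

```python
def unweighted_similarity(per_ontology_sim):
    """Aggregation used in this study: plain average of the per-ontology similarities."""
    return sum(per_ontology_sim.values()) / len(per_ontology_sim)


def annotation_weighted_similarity(per_ontology_sim, n_annot_a, n_annot_b):
    """Hypothetical alternative: weight each ontology's similarity by the number
    of annotations the two diseases have in that ontology.

    per_ontology_sim: ontology name -> similarity of the two diseases in it.
    n_annot_a, n_annot_b: ontology name -> number of classes annotating each disease.
    """
    weights = {o: n_annot_a.get(o, 0) + n_annot_b.get(o, 0) for o in per_ontology_sim}
    total = sum(weights.values())
    if total == 0:
        return 0.0  # neither disease has annotations in the selected ontologies
    return sum(weights[o] * s for o, s in per_ontology_sim.items()) / total
```

With toy per-ontology similarities of 0.8 (BP) and 0.2 (HPO), the unweighted average is 0.5; if each disease has three BP classes and one HPO class, the weighted variant gives (6 × 0.8 + 2 × 0.2) / 8 = 0.65, shifting the aggregate toward the better-annotated ontology.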

Diseasomes based on semantic similarity are able to reflect an expert classification of diseases, as illustrated by the classification experiments presented in this article. Such a diseasome can be used to classify new diseases by simple propagation of the classes of neighboring diseases. Here, the cooperation of several biomedical ontologies was shown to be relevant in many cases; however, selecting the right ontologies for a particular task requires some trial and error. We note that the two semantic similarity measures, IntelliGO and SimGIC, perform differently on the classification tasks presented in this article: some GID classes seem to be better predicted by one of the two measures. However, these differences do not allow us to conclude that one measure performs strictly better than the other, as their overall performances are very similar. In summary, semantic similarity measures with various combinations of ontologies allow us to propose a diseasome as a model synthesizing the descriptions of GIDs with regard to several ontologies, in good agreement with an expert classification of these diseases. Such cooperation of bio-ontologies could also be explored with other machine learning methods.
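Classifying a new disease by propagating the classes of its neighbors could, for example, take the form of a k-nearest-neighbor vote in the diseasome. The Python sketch below is one such scheme, not a procedure evaluated in this article: k = 5 is an arbitrary default, and all classes tied for the most votes are returned, which fits the overlapping classes used here.

```python
from collections import Counter


def classify_by_neighbors(new_disease, known_classes, sim, k=5):
    """Return the class(es) most frequent among the k classified diseases
    most similar to new_disease.

    known_classes: dict mapping each already classified disease to its set of classes.
    sim: similarity function between two diseases (e.g. one of the 30 studied).
    """
    neighbors = sorted(known_classes, key=lambda x: sim(new_disease, x), reverse=True)[:k]
    votes = Counter(c for x in neighbors for c in known_classes[x])
    if not votes:
        return set()
    top = max(votes.values())
    return {c for c, v in votes.items() if v == top}
```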

1. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al.: Gene Ontology: tool for the unification of biology. Nature Genetics 25(1), 25–29 (2000)

2. Barabási, A.L., Gulbahce, N., Loscalzo, J.: Network medicine: a network-based approach to human disease. Nature Reviews Genetics 12(1), 56 (2011)

3. Benabderrahmane, S., Smaïl-Tabbone, M., Poch, O., Napoli, A., Devignes, M.D.: IntelliGO: a new vector-based semantic similarity measure including annotation origin. BMC Bioinformatics 11(1), 1 (2010)

4. Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)

5. Chelly, J., Khelfaoui, M., Francis, F., Cherif, B., Bienvenu, T.: Genetics and pathophysiology of mental retardation. European Journal of Human Genetics 14(6), 701 (2006)

6. Freund, Y., Schapire, R.E., et al.: Experiments with a new boosting algorithm. In: ICML. vol. 96, pp. 148–156. Citeseer (1996)

7. Gilissen, C., Hehir-Kwa, J.Y., Thung, D.T., van de Vorst, M., van Bon, B.W., Willemsen, M.H., Kwint, M., Janssen, I.M., Hoischen, A., Schenck, A., et al.: Genome sequencing identifies major causes of severe intellectual disability. Nature 511(7509), 344 (2014)

8. Goh, K.I., Cusick, M.E., Valle, D., Childs, B., Vidal, M., Barabási, A.L.: The human disease network. Proceedings of the National Academy of Sciences 104(21), 8685–8690 (2007)

9. Guney, E., Menche, J., Vidal, M., Barabási, A.L.: Network-based in silico drug efficacy screening. Nature Communications 7, 10331 (2016)

10. Hidalgo, C.A., Blumm, N., Barabási, A.L., Christakis, N.A.: A dynamic network approach for the study of human phenotypes. PLoS Computational Biology 5(4), e1000353 (2009)

11. Hoehndorf, R., Schofield, P.N., Gkoutos, G.V.: Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases. Scientific Reports 5, 10888 (2015)

12. Huntley, R.P., Sawford, T., Mutowo-Meullenet, P., Shypitsyna, A., Bonilla, C., Martin, M.J., O'Donovan, C.: The GOA database: gene ontology annotation updates for 2015. Nucleic Acids Research 43(D1), D1057–D1063 (2014)

13. Inlow, J.K., Restifo, L.L.: Molecular and comparative genetics of mental retardation. Genetics 166(2), 835–881 (2004)

14. van Karnebeek, C.D., Stockler, S.: Treatable inborn errors of metabolism causing intellectual disability: a systematic literature review. Molecular Genetics and Metabolism 105(3), 368–381 (2012)

15. Kaufman, L., Ayub, M., Vincent, J.B.: The genetic basis of non-syndromic intellectual disability: a review. Journal of Neurodevelopmental Disorders 2(4), 182 (2010)

16. Köhler, S., Doelken, S.C., Mungall, C.J., Bauer, S., Firth, H.V., Bailleul-Forestier, I., Black, G.C., Brown, D.L., Brudno, M., Campbell, J., et al.: The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Research 42(D1), D966–D974 (2013)

17. Köhler, S., Vasilevsky, N.A., Engelstad, M., Foster, E., McMurry, J., Aymé, S., Baynam, G., Bello, S.M., Boerkoel, C.F., Boycott, K.M., et al.: The Human Phenotype Ontology in 2017. Nucleic Acids Research 45(D1), D865–D876 (2016)

18. Mungall, C.J., McMurry, J.A., Köhler, S., Balhoff, J.P., Borromeo, C., Brush, M., Carbon, S., Conlin, T., Dunn, N., Engelstad, M., Foster, E., Gourdine, J., Jacobsen, J.O., Keith, D., Laraway, B., Lewis, S.E., NguyenXuan, J., Shefchek, K., Vasilevsky, N., Yuan, Z., Washington, N., Hochheiser, H., Groza, T., Smedley, D., Robinson, P.N., Haendel, M.A.: The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Research 45(D1), D712–D722 (2017). https://doi.org/10.1093/nar/gkw1128

19. Pesquita, C., Faria, D., Bastos, H., Falcão, A., Couto, F.: Evaluating GO-based semantic similarity measures. In: Proc. 10th Annual Bio-Ontologies Meeting. vol. 37, p. 38 (2007)