=Paper= {{Paper |id=Vol-2285/ICBO_2018_paper_52 |storemode=property |title=The Integrative use of Anatomy Ontology and Protein-protein Interaction Networks to Study Evolutionary Phenotypic Transitions |pdfUrl=https://ceur-ws.org/Vol-2285/ICBO_2018_paper_52.pdf |volume=Vol-2285 |authors=Pasan Fernando,Erliang Zeng,Paula Mabee |dblpUrl=https://dblp.org/rec/conf/icbo/FernandoM18 }} ==The Integrative use of Anatomy Ontology and Protein-protein Interaction Networks to Study Evolutionary Phenotypic Transitions== https://ceur-ws.org/Vol-2285/ICBO_2018_paper_52.pdf
    Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA




     The integrative use of anatomy ontology and
     protein-protein interaction networks to study
          evolutionary phenotypic transitions
                                       Pasan C. Fernando, Erliang Zeng, Paula M. Mabee
                                                    Biology Department
                                                 University of South Dakota
                                                      Vermillion, USA
                         pasan.fernando@coyotes.usd.edu, erliang.zeng@usd.edu, paula.mabee@usd.edu


Abstract- Studying evolutionary phenotypic transitions, such as the fin-to-limb transition, is
popular in evolutionary biology. Recent advances in next-generation technologies have produced
large volumes of genomics and proteomics data, which can be used to analyze the genetic basis of
evolutionary phenotypic transitions. Protein-protein interaction (PPI) networks can be used to
predict candidate genes and identify gene modules related to evolutionary phenotypes; however,
they suffer from low gene prediction accuracy. Therefore, an integrative framework was developed
using PPI networks and anatomy ontology, which significantly improved the accuracy of
network-based candidate gene predictions in zebrafish and mouse. This integrative framework will
also be used to identify gene modules associated with the fin-to-limb transition and to study the
changes in these modules that lead to the phenotypic change.

    Keywords- Anatomy ontology; network analysis; protein-protein interactions; data integration;
gene prediction.

I. INTRODUCTION

    The process of evolution is accompanied by numerous important phenotypic transitions, such as
the fin-to-limb transition in vertebrates, which contributed to the wealth of phenotypic
diversity observed among different species today. Understanding the relationship between genes
and their phenotypes is important in explaining the changes in those phenotypes. Traditionally,
wet lab methods were used to discover gene-to-phenotype relationships. Despite the high accuracy
of their predictions, wet lab candidate gene prediction methods are costly in resources and time,
which led to the popularity of faster computational candidate gene prediction methods [1] that
use the genomic and proteomic data accumulated in public databases.

    The use of PPI networks for candidate gene prediction has become popular due to the
availability of large PPI datasets for model organisms. Network analysis algorithms can be used
to analyze PPI networks and detect gene modules corresponding to the phenotypes in question [1].
Other gene prediction methods only discover direct gene-to-phenotype relationships, but network
analysis further identifies gene interactions that are important in regulating the phenotype.
Understanding the modular structure of gene interactions is extremely important in studying
their role in the development of phenotypes because it is the gene interactions that determine
the outcome rather than the individual genes.

    The biggest challenge of using PPI networks is the low candidate gene prediction accuracy due
to the low quality of the networks [1]. PPI networks are known to contain a high number of
false-positive interactions, and some networks are still incomplete [2]. Before PPI networks are
used to study evolutionary phenotypic transitions, their quality must be improved to obtain
better results. Because we are focusing on anatomical phenotypes, such as pectoral fin
development and forelimb development, we propose an integrative framework that uses anatomy
ontology to integrate known information about gene-phenotype relationships from the literature
with the PPI networks. This integration is expected to improve the PPI network quality and to
predict candidate genes with higher accuracy. To test this hypothesis, we use known anatomical
phenotype annotations from mouse and zebrafish. After the evaluation, the integrated networks
will be used to detect gene modules associated with the fin-to-limb transition in mouse and
zebrafish, and the modules will be compared to observe the genetic changes corresponding to the
phenotypic transition.

II. METHODS

    The first step of the integrative framework is constructing gene networks that are entirely
based on the known gene-to-anatomical-phenotype annotations. The anatomical profiles for mouse
and zebrafish were downloaded from the Monarch Initiative data repository
(https://monarchinitiative.org/), which retrieves data from model organism databases. The Monarch
Initiative data are manually pre-processed to remove unwanted annotations, and the genes are
annotated to Uberon anatomy ontology terms [3]. Uberon (http://uberon.github.io/) is a
cross-species anatomy ontology that integrates species-specific anatomy ontologies, such as the
Mouse Anatomy Ontology (MA) and the Zebrafish Anatomy Ontology (ZFA), which makes it suitable for
evolutionary analyses involving multiple species [4].



    ICBO 2018, August 7-10, 2018




    Semantic similarity scores between anatomy ontology
terms were calculated to obtain pairwise gene similarity
values for all the genes in mouse and zebrafish. Semantic
similarity is a quantitative value that represents similarity
between two ontology terms based on their location in the
ontological structure and their gene annotations [5]. Four
different semantic similarity methods (Lin, Resnik, Schlicker,
and Wang) were used to generate pairwise gene similarity
matrices, which in turn were used to generate gene networks
that are entirely based on the anatomy ontology annotations of
the genes (anatomy-based gene networks). These networks
were filtered using a gene similarity score cutoff to remove
interactions with low scores. In these networks, the genes with
higher similarity scores are the ones that are annotated to
similar anatomy ontology terms.
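
    The construction of an anatomy-based gene network can be illustrated with a minimal Python
sketch. It is not the authors' implementation: the toy hierarchy, the gene names, and the cutoff
of 0.5 are hypothetical, only the Resnik and Lin measures are shown, and the best-match-average
step used to turn term similarities into a gene similarity is an assumption, since the text does
not state how term scores were aggregated.

```python
import math
from itertools import combinations

# Toy is-a hierarchy (child -> parents). Labels are illustrative, not real Uberon IDs.
PARENTS = {
    "appendage": [],
    "fin": ["appendage"],
    "limb": ["appendage"],
    "pectoral fin": ["fin"],
    "pelvic fin": ["fin"],
    "forelimb": ["limb"],
}

# Hypothetical direct gene-to-term annotations.
ANNOTATIONS = {
    "geneA": {"pectoral fin"},
    "geneB": {"pectoral fin", "pelvic fin"},
    "geneC": {"forelimb"},
    "geneD": {"pelvic fin"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts genes annotated to t or to a descendant of t."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for t in covered:
            counts[t] += 1
    n_genes = len(ANNOTATIONS)
    return {t: math.log(n_genes / c) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max((IC.get(t, 0.0) for t in common), default=0.0)


def lin(t1, t2):
    """Lin similarity: Resnik similarity normalised by the ICs of both terms."""
    denom = IC.get(t1, 0.0) + IC.get(t2, 0.0)
    return 2 * resnik(t1, t2) / denom if denom > 0 else 0.0


def gene_similarity(g1, g2, term_sim=lin):
    """Best-match average over the two genes' term sets (an assumed aggregation)."""
    a, b = ANNOTATIONS[g1], ANNOTATIONS[g2]
    best_a = [max(term_sim(x, y) for y in b) for x in a]
    best_b = [max(term_sim(y, x) for x in a) for y in b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def anatomy_network(cutoff=0.5):
    """Keep only gene pairs whose similarity reaches the cutoff."""
    edges = {}
    for g1, g2 in combinations(sorted(ANNOTATIONS), 2):
        score = gene_similarity(g1, g2)
        if score >= cutoff:
            edges[(g1, g2)] = score
    return edges
```

    Calling `anatomy_network()` on this toy data keeps only the gene pairs that share fin-related
terms; raising the cutoff makes the network sparser.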

    The PPI networks for mouse and zebrafish were downloaded from the STRING database
(https://string-db.org/). Then, the PPI networks were integrated with the anatomy-based gene
networks using the pairwise gene similarity scores of the two networks in a probabilistic model.
In the integrated network, only the gene pairs that receive high similarity scores from both
input networks have high gene similarity scores. To assess the candidate gene prediction
performance of the integrated networks and the PPI networks, Uberon anatomy ontology terms that
have at least 10 gene annotations were used from the zebrafish and mouse anatomical profiles
downloaded from the Monarch Initiative data repository. The Hishigaki prediction method [6] was
used as the network-based candidate gene prediction algorithm, and leave-one-out cross-validation
was used as the evaluation technique. Receiver operating characteristic (ROC) and
precision-recall curves were generated for the comparison of different network types. Although
the goal was to compare the integrated versus PPI networks, the anatomy-based gene networks were
also included in the comparison.

III. PRELIMINARY RESULTS AND DISCUSSION

    The ROC and precision-recall curve comparisons for mouse and zebrafish indicate that the
integrated networks significantly outperform the original PPI networks when predicting candidate
genes (only the zebrafish ROC curve comparisons of the four semantic similarity calculation
methods are shown in Fig. 1). This result is consistent among the four semantic similarity
calculation methods used. The higher candidate gene prediction accuracy of the integrated
networks means that their network quality was increased during the integration. Although the
anatomy-based gene networks (shown in blue in Fig. 1) have the highest performance for most of
the semantic similarity calculation methods, they are not suitable for candidate gene prediction
or for identifying network modules because they only contain genes that have at least one anatomy
ontology term annotation. This number is low compared to the integrated and PPI networks. For
instance, the zebrafish anatomy-based gene network constructed using the Schlicker method
contains 5,386 genes, whereas the corresponding integrated network contains 12,755 genes. The
integrated networks contain a large number of unknown genes coming from the PPI networks, which
can be potential candidates for anatomical phenotypes. Therefore, integrated networks are more
useful for downstream network analysis.

Fig. 1. Comparison of ROC curves for PPI (red), integrated (green), and anatomy-based gene (blue)
networks for the four semantic similarity methods. The integrated and anatomy-based gene networks
clearly outperform the PPI networks when predicting candidate genes.

    The integrated network with the highest performance for mouse and zebrafish will be used for
detecting gene modules associated with the fin-to-limb transition. Because the quality of the
integrated networks is higher than that of the PPI networks, the gene modules are expected to be
more accurate. The gene modules for pectoral fin and pelvic fin in zebrafish will be compared
with the gene modules for forelimb and hindlimb in mouse, respectively, to identify modular
changes in genes during the fin-to-limb transition. This work showcases how anatomy ontology can
be used to improve the quality of candidate gene predictions and to perform efficient network
analyses to study evolutionary transitions.

REFERENCES

[1] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function,"
    Molecular Systems Biology, vol. 3, 2007, p. 88.
[2] C. von Mering et al., "Comparative assessment of large-scale data sets of protein-protein
    interactions," Nature, vol. 417, 2002, pp. 399-403.
[3] C. J. Mungall et al., "The Monarch Initiative: an integrative data and analytic platform
    connecting phenotypes to genotypes across species," Nucleic Acids Res, vol. 45, 2017,
    pp. D712-D722.
[4] M. A. Haendel et al., "Unification of multi-species vertebrate anatomy ontologies for
    comparative biology in Uberon," Journal of Biomedical Semantics, vol. 5, 2014, p. 21.
[5] C. Pesquita, D. Faria, A. O. Falcão, P. Lord, and F. M. Couto, "Semantic Similarity in
    Biomedical Ontologies," PLoS Comput Biol, vol. 5, 2009, p. e1000443.
[6] H. Hishigaki, K. Nakai, T. Ono, A. Tanigami, and T. Takagi, "Assessment of prediction
    accuracy of protein function from protein-protein interaction data," Yeast, vol. 18, 2001,
    pp. 523-531.
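
    The evaluation procedure of Section II (neighbourhood-based scoring in the style of the
Hishigaki method [6], assessed by leave-one-out cross-validation and summarised by the area under
the ROC curve) can be illustrated with a minimal Python sketch. It is a sketch under stated
assumptions rather than the authors' pipeline: the six-gene network and the annotated gene set are
hypothetical, only direct neighbours are considered, edges are unweighted, and the AUC is computed
by pairwise rank comparison instead of a plotting library.

```python
# Toy undirected network as adjacency sets; gene names are hypothetical.
NETWORK = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g3", "g5"},
    "g5": {"g4", "g6"},
    "g6": {"g5"},
}

# Hypothetical set of genes known to be annotated to one anatomy term.
ANNOTATED = {"g1", "g2", "g3"}


def hishigaki_score(gene, network, annotated):
    """Chi-square-like score (n - e)^2 / e over the direct neighbours of `gene`.

    n is the number of annotated neighbours and e the number expected from the
    background annotation frequency. The gene's own label is hidden, which is
    the leave-one-out step.
    """
    others = set(network) - {gene}
    known = annotated - {gene}
    freq = len(known) / len(others)
    neighbours = network[gene]
    n = len(neighbours & known)
    e = len(neighbours) * freq
    return (n - e) ** 2 / e if e > 0 else 0.0


def leave_one_out_scores(network, annotated):
    """Score every gene in turn with its own annotation withheld."""
    return {g: hishigaki_score(g, network, annotated) for g in network}


def roc_auc(scores, positives):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = leave_one_out_scores(NETWORK, ANNOTATED)
auc = roc_auc(scores, ANNOTATED)
```

    Because the statistic squares the difference between observed and expected counts, it does
not by itself distinguish enriched from depleted neighbourhoods; how that case is handled is a
design choice left open in this sketch.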



