<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The integrative use of anatomy ontology and protein-protein interaction networks to study evolutionary phenotypic transitions</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Pasan</forename><forename type="middle">C</forename><surname>Fernando</surname></persName>
							<email>pasan.fernando@coyotes.usd.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Biology Department</orgName>
								<orgName type="institution">University of South Dakota</orgName>
								<address>
									<settlement>Vermillion</settlement>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Erliang</forename><surname>Zeng</surname></persName>
							<email>erliang.zeng@usd.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Biology Department</orgName>
								<orgName type="institution">University of South Dakota</orgName>
								<address>
									<settlement>Vermillion</settlement>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paula</forename><forename type="middle">M</forename><surname>Mabee</surname></persName>
							<email>paula.mabee@usd.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Biology Department</orgName>
								<orgName type="institution">University of South Dakota</orgName>
								<address>
									<settlement>Vermillion</settlement>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The integrative use of anatomy ontology and protein-protein interaction networks to study evolutionary phenotypic transitions</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F0DD8B1592AFE2C49B10D9594C9CAC8E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:05+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Anatomy ontology</term>
					<term>network analysis</term>
					<term>protein-protein interactions</term>
					<term>data integration</term>
					<term>gene prediction</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Studying evolutionary phenotypic transitions, such as the fin to limb transition, is a major theme in evolutionary biology. Recent advances in next-generation technologies have produced large volumes of genomic and proteomic data, which can be used to analyze the genetic basis of evolutionary phenotypic transitions. Protein-protein interaction (PPI) networks can be used to predict candidate genes and identify gene modules related to evolutionary phenotypes; however, their gene prediction accuracy is low. We therefore developed an integrative framework that combines PPI networks with an anatomy ontology, which significantly improved the accuracy of network-based candidate gene predictions in zebrafish and mouse. This framework will also be used to identify gene modules associated with the fin to limb transition and to study the changes in these modules that underlie the phenotypic change.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>The process of evolution is accompanied by numerous important phenotypic transitions, such as the fin to limb transition in vertebrates, which contributed to the wealth of phenotypic diversity observed among species today. Understanding the relationship between genes and phenotypes is essential for explaining how those phenotypes change. Traditionally, gene-phenotype relationships were discovered with wet-lab methods. Although these methods make more accurate predictions, they are costly and time-consuming, which has led to the popularity of faster computational candidate gene prediction methods <ref type="bibr" target="#b0">[1]</ref> that exploit the genomic and proteomic data accumulated in public databases.</p><p>The use of protein-protein interaction (PPI) networks for candidate gene prediction has become popular owing to the availability of large PPI datasets for model organisms. Network analysis algorithms can be applied to PPI networks to detect gene modules corresponding to the phenotypes in question <ref type="bibr" target="#b0">[1]</ref>. Whereas other gene prediction methods discover only direct gene-phenotype relationships, network analysis also identifies the gene interactions that are important in regulating a phenotype. Understanding the modular structure of gene interactions is therefore essential for studying how phenotypes develop, because it is the interactions among genes, rather than individual genes, that determine the phenotypic outcome.</p><p>The biggest challenge in using PPI networks is their low candidate gene prediction accuracy, which stems from the low quality of the networks <ref type="bibr" target="#b0">[1]</ref>. PPI networks are known to contain a substantial fraction of false-positive interactions, and some networks are still incomplete <ref type="bibr" target="#b1">[2]</ref>. 
The quality of PPI networks must therefore be improved before they can be used reliably to study evolutionary phenotypic transitions. Because we focus on anatomical phenotypes, such as pectoral fin development and forelimb development, we propose an integrative framework that uses an anatomy ontology to combine gene-phenotype relationships reported in the literature with PPI networks. We hypothesize that this integration improves PPI network quality and thereby increases candidate gene prediction accuracy. To test this hypothesis, we use known anatomical phenotype annotations from mouse and zebrafish. After this evaluation, the integrated networks will be used to detect gene modules associated with the fin to limb transition in mouse and zebrafish, and the modules will be compared to identify the genetic changes corresponding to the phenotypic transition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. METHODS</head><p>The first step of the integrative framework is to construct gene networks based entirely on known annotations of genes to anatomical phenotypes. The anatomical profiles for mouse and zebrafish were downloaded from the Monarch Initiative data repository (https://monarchinitiative.org/), which integrates data from model organism databases. Monarch Initiative data are manually preprocessed to remove unwanted annotations, and genes are annotated to Uberon anatomy ontology terms <ref type="bibr" target="#b2">[3]</ref>. Uberon (http://uberon.github.io/) is a cross-species anatomy ontology that integrates species-specific anatomy ontologies, such as the Mouse Anatomy Ontology (MA) and the Zebrafish Anatomy Ontology (ZFA), which makes it suitable for evolutionary analyses involving multiple species <ref type="bibr" target="#b3">[4]</ref>.</p><p>Semantic similarity scores between anatomy ontology terms were calculated to obtain pairwise gene similarity values for all genes in mouse and zebrafish. Semantic similarity is a quantitative measure of how closely two ontology terms are related, based on their positions in the ontology structure and, for some measures, on their gene annotations <ref type="bibr" target="#b4">[5]</ref>. Four semantic similarity methods (Lin, Resnik, Schlicker, and Wang) were used to generate pairwise gene similarity matrices, which in turn were used to build gene networks based entirely on the genes' anatomy ontology annotations (anatomy-based gene networks). These networks were filtered with a gene similarity score cutoff to remove low-scoring interactions. In these networks, gene pairs with high similarity scores are those annotated to similar anatomy ontology terms.</p><p>The PPI networks for mouse and zebrafish were downloaded from the STRING database (https://string-db.org/). 
The PPI networks were then integrated with the anatomy-based gene networks by combining the pairwise gene similarity scores of the two networks in a probabilistic model. In the integrated network, only gene pairs that receive high similarity scores in both input networks have high similarity scores. To assess the candidate gene prediction performance of the integrated and PPI networks, we used Uberon anatomy ontology terms with at least 10 gene annotations in the zebrafish and mouse anatomical profiles from the Monarch Initiative data repository. The Hishigaki method <ref type="bibr" target="#b5">[6]</ref> served as the network-based candidate gene prediction algorithm, and leave-one-out cross-validation was used for evaluation. Receiver operating characteristic (ROC) and precision-recall curves were generated to compare the network types. Although the main goal was to compare the integrated networks with the PPI networks, the anatomy-based gene networks were also included in the comparison.</p></div>
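<div xmlns="http://www.tei-c.org/ns/1.0"><p>The network construction and integration steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy is_a hierarchy, gene annotations, cutoff, prior, and STRING-style scores are hypothetical; Resnik and Lin similarities are computed from annotation-based information content, IC(t) = -log p(t); the best-match-average rule that turns term similarities into gene similarities is our assumption; and a simple product rule, with a fixed prior for genes lacking anatomy annotations, stands in for the probabilistic integration model, whose exact form is not specified here. The Schlicker and Wang methods are omitted.</p><p>
```python
import math
from itertools import combinations

# Hypothetical toy is_a hierarchy (child -> parents) standing in for Uberon terms.
PARENTS = {
    "appendage": [],
    "paired_appendage": ["appendage"],
    "pectoral_fin": ["paired_appendage"],
    "pelvic_fin": ["paired_appendage"],
    "head": [],
}

# Hypothetical gene -> directly annotated anatomy terms.
ANNOTATIONS = {
    "geneA": {"pectoral_fin"},
    "geneB": {"pectoral_fin", "pelvic_fin"},
    "geneC": {"pelvic_fin"},
    "geneD": {"head"},
}


def ancestors(term):
    """The term itself plus all of its is_a ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content():
    """IC(t) = -log p(t); p(t) is the fraction of genes annotated to t or to a
    descendant of t (annotations propagate up the hierarchy)."""
    counts = dict.fromkeys(PARENTS, 0)
    for terms in ANNOTATIONS.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(ANNOTATIONS)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik: IC of the most informative common ancestor (MICA)."""
    common = ancestors(t1).intersection(ancestors(t2))
    return max((ic.get(t, 0.0) for t in common), default=0.0)


def lin(t1, t2, ic):
    """Lin: 2 * IC(MICA) / (IC(t1) + IC(t2)), which lies in [0, 1]."""
    denom = ic[t1] + ic[t2]
    if denom == 0:  # both terms cover every gene, so they carry no information
        return 1.0 if t1 == t2 else 0.0
    return 2 * resnik(t1, t2, ic) / denom


def gene_similarity(g1, g2, ic, term_sim=lin):
    """Best-match average of term similarities (an assumed gene-level rule)."""
    a, b = ANNOTATIONS[g1], ANNOTATIONS[g2]
    best_a = [max(term_sim(x, y, ic) for y in b) for x in a]
    best_b = [max(term_sim(x, y, ic) for x in a) for y in b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def anatomy_network(cutoff, term_sim=lin):
    """Anatomy-based gene network: keep gene pairs scoring at or above the cutoff."""
    ic = information_content()
    edges = {}
    for g1, g2 in combinations(sorted(ANNOTATIONS), 2):
        score = gene_similarity(g1, g2, ic, term_sim)
        if score >= cutoff:
            edges[(g1, g2)] = score
    return edges


def integrate(ppi_edges, prior=0.1, term_sim=lin):
    """Hypothetical product rule standing in for the probabilistic model:
    integrated = ppi * anatomy, so a pair stays high only if both inputs are high."""
    ic = information_content()
    out = {}
    for (g1, g2), s in ppi_edges.items():
        anat = prior  # assumed score when a gene has no anatomy annotation
        if g1 in ANNOTATIONS and g2 in ANNOTATIONS:
            anat = gene_similarity(g1, g2, ic, term_sim)
        out[(g1, g2)] = s * anat
    return out


# Hypothetical STRING-style scores rescaled to [0, 1]; geneE has no anatomy annotation.
PPI = {("geneA", "geneB"): 0.9, ("geneB", "geneE"): 0.8}

anat_net = anatomy_network(cutoff=0.5)
integrated = integrate(PPI)
```
</p><p>With a cutoff of 0.5, the toy anatomy-based network keeps only the geneA-geneB and geneB-geneC pairs (a Lin-based best-match average of about 0.805 for each). After integration, the geneA-geneB pair keeps a high score (about 0.725) because both inputs support it, while the PPI-only pair geneB-geneE survives with a low score, mirroring how the integrated networks retain PPI genes that lack anatomy annotations.</p></div>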
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. PRELIMINARY RESULTS AND DISCUSSION</head><p>The ROC and precision-recall curve comparisons for mouse and zebrafish indicate that the integrated networks significantly outperform the original PPI networks in predicting candidate genes (only the zebrafish ROC curves for the four semantic similarity methods are shown in Fig. <ref type="figure" target="#fig_0">1</ref>). This result is consistent across all four semantic similarity methods. The higher candidate gene prediction accuracy of the integrated networks indicates that integration improved network quality. Although the anatomy-based gene networks (blue in Fig. <ref type="figure" target="#fig_0">1</ref>) perform best for most of the semantic similarity methods, they are not suitable for candidate gene prediction or module identification because they contain only genes with at least one anatomy ontology annotation, a small set compared with the genes in the integrated and PPI networks. For instance, the zebrafish anatomy-based gene network constructed using the Schlicker method contains 5,386 genes, whereas the corresponding integrated network contains 12,755 genes. The integrated networks also contain many genes from the PPI networks that have no anatomy annotations; these genes are potential candidates for anatomical phenotypes. Therefore, the integrated networks are more useful for downstream network analysis.</p><p>For mouse and zebrafish, the best-performing integrated network will be used to detect gene modules associated with the fin to limb transition. Because the integrated networks are of higher quality than the PPI networks, the resulting gene modules are expected to be more accurate. 
The gene modules for the pectoral and pelvic fins in zebrafish will be compared with the gene modules for the forelimb and hindlimb in mouse, respectively, to identify changes in gene modules during the fin to limb transition. This work shows how an anatomy ontology can be used to improve the quality of candidate gene predictions and to support efficient network analyses of evolutionary transitions.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Comparison of ROC curves for PPI (red), integrated (green), and anatomy-based gene (blue) networks for the four semantic similarity methods. The integrated and anatomy-based gene networks clearly outperform the PPI networks in predicting candidate genes.</figDesc><graphic coords="2,313.60,72.00,244.75,210.00" type="bitmap" /></figure>
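<div xmlns="http://www.tei-c.org/ns/1.0"><p>The evaluation behind these comparisons (Hishigaki prediction <ref type="bibr" target="#b5">[6]</ref> scored by leave-one-out cross-validation and summarized as ROC curves, as described in Section II) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the toy network and labels are hypothetical; neighborhoods are restricted to direct neighbors (radius 1) over edges at or above a hypothetical weight threshold; the Hishigaki statistic (n_f - e_f)^2 / e_f compares the observed number n_f of annotated neighbors with the number e_f expected from the term's overall frequency; scoring only enrichment (n_f above e_f) is our assumption, since the raw statistic also rewards depletion; and the ROC AUC is computed in its rank (Mann-Whitney) form rather than by tracing the curve.</p><p>
```python
from itertools import product


def adjacency(edges, min_weight=0.0):
    """Undirected adjacency sets from {(gene1, gene2): weight}, dropping weak edges."""
    adj = {}
    for (a, b), w in edges.items():
        if w >= min_weight:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def hishigaki_score(gene, adj, labeled, frac):
    """(n_f - e_f)^2 / e_f over the gene's direct (radius-1) neighbours."""
    nbrs = adj.get(gene, set())
    n_f = sum(1 for v in nbrs if v in labeled)  # observed annotated neighbours
    e_f = len(nbrs) * frac  # expected count from the term's overall frequency
    if e_f == 0 or not n_f > e_f:  # assumption: reward enrichment only
        return 0.0
    return (n_f - e_f) ** 2 / e_f


def loocv_scores(adj, positives):
    """Leave-one-out: hide each gene's own label, then score it from the others."""
    genes = sorted(adj)
    positives = positives.intersection(genes)
    scores = {}
    for g in genes:
        labeled = positives - {g}
        frac = len(labeled) / (len(genes) - 1)
        scores[g] = hishigaki_score(g, adj, labeled, frac)
    return scores


def roc_auc(scores, positives):
    """ROC AUC as P(positive outscores negative), counting ties as one half."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical toy network: genes a, b and c carry the anatomy term; d, e and f do not.
EDGES = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.7, ("c", "d"): 0.6,
         ("d", "e"): 0.9, ("e", "f"): 0.8, ("d", "f"): 0.7}
POSITIVES = {"a", "b", "c"}

scores = loocv_scores(adjacency(EDGES, min_weight=0.5), POSITIVES)
auc = roc_auc(scores, POSITIVES)
```
</p><p>On this toy network every annotated gene outscores every unannotated one, so the AUC is 1.0. Repeating the loop for each Uberon term with at least 10 annotations and for each network type is one possible way, assumed here, to obtain curves comparable to those in Fig. 1; precision-recall curves can be built from the same held-out scores.</p></div>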
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">Proceedings of the 9th International Conference on Biological Ontology (ICBO 2018), Corvallis, Oregon, USA</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1">August 7-10, 2018</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Network-based prediction of protein function</title>
		<author>
			<persName><forename type="first">R</forename><surname>Sharan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ulitsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Shamir</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Molecular Systems Biology</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page">88</biblScope>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Comparative assessment of large-scale data sets of protein-protein interactions</title>
		<author>
			<persName><forename type="first">C</forename><surname>von Mering</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">417</biblScope>
			<biblScope unit="page" from="399" to="403" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Mungall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="D712" to="D722" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Haendel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Semantics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page">21</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Semantic similarity in biomedical ontologies</title>
		<author>
			<persName><forename type="first">C</forename><surname>Pesquita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Faria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">O</forename><surname>Falcão</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lord</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Couto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLoS Computational Biology</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page">e1000443</biblScope>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Assessment of prediction accuracy of protein function from protein-protein interaction data</title>
		<author>
			<persName><forename type="first">H</forename><surname>Hishigaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nakai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ono</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tanigami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Takagi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Yeast</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="523" to="531" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
