<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">An Analysis of Differences In Biological Pathway Resources</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Lucy</forename><forename type="middle">L</forename><surname>Wang</surname></persName>
							<email>lucylw@uw.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Biomedical Informatics and Medical Education</orgName>
								<orgName type="institution">University of Washington Seattle</orgName>
								<address>
									<postCode>98195</postCode>
									<settlement>Washington</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">John</forename><forename type="middle">H</forename><surname>Gennari</surname></persName>
							<email>gennari@uw.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Biomedical Informatics and Medical Education</orgName>
								<orgName type="institution">University of Washington Seattle</orgName>
								<address>
									<postCode>98195</postCode>
									<settlement>Washington</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Neil</forename><forename type="middle">F</forename><surname>Abernethy</surname></persName>
							<email>neila@uw.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Biomedical Informatics and Medical Education</orgName>
								<orgName type="institution">University of Washington Seattle</orgName>
								<address>
									<postCode>98195</postCode>
									<settlement>Washington</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">An Analysis of Differences In Biological Pathway Resources</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">347D716F360271A4C0BE3651A391F03C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T10:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>pathway database</term>
					<term>knowledge representation</term>
					<term>resource comparison</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Integrating content from multiple biological pathway resources is necessary to fully exploit pathway knowledge for the benefit of biology and medicine. Differences in content, representation, coverage, and more occur between databases, and are challenges to resource merging. We introduce a typology of representational differences between pathway resources and give examples using several databases: BioCyc, KEGG, PANTHER pathways, and Reactome. We also detect and quantify annotation mismatches between HumanCyc and Reactome. The typology of mismatches can be used to guide entity and relationship alignment between these databases, helping us identify and understand deficiencies in our knowledge, and allowing the research community to derive greater benefit from the existing pathway data.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Describing and studying biological pathways is necessary for understanding biological and disease processes. Biological functions and processes follow from complex networks of interactions among gene products and molecules. Through the study of pathways of known biochemical reactions, we can gain deeper insights into these interactions. Many of these relationships and reactions have been catalogued in pathway resources such as Reactome, BioCyc, KEGG, and others <ref type="bibr" target="#b0">[1]</ref><ref type="bibr" target="#b1">[2]</ref><ref type="bibr" target="#b2">[3]</ref><ref type="bibr" target="#b3">[4]</ref><ref type="bibr" target="#b4">[5]</ref>.</p><p>This work was supported by the National Library of Medicine (NLM) Training Grant T15LM007442.</p><p>As of April 2016, PathGuide, a pathway resource aggregator, lists 547 pathway resources <ref type="bibr" target="#b5">[6]</ref>, each providing specialized knowledge in niche areas of biology. Efforts have been made to integrate some of these databases. PathwayCommons catalogs human pathway resources under a unified biological pathway exchange umbrella (BioPAX), allowing easier querying of pathways across 22 different resources <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. Tools such as Consensus Pathway DB <ref type="bibr" target="#b8">[9]</ref> and hiPATHDB <ref type="bibr" target="#b9">[10]</ref> offer querying and visualization of pathways from multiple databases. Statistical frameworks like R Spider seek to probabilistically combine protein interactions from various pathway databases into merged networks <ref type="bibr" target="#b10">[11]</ref>. These tools improve querying of multiple resources and pave the way towards more comprehensive network models of human biological processes. 
Some work has also been done in inter-resource comparison, quantifying the overlap between different databases <ref type="bibr" target="#b11">[12]</ref><ref type="bibr" target="#b12">[13]</ref><ref type="bibr" target="#b13">[14]</ref><ref type="bibr" target="#b14">[15]</ref>. These comparison studies emphasize differences in entity membership in pathways and differing counts of unique entities and pathways, but do not focus on cross-resource entity alignment. Existing tools for entity normalization of proteins <ref type="bibr" target="#b15">[16]</ref> and metabolites <ref type="bibr" target="#b16">[17]</ref> may provide a starting point for alignment. Other studies emphasize aligning metabolic pathways of different species in order to find analogous but missing relationships <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19]</ref>, merging resources for combined network analysis <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b19">20]</ref>, or defining conserved pathway elements across existing pathway resources <ref type="bibr" target="#b20">[21]</ref>. However, although they represent progress, the tools and studies mentioned above accomplish goals that do not include aligning representations across resources.</p><p>Given the number and uniqueness of pathway resources, inter-resource merging is a challenge. In order to successfully align and integrate the content of multiple knowledge bases, we must contend with variability in content correctness, standards usage, knowledge representation choices, and coverage. Pathway data sharing standards such as BioPAX, SBML, and PSI MI <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b22">23]</ref> assist in the interchange of pathway resources, but even resources available in the same standard still retain differences in content and representation. 
Nonetheless, our goal is to align knowledge, so that users can benefit from a semantic union across multiple resources.</p><p>To align resources, we must comprehensively understand the types of differences one may encounter. Stobbe et al. have made an excellent start in this direction, providing numerous examples and descriptions of the sorts of differences among metabolic pathway resources <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b23">24]</ref>. Here, we extend this work, aiming at a typology of representational differences between pathway resources. We also present some results in quantifying annotation mismatches between two popular human pathway resources: HumanCyc and Reactome. Results demonstrate the pervasiveness of representational differences and suggest further work towards consensus pathway representations. Understanding the types of mismatches that exist between resources is a first step towards expanding and deriving the full benefit of our pathway knowledge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. MISMATCHES IN PATHWAY RESOURCES: A TYPOLOGY</head><p>To provide examples of mismatches, we retrieved reaction representations from HumanCyc, KEGG, PANTHER, and Reactome. Fig. <ref type="figure" target="#fig_0">1</ref> shows several different representations of a step of glycolysis in Homo sapiens: the conversion of phosphoenolpyruvate and ADP to pyruvate and ATP modulated by the enzyme pyruvate kinase. In this single, well-studied biochemical reaction, we see a variety of important mismatches, of which a subset are described below. *</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Annotation</head><p>We first consider annotation mismatches on the participating physical entities. Inconsistencies arise when two pathway resources refer to the same entity with different identifiers or different names. Pyruvate is represented by all four resources (Fig. <ref type="figure" target="#fig_0">1</ref>), but is annotated with two identifiers, ChEBI:15361 (HumanCyc and PANTHER) and ChEBI:32816 (KEGG and Reactome). The ChEBI:15361 entity "pyruvate" and ChEBI:32816 entity "pyruvic acid" are conjugate acids and bases of one another in ChEBI. The display name for pyruvate also differs between resources, and is given as "pyruvate" (HumanCyc), "Pyruvate" (KEGG, PANTHER), or "PYR" (Reactome). Differences in identifiers and names are also seen for all other participants in this reaction.</p><p>In order to resolve these mismatches, we must either enforce consistent labeling of entities across resources, or somehow infer alignment of similar but differently annotated entities across resources. The former strategy is usually impractical; in this case, we can infer similarity by treating ChEBI identifiers that refer to conjugate acid/base pairs as synonyms.</p><p>A second type of annotation mismatch occurs when entities lack cross-referenced identifiers, e.g., no identifiers are given for ADP or ATP in PANTHER pathways. Other features such as string name, entity relationships, and local network topology can be used to align entities between resources when identifiers are insufficient.</p></div>
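<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration only, and not the procedure used in this study, the synonym strategy described above can be sketched in Python. The sketch groups identifiers that are linked by a conjugate acid/base relation and treats members of one group as the same entity. The function names are hypothetical, and the single pair listed is the pyruvate example from the text; a real index would have to be filled from ChEBI itself.</p>
<p><code lang="python">
```python
from typing import Dict, Iterable, Tuple


def build_synonym_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map every identifier to one representative of its synonym group."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Follow parent links up to the root of the group.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller ID as representative.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    return {identifier: find(identifier) for identifier in list(parent)}


def same_entity(id_a: str, id_b: str, index: Dict[str, str]) -> bool:
    """True if two identifiers are equal or fall in one synonym group."""
    return index.get(id_a, id_a) == index.get(id_b, id_b)


# Conjugate acid/base pair taken from the pyruvate example in the text.
CONJUGATE_PAIRS = [
    ("ChEBI:15361", "ChEBI:32816"),  # pyruvate / pyruvic acid
]

index = build_synonym_index(CONJUGATE_PAIRS)
print(same_entity("ChEBI:15361", "ChEBI:32816", index))  # grouped pair
print(same_entity("ChEBI:15361", "ChEBI:16828", index))  # unrelated identifier
```
</code></p>
<p>Grouping with a union-find structure keeps the relation transitive, so an identifier linked to either member of a pair joins the same group without further bookkeeping.</p></div>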
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Existence</head><p>Existence refers to missing or extraneous physical entities, reactions, relationships, or information, e.g., entities that participate in a reaction or reactions that are members of a pathway in one resource but not another, or a connection between two reactions that occurs in one resource but not another. In Reactome, for example, the conversion of fructose 6-phosphate to fructose 2,6-bisphosphate is a reaction in the glycolysis pathway. This reaction is not included in the glycolysis pathway of the other three resources. Although the reaction involves entities that participate in glycolysis, it is uncertain whether it is important to the overall process.</p><p>Another example of an existence mismatch is the inclusion of H+ in the conversion of phosphoenolpyruvate to pyruvate in HumanCyc (Fig. <ref type="figure" target="#fig_0">1</ref>). The ion is included in order to balance reaction charge, but according to BioPAX3 documentation, reaction participants should be neutral and ions such as H+ and Mg2+ are not recommended for inclusion <ref type="bibr" target="#b24">[25]</ref>. Other potential existence mismatches could occur if one resource is missing relevant information about a relationship between two entities, or if one resource specifically negates the existence of a relationship asserted in another resource.</p><p>Existence mismatches can be resolved by taking the most common representation among the resources (democratic) or by integrating all possible representations (exhaustive). Although an exhaustive consensus method is unlikely to leave out information, it may produce a large and unwieldy alignment.</p></div>
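<div xmlns="http://www.tei-c.org/ns/1.0"><p>The democratic and exhaustive strategies can be made concrete with a minimal Python sketch. It is a toy illustration under stated assumptions: the function names are hypothetical, entity names are taken to be normalized already, and the participant sets are simplified so that only the H+ difference described above remains.</p>
<p><code lang="python">
```python
from collections import Counter
from typing import Dict, Set


def exhaustive_consensus(views: Dict[str, Set[str]]) -> Set[str]:
    """Union of every participant asserted by any resource."""
    merged: Set[str] = set()
    for participants in views.values():
        merged.update(participants)
    return merged


def democratic_consensus(views: Dict[str, Set[str]]) -> Set[str]:
    """Participants asserted by a strict majority of resources."""
    votes = Counter(p for participants in views.values() for p in participants)
    return {p for p, n in votes.items() if 2 * n > len(views)}


# Toy participant sets for the pyruvate kinase step. Names are already
# normalized here, which a real comparison would have to do first.
core = {"phosphoenolpyruvate", "ADP", "pyruvate", "ATP"}
views = {
    "HumanCyc": core.union({"H+"}),
    "KEGG": set(core),
    "PANTHER": set(core),
    "Reactome": set(core),
}

print(sorted(exhaustive_consensus(views)))  # keeps H+
print(sorted(democratic_consensus(views)))  # drops H+
```
</code></p>
<p>The strict-majority rule is one possible reading of "most common representation"; ties and weighting by resource quality are left open here.</p></div>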
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Reaction semantics</head><p>Many differences in reaction representation have been described in Stobbe et al., such as using the terms left and right, product and substrate, and input and output to describe participants in reactions <ref type="bibr" target="#b23">[24]</ref>. In BioPAX, the properties conversionDirection, stepDirection, left, and right are used to indicate reaction direction, as well as the identities of reactants and products <ref type="bibr" target="#b24">[25]</ref>. In KEGG, PANTHER, and Reactome, phosphoenolpyruvate is labeled left and pyruvate right, with a reaction direction of left-to-right. However, in HumanCyc, phosphoenolpyruvate is labeled right and pyruvate left and the reaction direction is right-to-left, a choice dictated by the Enzyme Commission system <ref type="bibr" target="#b1">[2]</ref>. Even though HumanCyc is in the minority, its choice follows recommendations from the BioPAX3 specifications <ref type="bibr" target="#b24">[25]</ref>.</p><p>Resolving this type of semantic mismatch between resources requires knowledge about the ordering of reactions, which can be derived from pathway design, or when reactions are taken out of context, may depend on chemical kinetics and the reacting environment. For well-studied pathways, a consensus ordering usually exists. When participant left and right labels differ between resources and ordering is unclear, the BioPAX pathwayOrder object (designed to relay reaction topology) can sometimes be used along with reaction direction to infer the correct sequence.</p></div>
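<div xmlns="http://www.tei-c.org/ns/1.0"><p>When a direction is stated, left and right labels can be reduced to substrates and products before comparison. The Python sketch below is a hedged illustration of that idea, not a method from this study: the function name is hypothetical, the direction strings are assumed to follow the BioPAX conversionDirection vocabulary, and the participant lists are simplified (H+ is omitted and names are assumed to be normalized).</p>
<p><code lang="python">
```python
from typing import FrozenSet, Iterable, Tuple


def oriented(left: Iterable[str], right: Iterable[str],
             direction: str) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (substrates, products) for a directed conversion."""
    left_set, right_set = frozenset(left), frozenset(right)
    if direction == "LEFT-TO-RIGHT":
        return left_set, right_set
    if direction == "RIGHT-TO-LEFT":
        return right_set, left_set
    # Reversible or unspecified conversions need pathway context instead.
    raise ValueError(f"cannot orient a conversion with direction {direction!r}")


# Labels as described in the text, with names already normalized.
kegg_like = oriented(["phosphoenolpyruvate", "ADP"],
                     ["pyruvate", "ATP"], "LEFT-TO-RIGHT")
humancyc_like = oriented(["pyruvate", "ATP"],
                         ["phosphoenolpyruvate", "ADP"], "RIGHT-TO-LEFT")

print(kegg_like == humancyc_like)  # both orient to the same conversion
```
</code></p>
<p>Raising an error for any other direction value is deliberate: such cases are the ones where ordering information from the pathway is needed.</p></div>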
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Granularity</head><p>Mismatches of granularity occur when resources represent the same entity or process in different degrees of detail. One example is complex naming. Many reaction enzymes are complexes made up of multiple protein subunits. A reaction may be annotated with a protein modifier, when in actuality, it is catalyzed by a complex: a dimer, trimer, etc. In Fig. <ref type="figure" target="#fig_0">1</ref>, Reactome makes this distinction by annotating to the "pyruvate kinase tetramer," a complex composed of the pyruvate kinase protein referenced from the other three resources. Due to the lack of standardized complex naming, however, we often cannot easily align complexes and proteins between resources.</p><p>Another type of granularity mismatch occurs at the reaction level. For example, one resource may choose to represent the elementary steps of a reaction, including intermediate chemical species. A single reaction in one resource may be represented as several in another, with the same ultimate inputs and outputs. For example, the oxidative decarboxylation of isocitrate is a two-step process, modified by the enzyme isocitrate dehydrogenase, producing α-ketoglutarate from isocitrate via an oxalosuccinate intermediate. The reaction can be represented both with and without the intermediate species, as in Fig. <ref type="figure" target="#fig_1">2</ref>. In these cases, we can study the ultimate inputs and outputs of ordered reaction sequences to determine the appropriate reaction-level alignment.</p></div>
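<div xmlns="http://www.tei-c.org/ns/1.0"><p>Comparing the ultimate inputs and outputs of an ordered reaction sequence can be sketched as follows. This is a simplified Python illustration with a hypothetical function name: it works on sets of species names, ignores stoichiometry and cofactors, and uses only the three species named in the isocitrate example.</p>
<p><code lang="python">
```python
from typing import List, Set, Tuple

Reaction = Tuple[Set[str], Set[str]]  # (inputs, outputs)


def net_inputs_outputs(steps: List[Reaction]) -> Reaction:
    """Collapse an ordered reaction sequence to its net inputs and outputs."""
    net_in: Set[str] = set()
    available: Set[str] = set()  # produced so far and not yet consumed
    for inputs, outputs in steps:
        for species in inputs:
            if species in available:
                available.discard(species)  # an intermediate, consumed here
            else:
                net_in.add(species)         # must come from outside
        available.update(outputs)
    return net_in, available


two_step = [
    ({"isocitrate"}, {"oxalosuccinate"}),
    ({"oxalosuccinate"}, {"alpha-ketoglutarate"}),
]
one_step = [({"isocitrate"}, {"alpha-ketoglutarate"})]

# Both representations reduce to isocitrate in, alpha-ketoglutarate out.
print(net_inputs_outputs(two_step) == net_inputs_outputs(one_step))
```
</code></p>
<p>Because oxalosuccinate is produced and then consumed inside the sequence, it drops out of the net result, which is what allows the two-step and one-step forms to be aligned.</p></div>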
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. ANNOTATION DIFFERENCES BETWEEN TWO RESOURCES</head><p>We identify and enumerate mismatches in entity annotation between two exemplar resources: HumanCyc and Reactome. Compared to other mismatches, a disagreement in the annotation of entities could be viewed as primary: if two resources disagree on physical entities, then they are also likely to disagree on the reactions and pathways in which these entities participate.</p><p>The most confident match between entities in two resources arises when both identifiers and names match. For example, the molecule ATP matches both on name and ChEBI identifier for KEGG and Reactome (Fig. <ref type="figure" target="#fig_0">1</ref>). Less confident are identifier matches without string name matches (e.g. HumanCyc and KEGG use different names for the entity cross-referenced to UniProt:P30613), and string name matches without identifier matches (e.g. HumanCyc and PANTHER cross-reference to different ChEBI identifiers for the entity named "phosphoenolpyruvate").</p><p>From HumanCyc and Reactome, we extract all proteins and small molecules with cross-referenced identifiers (UniProt for proteins, ChEBI for small molecules) and names. String names are taken as all objects of the BioPAX properties name, displayName, and standardName on the entity of interest. Using only string names and UniProt/ChEBI identifiers, there are four possible ways that entities can match between these two resources. Entities in HumanCyc can match to entities in Reactome on ID and name (+I/+N), ID but not name (+I/-N), name but not ID (-I/+N), and on neither ID nor name (-I/-N). For this initial analysis, we define string name matches as case-insensitive equivalence, so small differences in spelling do not produce a match.</p><p>For each entity in HumanCyc, SPARQL queries are used to determine whether a matching entity exists in Reactome, and similarly, Reactome entities are matched to HumanCyc entities. Resulting matches for proteins are given in Table <ref type="table" target="#tab_0">I</ref>. 
Out of 2737 unique HumanCyc proteins, 2078 (75.9%) match to Reactome entities using identifiers and/or string names. Out of 16949 unique Reactome proteins, 2973 (17.5%) match to a HumanCyc protein on identifiers and/or name. Reactome references many protein isoforms, causing the large imbalance in unique protein counts between the two resources. These match ratios are illustrated in Fig. <ref type="figure" target="#fig_2">3</ref>.</p><p>Table <ref type="table" target="#tab_1">II</ref> shows matches for small molecules. In HumanCyc, 866 (53.8%) out of 1610 small molecules match on annotation to an entity in Reactome. In Reactome, 1591 (55.0%) out of 2891 small molecules match to entities in HumanCyc, with a large proportion (890 out of 1591) matching on string names only.</p><p>Cross-referenced identifiers are the gold standard of matching between two resources. Therefore, groups +I/+N and +I/-N likely consist of true matches. Group -I/+N can be used to learn about representational differences. Some of the cross-references for entities in this group point to secondary accession identifiers, which redirect to other identifiers in the same database. For example, UniProt:A0AVP9 redirects to UniProt:Q8IWU4, the entity Zinc transporter 8. For small molecules only, we also find annotation to ChEBI conjugate acids or bases (e.g., HumanCyc annotates with ChEBI:456216 (ATP(3-)), a conjugate base of ChEBI:16761 (ATP), which is used in Reactome), or annotation to tautomers (e.g., ChEBI:16828 and ChEBI:57912 for L-tryptophan and the L-tryptophan zwitterion respectively). Annotation mismatches of the above subtypes are detected by querying the UniProt or ChEBI APIs using the BioServices 1.4.8 Python package <ref type="bibr" target="#b25">[26]</ref>.</p><p>Within -I/+N matches, the 55 HumanCyc and 88 Reactome proteins had 208 pairwise string name matches. 
Of these, 28 pairs had cross-referenced identifiers that are UniProt secondary accession IDs, indicating that they likely refer to the same entity. We could not confirm the identities of the other 180 pairs through UniProt accession identifiers. For small molecules in the -I/+N group, the 479 HumanCyc and 890 Reactome molecules had 1869 pairwise string name matches. Of these, at least 1506 pairs referred to similar entities. Annotation to ChEBI conjugate acids or bases accounted for the majority of these (1122), followed by annotation with ChEBI tautomer IDs (301), and ChEBI secondary accession numbers (83).</p></div>
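<div xmlns="http://www.tei-c.org/ns/1.0"><p>The four match classes can be illustrated with a short Python sketch. It is not the SPARQL-based procedure used in this study: the class and function names are hypothetical, and the sketch assumes that an entity is assigned to +I/+N only when a single entity in the other resource shares both an identifier and a name. The example entities use the pyruvate identifiers and display names described for Fig. 1.</p>
<p><code lang="python">
```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Entity:
    ids: FrozenSet[str]    # cross-referenced UniProt or ChEBI identifiers
    names: FrozenSet[str]  # name, displayName and standardName values


def make_entity(ids: Iterable[str], names: Iterable[str]) -> Entity:
    # casefold() gives the case-insensitive comparison used in the text.
    return Entity(frozenset(ids), frozenset(n.casefold() for n in names))


def match_class(entity: Entity, others: Iterable[Entity]) -> str:
    """Classify an entity against another resource's entities."""
    best = "-I/-N"
    for other in others:
        id_hit = not entity.ids.isdisjoint(other.ids)
        name_hit = not entity.names.isdisjoint(other.names)
        if id_hit and name_hit:
            return "+I/+N"
        if id_hit:
            best = "+I/-N"  # an identifier match outranks a name-only match
        elif name_hit and best == "-I/-N":
            best = "-I/+N"
    return best


humancyc = make_entity(["ChEBI:15361"], ["pyruvate"])
kegg = make_entity(["ChEBI:32816"], ["Pyruvate"])
panther = make_entity(["ChEBI:15361"], ["Pyruvate"])
reactome = make_entity(["ChEBI:32816"], ["PYR"])

print(match_class(humancyc, [panther]))   # +I/+N
print(match_class(kegg, [reactome]))      # +I/-N
print(match_class(humancyc, [kegg]))      # -I/+N
print(match_class(humancyc, [reactome]))  # -I/-N
```
</code></p>
<p>Ranking identifier matches above name-only matches mirrors the view that cross-referenced identifiers are the more reliable evidence.</p></div>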
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. DISCUSSION</head><p>In order to reduce redundancy and errors when merging information from different knowledge bases, we must correctly align entities and other assertions between resources. Entity alignment is a necessary first step before we can clarify higher-order concepts such as complexes, reactions, and pathways. As demonstrated using HumanCyc and Reactome, many proteins and small molecules can be matched between two resources using annotation features such as cross-referenced identifiers and string names. Among entities that share only string names, many have related identifiers that can be matched computationally. Related identifiers can be used to help improve the accuracy of annotations.</p><p>Moving beyond annotation, other issues of semantics and granularity come into play. For future work, we intend to incorporate other features, such as entity relationships and graph properties like degree and bipartite connectivity, to assist in entity alignment.</p><p>Several limitations exist in this work. First, we only compared entities between two pathway resources, HumanCyc and Reactome. We expect to expand our analysis to include other resources as well. Although some of our current methods rely on BioPAX, our general ideas about physical entities and their annotations can be applied to data represented using other biological pathway knowledge standards.</p><p>Another limitation arises in the way we identify annotation mismatches. We only assessed proteins and small molecules with UniProt or ChEBI identifiers, excluding those entities without cross-references or with cross-references to other databases. This was partially for simplicity and partially to limit the size of the comparison problem. 
For example, an agreement on one set of identifiers and a disagreement on another would yield yet another class of mismatches.</p><p>Lastly, we were limited by our use of 100% string name matching to identify potential matched entities. By doing so, we limit our ability to detect positive matches and yield more conservative results, e.g., "fructose 1,6 bisphosphate" does not match to "D-fructose 1,6-bisphosphate"; the second is a stereoisomer of the first (generic) molecule, and they may play similar roles in reactions. Fuzzy string matches may perform better. However, we want to minimize the false positive rate, e.g., "fructose 1,6-bisphosphate" and "fructose 2,6-bisphosphate" differ by only one character but refer to different molecules. With these caveats, the typology we present affords an opportunity to test different algorithms for the systematic alignment of pathway resources.</p></div>
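<div xmlns="http://www.tei-c.org/ns/1.0"><p>The trade-off between exact and fuzzy name matching can be seen in a small Python sketch. It uses the SequenceMatcher class from the standard difflib module as one possible similarity measure; this choice and the function names are assumptions for illustration, not part of our analysis.</p>
<p><code lang="python">
```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Case-insensitive equality, the criterion used in the analysis."""
    return a.casefold() == b.casefold()


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# A pair that arguably should align, and a pair naming different molecules.
related = ("fructose 1,6 bisphosphate", "D-fructose 1,6-bisphosphate")
distinct = ("fructose 1,6-bisphosphate", "fructose 2,6-bisphosphate")

for label, (a, b) in (("related", related), ("distinct", distinct)):
    print(label, exact_match(a, b), round(similarity(a, b), 3))
```
</code></p>
<p>Neither pair matches exactly, and under this measure the pair naming different molecules scores at least as high as the pair that should align, so no single similarity threshold separates the two cases. String similarity therefore needs to be combined with other evidence, such as identifiers or network context.</p></div>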
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. CONCLUSION</head><p>The complexity of pathway content is a barrier to resource integration, but as described above, we are also challenged by representational and content differences. Standards like BioPAX help clarify some differences between resources, but they do not solve all problems of interoperability. In order to draw from the spectrum of knowledge we have built as a community, the content of these resources must be aligned and integrated into something greater than the parts. Doing so involves identifying the differences between resources, and resolving those differences to understand shared meaning. Our results show that a sizable portion of physical entities can be aligned between pathway resources using existing cross-referenced identifiers and string names. However, annotation features alone are likely insufficient for matching a majority of entities between resources. Knowledge of entity relationships, reaction semantics, granularity, and more about these resources is necessary to create and evaluate potential alignments. Much of the work can be done computationally, and the typology above should guide the engineering of future matching algorithms.</p><p>To align and integrate knowledge across resources, the research community must have strategies for resolving these different sorts of mismatches. Some mismatches, such as those of annotation, can largely be resolved using the existing data. Other issues of semantics, such as differences in how standard languages are used to express the same knowledge, pose a bigger challenge. Resource developers should be allowed to make different choices in knowledge representation. However, this flexibility should not come at the cost of increased error or decreased interoperability. 
A better understanding of how specific mismatches occur will provide an incentive for resources to work toward interoperable data and representations.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: The conversion of phosphoenolpyruvate and ADP into pyruvate and ATP assisted by the enzyme pyruvate kinase, as represented by HumanCyc, KEGG, PANTHER, and Reactome. The display name for each entity is given, along with ChEBI or UniProt identifiers where available. Entities related to the reaction by the BioPAX left property are red, and entities related by the BioPAX right property are green.</figDesc><graphic coords="2,72.00,72.00,468.01,247.49" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 :</head><label>2</label><figDesc>Fig. 2: The oxidative decarboxylation of isocitrate can be represented as a two-step process with an oxalosuccinate intermediary (left) and as a one-step process (right).</figDesc><graphic coords="3,323.38,72.00,205.22,112.61" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 :</head><label>3</label><figDesc>Fig. 3: Protein (top) and small molecule (bottom) matches between HumanCyc and Reactome based on annotation, consisting of unmatched HumanCyc entities (red), unmatched Reactome entities (blue), and matched entities between the two resources (purple).</figDesc><graphic coords="5,72.00,72.00,228.02,135.26" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>TABLE I :</head><label>I</label><figDesc>Protein matches between HumanCyc and Reactome on UniProt identifiers and names</figDesc><table><row><cell cols="4">HumanCyc protein matches to Reactome</cell></row><row><cell></cell><cell>+N</cell><cell>-N</cell><cell>Total</cell></row><row><cell>+I</cell><cell>1264</cell><cell>759</cell><cell>2023</cell></row><row><cell>-I</cell><cell>55</cell><cell>659</cell><cell>714</cell></row><row><cell>Total</cell><cell>1319</cell><cell>1418</cell><cell>2737</cell></row><row><cell cols="4">Reactome protein matches to HumanCyc</cell></row><row><cell></cell><cell>+N</cell><cell>-N</cell><cell>Total</cell></row><row><cell>+I</cell><cell>1495</cell><cell>1390</cell><cell>2885</cell></row><row><cell>-I</cell><cell>88</cell><cell>13976</cell><cell>14064</cell></row><row><cell>Total</cell><cell>1583</cell><cell>15366</cell><cell>16949</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>TABLE II :</head><label>II</label><figDesc>Small molecule matches between HumanCyc and Reactome on ChEBI identifiers and names</figDesc><table><row><cell cols="4">HumanCyc small molecule matches to Reactome</cell></row><row><cell></cell><cell>+N</cell><cell>-N</cell><cell>Total</cell></row><row><cell>+I</cell><cell>247</cell><cell>140</cell><cell>387</cell></row><row><cell>-I</cell><cell>479</cell><cell>744</cell><cell>1223</cell></row><row><cell>Total</cell><cell>726</cell><cell>884</cell><cell>1610</cell></row><row><cell cols="4">Reactome small molecule matches to HumanCyc</cell></row><row><cell></cell><cell>+N</cell><cell>-N</cell><cell>Total</cell></row><row><cell>+I</cell><cell>425</cell><cell>276</cell><cell>701</cell></row><row><cell>-I</cell><cell>890</cell><cell>1300</cell><cell>2190</cell></row><row><cell>Total</cell><cell>1315</cell><cell>1576</cell><cell>2891</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGEMENTS</head><p>The authors thank Peter Karp for helpful comments on an early draft of this paper.</p></div>
			</div>


			<div type="availability">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>* Pathways were retrieved from Reactome v55 (http://reactome.org) and HumanCyc v19.5 (http://humancyc.org) BioPAX3 exports and through PathwayCommons v7 <ref type="bibr" target="#b6">[7]</ref>. Glycolysis pathways for KEGG and PANTHER are located at http://purl.org/pc2/7/#Pathway 307add3cea6530288cc1016267ec055b and http://identifiers.org/panther.pathway/P00024 respectively and are supplemented by the pathway diagrams at http://kegg.jp and http://pantherdb.org.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The Reactome pathway knowledgebase</title>
		<author>
			<persName><forename type="first">D</forename><surname>Croft</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mundo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Haw</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="page" from="D472" to="D477" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note>Database issue</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Computational prediction of human metabolic pathways from the complete human genome</title>
		<author>
			<persName><forename type="first">P</forename><surname>Romero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wagg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Green</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krummenacker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Karp</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Genome Biology</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">R2</biblScope>
			<biblScope unit="page" from="1" to="17" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">KEGG: Kyoto Encyclopedia of Genes and Genomes</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kanehisa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Goto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="27" to="30" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">PANTHER: a library of protein families and subfamilies indexed by function</title>
		<author>
			<persName><forename type="first">P</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Campbell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kejariwal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Genome Res</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="2129" to="2141" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">WikiPathways: capturing the full diversity of pathway knowledge</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kutmon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Riutta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nunes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="issue">D1</biblScope>
			<biblScope unit="page" from="D488" to="D494" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Pathguide: a pathway resource list</title>
		<author>
			<persName><forename type="first">G</forename><surname>Bader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sander</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="D504" to="D506" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
	<note>Database issue</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Pathway Commons, a web resource for biological pathway data</title>
		<author>
			<persName><forename type="first">E</forename><surname>Cerami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Demir</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page" from="D685" to="D690" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
	<note>Database issue</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The BioPAX community standard for pathway data sharing</title>
		<author>
			<persName><forename type="first">E</forename><surname>Demir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Paley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature Biotechnology</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="935" to="942" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">ConsensusPathDB - a database for integrating human functional interaction networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kamburov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wierling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lehrach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Herwig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="D623" to="D628" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
	<note>Database issue</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">hiPathDB: a human-integrated pathway database with facile visualization</title>
		<author>
			<persName><forename type="first">N</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Seo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rho</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="page" from="D797" to="D802" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
	<note>Database issue</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">R spider: a network-based analysis of gene lists by combining signaling and metabolic pathways from Reactome and KEGG databases</title>
		<author>
			<persName><forename type="first">A</forename><surname>Antonov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dietmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krestyaninova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hermjakob</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="W78" to="W83" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
	<note>Web Server issue</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Consistency, comprehensiveness, and compatibility of pathway databases</title>
		<author>
			<persName><forename type="first">D</forename><surname>Soh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="449" to="464" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Critical assessment of human metabolic pathway databases: a stepping stone for future integration</title>
		<author>
			<persName><forename type="first">M</forename><surname>Stobbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Houten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Jansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Van Kampen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Moerland</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Systems Biology</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="165" to="183" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A systematic comparison of the MetaCyc and KEGG pathway databases</title>
		<author>
			<persName><forename type="first">T</forename><surname>Altman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Travers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kothari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Caspi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Karp</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page">112</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Comparison of human cell signaling pathway databases - evolution, drawbacks and challenges</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chowdhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sarkar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Database</title>
		<imprint>
			<biblScope unit="page">126</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Integrating various resources for gene name normalization</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLoS One</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page">e43558</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">The Chemical Translation Service - a web-based tool to improve standardization of metabolomic reports</title>
		<author>
			<persName><forename type="first">G</forename><surname>Wohlgemuth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Haldiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Willighagen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kind</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Fiehn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">20</biblScope>
			<biblScope unit="page" from="2647" to="2648" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">SubMAP: aligning metabolic pathways with subnetwork mappings</title>
		<author>
			<persName><forename type="first">F</forename><surname>Ay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kahveci</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computational Biology</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="219" to="235" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">MP-Align: alignment of metabolic pathways</title>
		<author>
			<persName><forename type="first">R</forename><surname>Alberich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Llabrés</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Simeoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tuduri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Systems Biology</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page">58</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Using random walks to identify cancer-associated modules in expression data</title>
		<author>
			<persName><forename type="first">D</forename><surname>Petrochilos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shojaie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gennari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Abernethy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BioData Mining</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">17</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">UniPathway: a resource for the exploration and annotation of metabolic pathways</title>
		<author>
			<persName><forename type="first">A</forename><surname>Morgat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Coissac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Coudert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Axelsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Keller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bairoch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Res</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="page" from="D761" to="D769" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
	<note>Database issue</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hucka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Finney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sauro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="524" to="531" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The HUPO PSI&apos;s molecular interaction format - a community standard for the representation of protein interaction data</title>
		<author>
			<persName><forename type="first">H</forename><surname>Hermjakob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Montecchi-Palazzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bader</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature Biotechnology</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="177" to="183" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Knowledge representation in metabolic pathway databases</title>
		<author>
			<persName><forename type="first">M</forename><surname>Stobbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Jansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Moerland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Van Kampen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Brief Bioinform</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="455" to="470" />
			<date type="published" when="2014-05">2014 May</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m">BioPAX - Biological Pathways Exchange Language, Level 3</title>
		<imprint>
			<date type="published" when="2010-07">July 2010</date>
		</imprint>
	</monogr>
	<note>release version 1 documentation</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">BioServices: a common Python package to access biological web services programmatically</title>
		<author>
			<persName><forename type="first">T</forename><surname>Cokelaer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pultz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Harder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Serra-Musach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Saez-Rodriguez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">24</biblScope>
			<biblScope unit="page" from="3241" to="3242" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
