<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Protein Ontology Development using OWL</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Amandeep</forename><forename type="middle">S</forename><surname>Sidhu</surname></persName>
							<email>asidhu@it.uts.edu.au</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Information Technology</orgName>
								<orgName type="institution">University of Technology</orgName>
								<address>
									<settlement>Sydney</settlement>
									<country key="AU">Australia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tharam</forename><forename type="middle">S</forename><surname>Dillon</surname></persName>
							<email>tharam@it.uts.edu.au</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Information Technology</orgName>
								<orgName type="institution">University of Technology</orgName>
								<address>
									<settlement>Sydney</settlement>
									<country key="AU">Australia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Elizabeth</forename><surname>Chang</surname></persName>
							<email>elizabeth.chang@cbs.curtin.edu.au</email>
							<affiliation key="aff1">
								<orgName type="department">School of Information Systems</orgName>
								<orgName type="institution">Curtin University of Technology</orgName>
								<address>
									<settlement>Perth</settlement>
									<country key="AU">Australia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Baldev</forename><forename type="middle">S</forename><surname>Sidhu</surname></persName>
							<email>bsidhu@biomap.org</email>
							<affiliation key="aff2">
								<orgName type="department">State Council of Education Research and Training</orgName>
								<address>
									<region>Punjab</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Protein Ontology Development using OWL</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">FBB2EF1238AAEFD98EF6E981D3BCF706</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T13:47+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Protein Ontology</term>
					<term>Biomedical Ontologies</term>
					<term>OWL based Protein Ontology</term>
					<term>Protégé</term>
					<term>OWL</term>
					<term>Proteomics</term>
					<term>Data Integration</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>To efficiently represent the protein annotation framework and to integrate all the existing data representations into a standardized protein data specification for the bioinformatics community, the protein ontology needs to be represented in a format that not only enforces semantic constraints on protein data, but can also facilitate reasoning tasks on protein data using semantic query algebra. This motivates the representation of the Protein Ontology (PO) model in Web Ontology Language (OWL). In this paper we briefly discuss the use of OWL in achieving the objectives of the Protein Ontology Project. We start with a brief overview of Protein Ontology (PO). In the later sections we discuss why OWL was an ideal choice for PO development.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Background</head><p>Traditional approaches to integrating protein data have generally involved keyword searches, which immediately exclude unannotated or poorly annotated data. They also exclude proteins annotated with synonyms unknown to the user. Even for the protein data that is retrieved in this manner, some biological resources do not record information about the data source, so there is no evidence for the annotation. An alternative protein annotation approach is to rely on sequence identity, structural similarity, or functional identification. The success of this method depends on the family the protein belongs to: some proteins have a high degree of sequence identity, structural similarity, or similarity in functions that is unique to members of that family alone. Consequently, this approach cannot be generalized to integrate protein data. Clearly, these traditional approaches have limitations in capturing and integrating data for protein annotation. For these reasons, we have adopted an alternative method that does not rely on keywords or similarity metrics, but instead uses an ontology. Briefly, an ontology is a means of formalizing knowledge; at a minimum, an ontology must include concepts or terms relevant to the domain, definitions of those concepts, and defined relationships between the concepts. An ontology for the protein domain must contain terms or concepts relevant to protein synthesis, describing protein sequence, structure and function and the relationships between them. Protein Ontology (PO) provides clear and unambiguous definitions of all major biological concepts of the protein synthesis process and the relationships between them using OWL. The use of OWL in PO provides a unified controlled vocabulary both for annotation data types and for annotation data. We have built PO </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Protein Ontology and OWL</head><p>As technologies mature, the shift from single annotation databases queried by web-based scripts that generate HTML pages to annotation repositories capable of exporting selected data in XML format, either to be further analysed by remote applications or to be transformed for presentation to the user in a web browser, will undoubtedly be one of the major evolutions of the protein annotation process. XML is a markup language much like HTML, but XML describes data using a hierarchy. An XML document uses a schema to describe data and is designed to be self-descriptive. This allows easy and powerful manipulation of data in XML documents. XML provides syntax for structured documents, but imposes no semantic constraints on the meaning of these documents.</p><p>Resource Description Framework (RDF) is a data model for objects (resources) and the relations between them. It provides a simple semantics for this data model, and the data model can be represented in XML syntax. RDF Schema is a vocabulary for describing properties and classes of RDF resources, with a semantics for generalization hierarchies of such properties and classes.</p><p>To efficiently represent the protein annotation framework and to integrate all the existing data representations into a standardized protein data specification for the bioinformatics community, the protein ontology needs to be represented in a format that not only enforces semantic constraints on protein data, but can also facilitate reasoning tasks on protein data using semantic query algebra. This motivates the representation of the Protein Ontology (PO) model in Web Ontology Language (OWL). OWL facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema by providing additional vocabulary along with a formal semantics. Knowledge captured from protein data using OWL is classified in a rich hierarchy of concepts and their inter-relationships. OWL is compositional and dynamic, relying on notions of classification, reasoning, consistency, retrieval and querying. We investigated the use of OWL for building Protein Ontology (PO) using the Protégé OWL Plug-in.</p><p>OWL allows us to write explicit, formal definitions of the concepts that describe protein data. Using OWL to define formal protein data concepts provides: (1) a well-defined syntax; (2) semantics, which are already present in protein data; (3) convenient expression of integrated protein data using query algebra. A well-defined and structured syntax for the protein ontology is necessary for machine processing and mining of protein data. Formal semantics describes the meaning of knowledge in protein data precisely. One use of formal semantics is to allow people to reason about knowledge of the protein domain. In the case of Protein Ontology, we may reason about:</p><p>• Class membership. If M is an instance of class Molecule, and Molecule is a subclass of Entry, then we can infer that M is an instance of Entry. </p></div>
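<div xmlns="http://www.tei-c.org/ns/1.0"><p>The class-membership inference above can be sketched in a few lines of Python. This is an illustrative sketch only, not the PO implementation or the reasoner used with Protégé: the SUBCLASS_OF table and the helper names superclasses and is_instance_of are hypothetical, only the Molecule-to-Entry link is stated in the text, and the remaining links are assumed from one reading of the class hierarchy in Figure 1.</p>

```python
# Hypothetical subclass links. Only "Molecule is a subclass of Entry" comes
# from the text; the other two links are an assumed reading of Figure 1.
SUBCLASS_OF = {
    "Molecule": "Entry",
    "Entry": "ProteinComplex",
    "ProteinComplex": "ProteinOntology",
}


def superclasses(cls):
    """Return every ancestor of cls by walking the subclass links upward."""
    ancestors = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        ancestors.append(cls)
    return ancestors


def is_instance_of(asserted_class, target_class):
    """True if an instance asserted in asserted_class also belongs to target_class."""
    return asserted_class == target_class or target_class in superclasses(asserted_class)


# M is asserted to be a Molecule, so its membership in Entry can be inferred.
m_class = "Molecule"
print(is_instance_of(m_class, "Entry"))
print(is_instance_of(m_class, "Chains"))
```

<p>An OWL reasoner draws this kind of conclusion directly from the class definitions, together with much richer inferences; the dictionary walk here only makes the transitivity of the subclass relation explicit.</p></div>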
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">PO Benefits and Limitations</head><p>Apart from classifying or organizing protein data and knowledge about proteins in a hierarchy, PO has the following benefits: 1. Protein Ontology (PO) provides a unified vocabulary for capturing declarative knowledge about the protein domain and for classifying that knowledge. Information captured by PO is classified in a rich hierarchy of concepts and their inter-relationships. 2. In PO the notions of classification, reasoning, and consistency are applied by defining new concepts or classes from defined generic concepts or classes. The concepts derived from generic concepts are placed precisely into the class hierarchy of Protein Ontology to completely represent the information defining a protein complex.</p><p>OWL is well suited as the development language for PO for the following reasons: 1. As the OWL representation used in Protein Ontology is based on XML-Abbrev (abbreviated XML notation), it can be transformed to the corresponding RDF and XML formats with little effort using the available converters. 2. Most of the other biomedical ontologies in genetics and molecular biology are represented in OWL, such as Gene Ontology (GO) <ref type="bibr">[GO 2001]</ref>, RiboWEB <ref type="bibr" target="#b0">[Altman et al., 1999]</ref> and UMLS <ref type="bibr">[UMLS 1993]</ref>.</p><p>We are constantly working to improve PO features. Here are some of the improvements that we aim to achieve by next year: 1. For protein functional classification, in addition to the presence of domains, motifs or functional residues, the following factors are relevant: (a) similarity of three-dimensional protein structures, (b) proximity of genes (which may indicate that the proteins they produce are involved in the same pathway), (c) metabolic functions of organisms and (d) evolutionary history of the protein. At the moment PO's Functional Domain Classification does not address proximity of genes or the evolutionary history of proteins. These factors will be added in future to complete the Functional Domain Classification System in PO. 2. The constraints defined in PO are not mapped back to the protein sequence, structure and function they affect. Achieving this in future will inter-link all the concepts of PO. 3. We are in the process of defining a semantic query algebra for PO to efficiently reason over and query the underlying XML database. 4. We will soon provide secured user interfaces to browse, query, and add protein data instances in PO.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Concluding Remarks</head><p>The overall objective of the Protein Ontology (PO) Project is: "To correlate information about multiprotein machines with data in major protein databases to better understand sequence, structure and function of protein machines." OWL provides a language for capturing declarative knowledge about the protein domain and a classifier that allows reasoning about protein data. Knowledge captured from protein data using OWL is classified in a rich hierarchy of concepts and their inter-relationships. We investigated the use of OWL for building Protein Ontology (PO) using the Protégé OWL Plug-in.</p><p>OWL is flexible and powerful enough to capture and classify biological concepts of proteins in a consistent and principled fashion. OWL is used to construct a Protein Ontology (PO) that can be used for making inferences from proteomics data using a defined semantic query algebra.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>We have built PO [Sidhu et al., 2006, Sidhu et al., 2005a, Sidhu et al., 2005b, Sidhu et al., 2005c, Sidhu et al., 2004a, Sidhu et al., 2004b, and Sidhu et al., 2004c] to integrate protein data formats and provide a structured and unified vocabulary to represent protein synthesis concepts. PO also helps to codify proteomics data for analysis by researchers. The complete class hierarchy of Protein Ontology (PO) is shown in Figure 1. More detailed UML diagrams for PO are available at the website: http://www.proteinontology.info/</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>An XML database of 10 major prion proteins available in various protein data sources, based on the vocabulary provided by Protein Ontology, is available on the PO website. Soon we will have all 57 prion proteins known to exist, and user interfaces to browse and query the database. The XML database currently contains 24 tables, 261 attributes and 17550 instances. Prion protein is a membrane-bound protein of 253 amino acid residues that is normally found in neurons and several other cell types. The abnormal prion protein is resistant to digestion by the enzymes that break down normal proteins, and accumulates in the brain. Abnormal prion proteins are the major cause of various human prion diseases of the brain, such as Fatal Familial Insomnia. Recently, the discovery of interesting properties of prion proteins has encouraged scientists to study them in search of cures for various human brain diseases. Building an XML data source based on PO will assist in this discovery process.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Figure 1: Class Hierarchy of Protein Ontology</head><label></label><figDesc></figDesc><table><row><cell>•</cell><cell>ProteinOntology</cell></row><row><cell></cell><cell>o AtomicBind</cell></row><row><cell></cell><cell>o Atoms</cell></row><row><cell></cell><cell>o Bind</cell></row><row><cell></cell><cell>o Chains</cell></row><row><cell></cell><cell>o Family</cell></row><row><cell></cell><cell>o ProteinComplex</cell></row><row><cell></cell><cell cols="2">ChemicalBonds</cell></row><row><cell></cell><cell>•</cell><cell>CISPeptide</cell></row><row><cell></cell><cell>•</cell><cell>DisulphideBond</cell></row><row><cell></cell><cell>•</cell><cell>HydrogenBond</cell></row><row><cell></cell><cell>•</cell><cell>ResidueLink</cell></row><row><cell></cell><cell>•</cell><cell>SaltBridge</cell></row><row><cell></cell><cell cols="2">Constraints</cell></row><row><cell></cell><cell>•</cell><cell>GeneticDefects</cell></row><row><cell></cell><cell>•</cell><cell>Hydrophobicity</cell></row><row><cell></cell><cell>•</cell><cell>ModifiedResidue</cell></row><row><cell></cell><cell cols="2">Entry</cell></row><row><cell></cell><cell>•</cell><cell>Description</cell></row><row><cell></cell><cell>•</cell><cell>Molecule</cell></row><row><cell></cell><cell>•</cell><cell>Reference</cell></row><row><cell></cell><cell cols="2">FunctionalDomains</cell></row><row><cell></cell><cell>•</cell><cell>ActiveBindingSites</cell></row><row><cell></cell><cell>•</cell><cell>BiologicalFunction</cell></row><row><cell></cell><cell></cell><cell>o PathologicalFunctions</cell></row><row><cell></cell><cell></cell><cell>o PhysiologicalFunctions</cell></row><row><cell></cell><cell>•</cell><cell>SourceCell</cell></row><row><cell></cell><cell cols="2">StructuralDomains</cell></row><row><cell></cell><cell>•</cell><cell>Helices</cell></row><row><cell></cell><cell></cell><cell>o Helix</cell></row><row><cell></cell><cell></cell><cell>HelixStructure</cell></row><row><cell></cell><cell>•</cell><cell>OtherFolds</cell></row><row><cell></cell><cell></cell><cell>o Turn</cell></row><row><cell></cell><cell></cell><cell>TurnStructure</cell></row><row><cell></cell><cell>•</cell><cell>Sheets</cell></row><row><cell></cell><cell></cell><cell>o Sheet</cell></row><row><cell></cell><cell></cell><cell>Strands</cell></row><row><cell></cell><cell>•</cell><cell>Structure</cell></row><row><cell></cell><cell></cell><cell>o ATOMSequence</cell></row><row><cell></cell><cell></cell><cell>o UnitCell</cell></row><row><cell></cell><cell>o Residues</cell></row><row><cell></cell><cell>o SiteGroup</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">RiboWeb: An Ontology-Based System for Collaborative Molecular Biology</title>
		<author>
			<persName><surname>Altman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Intelligent Systems</title>
		<imprint>
			<date type="published" when="1999">September/October 1999</date>
		</imprint>
	</monogr>
	<note>[GO 2001] Genome Research 11, 1425-1433 (2001)</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Ontological Foundation for Protein Data Models</title>
		<author>
			<persName><surname>Sidhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">First IFIP WG 2.12 &amp; WG 12.4 International Workshop on Web Semantics (SWWS 2005)</title>
		<title level="s">Lecture Notes in Computer Science (LNCS)</title>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Sidhu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><forename type="middle">S</forename><surname>Dillon</surname></persName>
		</editor>
		<meeting><address><addrLine>Prague; Agia Napa, Cyprus</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
	<note>conjunction with On The Move Federated Conferences (OTM 2005)</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Protein Ontology: Semantic Data Integration in Proteomics</title>
		<author>
			<persName><surname>Sidhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">4th International Joint Conference of InCoB, AASBi and KSBI</title>
		<meeting><address><addrLine>Busan, Korea</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2005">2005. BIOINFO2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Protein Ontology: Vocabulary for Protein Data</title>
		<author>
			<persName><surname>Sidhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Information Technology and Applications (IEEE ICITA 2005)</title>
				<meeting><address><addrLine>Sydney</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE CS Press</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="465" to="469" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A Unified Representation of Protein Structure Databases (Book Section)</title>
		<author>
			<persName><surname>Sidhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Biotechnological Approaches for Sustainable Development</title>
				<editor>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Sidhu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><forename type="middle">S</forename><surname>Dillon</surname></persName>
		</editor>
		<meeting><address><addrLine>Sydney</addrLine></address></meeting>
		<imprint>
			<publisher>National Health and Medical Research Council, Australian Government</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="150" to="151" />
		</imprint>
	</monogr>
	<note>2nd Australian and Medical Research Congress</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Protein Knowledge Base: Making of Protein Ontology (Invited Paper)</title>
		<author>
			<persName><surname>Sidhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">High-performance medical libraries: advances in information management for the virtual era</title>
				<editor>
			<persName><forename type="first">N</forename><forename type="middle">C</forename><surname>Broering</surname></persName>
		</editor>
		<meeting><address><addrLine>Beijing, China; Westport (CT)</addrLine></address></meeting>
		<imprint>
			<publisher>Meckler</publisher>
			<date type="published" when="1993">2004. Oct. 1993. 1993</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="31" to="44" />
		</imprint>
	</monogr>
	<note>Representing biomedical knowledge in the UMLS semantic network</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
