<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Protein Ontology Development using OWL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amandeep S. Sidhu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tharam S. Dillon</string-name>
          <email>tharam@it.uts.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elizabeth Chang</string-name>
          <email>Elizabeth.Chang@cbs.curtin.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Baldev S. Sidhu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Information Technology, University of Technology</institution>
          ,
          <addr-line>Sydney</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information Systems, Curtin University of Technology</institution>
          ,
          <addr-line>Perth</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>State Council of Education Research and Training</institution>
          ,
          <addr-line>Punjab</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>To efficiently represent the protein annotation framework and to integrate all the existing data representations into a standardized protein data specification for the bioinformatics community, the protein ontology needs to be represented in a format that not only enforces semantic constraints on protein data, but can also facilitate reasoning tasks on protein data using semantic query algebra. This motivates the representation of the Protein Ontology (PO) Model in the Web Ontology Language (OWL). In this paper we briefly discuss the use of OWL in achieving the objectives of the Protein Ontology Project. We begin with a brief overview of the Protein Ontology (PO). In the later sections we discuss why OWL was an ideal choice for PO development.</p>
      </abstract>
      <kwd-group>
        <kwd>Protein Ontology</kwd>
        <kwd>Biomedical Ontologies</kwd>
        <kwd>OWL based Protein Ontology</kwd>
        <kwd>Protégé</kwd>
        <kwd>OWL</kwd>
        <kwd>Proteomics</kwd>
        <kwd>Data Integration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Background</title>
      <p>Traditional approaches to integrating protein data have generally relied on keyword
searches, which immediately exclude unannotated or poorly annotated data. They also
exclude proteins annotated with synonyms unknown to the user. Of the protein data
that is retrieved in this manner, some comes from biological resources that do not record
information about the data source, so there is no evidence to support the annotation. An
alternative protein annotation approach is to rely on sequence identity, structural
similarity, or functional identification. The success of this method depends on the family
the protein belongs to. Some proteins have a high degree of sequence identity, structural
similarity, or similarity in function that is unique to members of that family alone.
Consequently, this approach cannot be generalized to integrate protein data.
Clearly, these traditional approaches have limitations in capturing and integrating data
for protein annotation. For these reasons, we have adopted an alternative method that
does not rely on keywords or similarity metrics, but instead uses an ontology. Briefly,
an ontology is a means of formalizing knowledge; at a minimum, an ontology must
include concepts or terms relevant to the domain, definitions of those concepts, and defined
relationships between the concepts. An ontology for the protein domain must contain terms
or concepts relevant to protein synthesis, describing protein sequence, structure and
function and the relationships between them. Protein Ontology (PO) provides clear and
unambiguous definitions of all major biological concepts of the protein synthesis process
and the relationships between them using OWL. The use of OWL in PO provides a unified
controlled vocabulary both for annotation data types and for annotation data. We have
built PO [Sidhu et al., 2006, Sidhu et al., 2005a, Sidhu et al., 2005b, Sidhu et al.,
2005c, Sidhu et al., 2004a, Sidhu et al., 2004b, and Sidhu et al., 2004c] to integrate
protein data formats and provide a structured and unified vocabulary to represent
protein synthesis concepts. PO also helps to codify proteomics data for analysis by
researchers. The complete class hierarchy of Protein Ontology (PO) is shown in
Figure 1. More detailed UML diagrams for PO are available at the website:
http://www.proteinontology.info/</p>
      <p>An XML database of 10 major Prion Proteins available in various protein data
sources, based on the vocabulary provided by Protein Ontology, is available on the PO
website. Soon it will cover all 57 Prion Proteins known to exist, along with user
interfaces to browse and query the database. The XML database currently contains 24
tables, 261 attributes and 17550 instances. Prion Protein is a membrane-bound protein
of 253 amino acid residues in length that is normally found in neurons and several
other cell types. The abnormal Prion Protein is resistant to digestion by the enzymes
that break down normal proteins, and it accumulates in the brain. Abnormal Prion
Proteins are the major cause of various human prion diseases of the brain, such as Fatal
Familial Insomnia. The recent discovery of interesting properties of Prion Proteins
has encouraged scientists to study them in search of cures for various human
brain diseases. Building an XML data source based on PO will assist in this discovery
process.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Protein Ontology and OWL</title>
      <p>As technologies mature, one of the major evolutions of the protein annotation
process will undoubtedly be the shift from single annotation databases queried
by web-based scripts that generate HTML pages to annotation repositories capable of
exporting selected data in XML format, either to be further analysed by remote
applications or to undergo a transformation stage before being presented to the user in a
web browser. XML is a markup language much like HTML, but XML describes data
using a hierarchy of elements. An XML document uses a schema to describe its data and is
designed to be self-descriptive. This allows easy and powerful manipulation of data in XML
documents. XML provides syntax for structured documents, but imposes no semantic
constraints on the meaning of these documents.</p>
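      <p>As a minimal illustration of this point, the following Python sketch (standard library only) builds two small hierarchical records that carry the same data under different tag names. The tag names ProteinEntry, Molecule, Chain, Residue and AminoAcid, and the three residue codes, are assumptions chosen for illustration and are not taken from the PO schema. Both records are well-formed XML, yet nothing in XML itself relates one vocabulary to the other.</p>
      <preformat>
```python
import xml.etree.ElementTree as ET


def build_entry(entry_tag, residue_tag, protein_name, residues):
    """Build a small hierarchical XML record; all tag names are hypothetical."""
    entry = ET.Element(entry_tag, {"name": protein_name})
    chain = ET.SubElement(entry, "Chain")
    for code in residues:
        child = ET.SubElement(chain, residue_tag)
        child.text = code
    return entry


# Two records carrying the same data under different, unrelated tag names.
record_a = build_entry("ProteinEntry", "Residue", "PrionProtein", ["MET", "ALA", "ASN"])
record_b = build_entry("Molecule", "AminoAcid", "PrionProtein", ["MET", "ALA", "ASN"])

# Both serialize to well-formed XML and can be parsed back without complaint.
parsed_a = ET.fromstring(ET.tostring(record_a))
parsed_b = ET.fromstring(ET.tostring(record_b))

# The hierarchy is explicit and easy to walk...
residues_a = [r.text for r in parsed_a.iter("Residue")]
residues_b = [r.text for r in parsed_b.iter("AminoAcid")]

# ...but nothing in XML itself says that Residue and AminoAcid mean the same thing.
same_data = residues_a == residues_b
same_vocabulary = parsed_a.tag == parsed_b.tag
print(same_data, same_vocabulary)
```
      </preformat>
      <p>The two records agree on content but not on vocabulary, and a program can only discover the agreement if it is told in advance which tags correspond. Supplying that correspondence is the role of a shared ontology.</p>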
      <p>Resource Description Framework (RDF) is a data model for objects or resources
and the relations between them. It provides a simple semantics for this data model, and
these data models can be represented in XML syntax. RDF Schema is a vocabulary
for describing properties and classes of RDF resources, with a semantics for
generalization hierarchies of such properties and classes.</p>
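      <p>The RDF data model and an RDF Schema generalization hierarchy for properties can be sketched in a few lines of Python. This is a toy under stated assumptions, not an RDF library: statements are plain (subject, predicate, object) tuples, every name under the po: prefix is hypothetical, and only the rdfs:subPropertyOf rule is applied.</p>
      <preformat>
```python
RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"

# Each statement is a (subject, predicate, object) triple.
# All names under the "po:" prefix are hypothetical.
triples = {
    ("po:hasHelix", RDFS_SUBPROPERTY_OF, "po:hasStructure"),
    ("po:hasStructure", RDFS_SUBPROPERTY_OF, "po:hasAnnotation"),
    ("po:M", "po:hasHelix", "po:Helix1"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def rdfs_property_closure(graph):
    """Add statements implied by rdfs:subPropertyOf until nothing new appears."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        for sub, _, sup in match(closed, p=RDFS_SUBPROPERTY_OF):
            for s, _, o in match(closed, p=sub):
                if (s, sup, o) not in closed:
                    closed.add((s, sup, o))
                    changed = True
    return closed


inferred = rdfs_property_closure(triples)
properties_of_m = sorted(p for _, p, _ in match(inferred, s="po:M"))
print(properties_of_m)
```
      </preformat>
      <p>Only one statement about po:M is asserted, but the property hierarchy lets the more general statements be derived from it. The original set of triples is left unchanged.</p>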
      <p>To efficiently represent the protein annotation framework and to integrate all the
existing data representations into a standardized protein data specification for the
bioinformatics community, the protein ontology needs to be represented in a format
that not only enforces semantic constraints on protein data, but can also facilitate reasoning
tasks on protein data using semantic query algebra. This motivates the representation
of the Protein Ontology (PO) Model in the Web Ontology Language (OWL). OWL facilitates
greater machine interpretability of Web content than that supported by XML, RDF,
and RDF Schema by providing additional vocabulary along with a formal semantics.
Knowledge captured from protein data using OWL is classified in a rich hierarchy of
concepts and their inter-relationships. OWL is compositional and dynamic, relying on
notions of classification, reasoning, consistency, retrieval and querying. We
investigated the use of OWL for building Protein Ontology (PO) using the Protégé OWL
Plug-in.</p>
      <p>OWL allows us to write explicit, formal definitions of the concepts that describe protein data. Using
OWL to define formal protein data concepts provides: (1) a well-defined syntax; (2)
semantics, which is already present in protein data; (3) convenient expression of
integrated protein data using query algebra. A well-defined and structured syntax for the
protein ontology is necessary for machine processing and mining of protein data.
Formal semantics describes the meaning of knowledge in protein data precisely. One
of the uses of formal semantics is to allow people to reason about knowledge of the
protein domain. In the case of Protein Ontology, we may reason about:
• Class membership. If M is an instance of class Molecule, and Molecule is
a subclass of Entry, then we can infer that M is an instance of Entry.
• Equivalence of classes. If class HelixStructure is equivalent to class
TurnStructure, and class TurnStructure is equivalent to class
OtherFoldsStructure, then HelixStructure is equivalent to
OtherFoldsStructure too.
• Classification. If we have declared that membership of the Residue class
requires certain property-value pairs to satisfy a condition, for example that a
Residue should be a three-letter word, then if an individual R
satisfies that condition, we can conclude that R is an instance of the Residue class.</p>
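      <p>The three kinds of inference listed above can be sketched with a toy reasoner in Python. This is a hedged illustration only: the TinyOntology class and its methods are hypothetical and are not part of OWL, Protégé, or any OWL API, and the three-letter test is an assumed stand-in for the Residue membership condition. The class and individual names follow the examples in the text.</p>
      <preformat>
```python
class TinyOntology:
    """A toy stand-in for an OWL reasoner; not the Protege or OWL API."""

    def __init__(self):
        self.subclass_of = {}   # class name -> set of direct superclasses
        self.equivalent = []    # list of sets of mutually equivalent classes
        self.asserted = {}      # individual -> set of asserted classes
        self.definitions = {}   # class name -> membership predicate

    def add_subclass(self, sub, sup):
        self.subclass_of.setdefault(sub, set()).add(sup)

    def add_equivalence(self, a, b):
        # Merge every existing group that mentions a or b (transitivity).
        merged = {a, b}
        remaining = []
        for group in self.equivalent:
            if group.isdisjoint(merged):
                remaining.append(group)
            else:
                merged |= group
        remaining.append(merged)
        self.equivalent = remaining

    def are_equivalent(self, a, b):
        return a == b or any(a in g and b in g for g in self.equivalent)

    def assert_instance(self, individual, cls):
        self.asserted.setdefault(individual, set()).add(cls)

    def define_class(self, cls, predicate):
        self.definitions[cls] = predicate

    def classes_of(self, individual, value=None):
        """Asserted classes, classes whose definition the value satisfies,
        and every superclass of those classes."""
        result = set(self.asserted.get(individual, set()))
        if value is not None:
            for cls, predicate in self.definitions.items():
                if predicate(value):
                    result.add(cls)
        frontier = list(result)
        while frontier:
            for sup in self.subclass_of.get(frontier.pop(), set()):
                if sup not in result:
                    result.add(sup)
                    frontier.append(sup)
        return result


po = TinyOntology()
po.add_subclass("Molecule", "Entry")
po.assert_instance("M", "Molecule")
po.add_equivalence("HelixStructure", "TurnStructure")
po.add_equivalence("TurnStructure", "OtherFoldsStructure")
# Assumed membership condition: a Residue value is a three-letter word.
po.define_class("Residue", lambda v: isinstance(v, str) and len(v) == 3 and v.isalpha())

membership = po.classes_of("M")                                        # class membership
equivalent = po.are_equivalent("HelixStructure", "OtherFoldsStructure")  # equivalence
classification = po.classes_of("R", value="MET")                       # classification
print(sorted(membership), equivalent, sorted(classification))
```
      </preformat>
      <p>In this sketch M is found to belong to both Molecule and Entry, the two structure classes are found equivalent through TurnStructure, and the individual R with value MET is classified as a Residue. A production OWL reasoner handles far richer class expressions than these three rules.</p>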
    </sec>
    <sec id="sec-3">
      <title>3. PO Benefits and Limitations</title>
    </sec>
    <sec id="sec-4">
      <title>4. Concluding Remarks</title>
      <p>The overall objective of the Protein Ontology (PO) Project is: “To correlate information
about multiprotein machines with data in major protein databases to better understand
sequence, structure and function of protein machines.” OWL provides a language for
capturing declarative knowledge about the protein domain and a classifier that allows
reasoning about protein data. Knowledge captured from protein data using OWL is
classified in a rich hierarchy of concepts and their inter-relationships. We investigated
the use of OWL for building Protein Ontology (PO) using the Protégé OWL Plug-in.
OWL is flexible and powerful enough to capture and classify biological concepts of
proteins in a consistent and principled fashion. OWL is used to construct a Protein
Ontology (PO) that can be used for making inferences from proteomics data using a
defined semantic query algebra.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Altman et al., 1999]
          <string-name>
            <surname>Altman</surname>
            ,
            <given-names>R. B.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bada</surname>
          </string-name>
          , et al. (
          <year>1999</year>
          ).
          <article-title>RiboWeb: An Ontology-Based System for Collaborative Molecular Biology</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          (September/October 1999):
          <fpage>68</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [GO 2001] GO. (
          <year>2001</year>
          ).
          <article-title>Creating the Gene Ontology Resource: Design and Implementation</article-title>
          .
          <source>Genome Research</source>
          <volume>11</volume>
          :
          <fpage>1425</fpage>
          -
          <lpage>1433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Sidhu et al., 2006]
          <string-name>
            <surname>Sidhu</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Dillon</surname>
          </string-name>
          , et al. (
          <year>2006</year>
          ).
          <article-title>Protein Ontology Project: 2006 Updates (Invited Paper)</article-title>
          .
          <source>Data Mining and Information Engineering 2006</source>
          .
          <string-name>
            <given-names>A.</given-names>
            <surname>Zanasi</surname>
          </string-name>
          , C. A.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Sidhu et al., 2005a]
          <string-name>
            <surname>Sidhu</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Dillon</surname>
          </string-name>
          , et al. (
          <year>2005</year>
          ).
          <article-title>Ontological Foundation for Protein Data Models</article-title>
          .
          <source>First IFIP WG 2.12 &amp; WG 12.4 International Workshop on Web Semantics (SWWS 2005), In conjunction with On The Move Federated Conferences (OTM 2005)</source>
          . Agia Napa, Cyprus, Springer-Verlag. Lecture Notes in Computer Science (LNCS).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Sidhu et al., 2005b]
          <string-name>
            <surname>Sidhu</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Dillon</surname>
          </string-name>
          , et al. (
          <year>2005</year>
          ).
          <article-title>Protein Ontology: Semantic Data Integration in Proteomics</article-title>
          .
          <source>4th International Joint Conference of InCoB, AASBi and KSBI (BIOINFO2005)</source>
          . Busan, Korea.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Sidhu et al., 2005c]
          <string-name>
            <surname>Sidhu</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Dillon</surname>
          </string-name>
          , et al. (
          <year>2005</year>
          ).
          <article-title>Protein Ontology: Vocabulary for Protein Data</article-title>
          .
          <source>3rd IEEE International Conference on Information Technology and Applications (IEEE ICITA 2005)</source>
          . Sydney, IEEE CS Press. Volume
          <volume>1</volume>
          :
          <fpage>465</fpage>
          -
          <lpage>469</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Sidhu et al., 2004a]
          <string-name>
            <surname>Sidhu</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Dillon</surname>
          </string-name>
          , et al. (
          <year>2004</year>
          ).
          <article-title>A Unified Representation of Protein Structure Databases (Book Section)</article-title>
          .
          <source>Biotechnological Approaches for Sustainable Development</source>
          .
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Reddy</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanna</surname>
          </string-name>
          . India, Allied Publishers Pty. Ltd., India:
          <fpage>396</fpage>
          -
          <lpage>408</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Sidhu et al., 2004b]
          <string-name>
            <surname>Sidhu</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Dillon</surname>
          </string-name>
          , et al. (
          <year>2004</year>
          ).
          <article-title>Making of Protein Ontology (Invited Paper)</article-title>
          .
          <source>2nd Australian and Medical Research Congress</source>
          <year>2004</year>
          .
          <string-name>
            <given-names>M.</given-names>
            <surname>Kavallaris</surname>
          </string-name>
          . Sydney, National Health and Medical Research Council, Australian Government:
          <fpage>150</fpage>
          -
          <lpage>151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Sidhu et al., 2004c]
          <string-name>
            <surname>Sidhu</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Dillon</surname>
          </string-name>
          , et al. (
          <year>2004</year>
          ).
          <article-title>Protein Knowledge Base: Making of Protein Ontology (Invited Paper)</article-title>
          .
          <source>HUPO 3rd Annual World Congress</source>
          <year>2004</year>
          .
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Bradshaw</surname>
          </string-name>
          . Beijing, China,
          <source>American Society for Biochemistry and Molecular Biology</source>
          . Vol.
          <volume>3</volume>
          , No. 10 Oct. (Sup.):
          <fpage>S262</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [UMLS 1993] McCray et al. (
          <year>1993</year>
          ).
          <article-title>Representing biomedical knowledge in the UMLS semantic network</article-title>
          . In: Broering NC, editor.
          <source>High-performance medical libraries: advances in information management for the virtual era</source>
          . Westport (CT): Meckler; 1993
          . p.
          <fpage>31</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>