<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using ontologies for querying and analysing protein-protein interaction data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mario Cannataro</string-name>
          <email>cannataro@unicz.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pietro Hiram Guzzi</string-name>
          <email>hguzzi@unicz.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bioinformatics Laboratory, Department of Experimental Medicine and Clinic, University Magna Graecia</institution>
          ,
          <addr-line>Catanzaro</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Many wet lab experiments produce large amounts of data about
interactions among proteins [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], also referred to as Protein-Protein
Interaction (PPI) data. The whole set of protein interactions of a single organism
is referred to as a Protein-Protein Interaction Network (PIN), and it is
built from a set of binary interactions. PINs are commonly modeled as
undirected graphs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], where nodes represent proteins and edges represent
interactions among proteins. The size of this graph makes manual inspection
infeasible even for simple organisms, so the study of PINs requires
graph-based computational methods.
      </p>
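      <p>The graph model described above can be sketched in a few lines. The following is a minimal illustration, assuming interactions are given as pairs of protein identifiers; the function name build_pin is hypothetical and is not part of the proposed system. The two sample pairs are the MEC1 interactions discussed in this paper.</p>
      <preformat>
```python
from collections import defaultdict


def build_pin(binary_interactions):
    """Build an undirected graph (adjacency sets) from binary interactions."""
    adjacency = defaultdict(set)
    for protein_a, protein_b in binary_interactions:
        # An undirected edge is stored in both directions.
        adjacency[protein_a].add(protein_b)
        adjacency[protein_b].add(protein_a)
    return dict(adjacency)


# Sample binary interactions (the MEC1 pairs mentioned in the text).
interactions = [("MEC1", "TEL1"), ("MEC1", "RNR1")]
pin = build_pin(interactions)
print(sorted(pin["MEC1"]))  # direct interaction partners of MEC1
```
      </preformat>
      <p>Adjacency sets keep neighbour lookups cheap, which matters once the network grows to the size of a whole organism.</p>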
      <p>
        PPI databases, such as the Database of Interacting Proteins (DIP) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], are
often publicly available on the Internet and let users
retrieve data of interest through simple querying interfaces. A user can
search by entering (i) one or more protein identifiers, or (ii)
a protein sequence. Results consist of a list of proteins that
interact directly with the seed protein or that are at distance k from the seed
protein in the PIN. With these interfaces it is impossible to formulate even simple
queries involving biological concepts or annotations, such as "retrieve all the
interactions that are related to glucose synthesis". Nevertheless, such annotations do exist
and are spread across different data sources. The main hypothesis of this paper is
that annotating PPI data with biological information may result in richer
querying interfaces and in more powerful PIN analysis algorithms that may use
such biological information [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This work presents a first prototype of a
system able to add to existing PPI data the information coming from ontologies
such as Gene Ontology [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and from other sources. Moreover, the proposed system
allows annotated data to be used in an analysis pipeline.
      </p>
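      <p>The distance-based retrieval offered by existing interfaces can be illustrated with a breadth-first search. This is only a sketch: it assumes that "at distance k" means a shortest-path distance of exactly k, and the function name proteins_at_distance and the toy network are introduced here for illustration.</p>
      <preformat>
```python
from collections import deque


def proteins_at_distance(adjacency, seed, k):
    """Return proteins whose shortest-path distance from seed is exactly k."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if distance[current] == k:
            continue  # no need to expand beyond distance k
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return {protein for protein, d in distance.items() if d == k}


# Toy network built from the MEC1 interactions mentioned in the text.
toy_pin = {
    "MEC1": {"TEL1", "RNR1"},
    "TEL1": {"MEC1"},
    "RNR1": {"MEC1"},
}
print(sorted(proteins_at_distance(toy_pin, "MEC1", 1)))
print(sorted(proteins_at_distance(toy_pin, "TEL1", 2)))
```
      </preformat>
      <p>Such a query sees only the topology of the network. Nothing in it can express a biological concept, which is the limitation the annotated database is meant to address.</p>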
      <p>
        The annotation of a PPI network consists of three main phases: (i) retrieval
of PPI data (Data Extraction Module), (ii) retrieval of existing annotations
for that data (Metadata Extraction Module), (iii) generation of annotated
interactions and storage into the annotated database. Initially, the proposed
system queries the existing interaction database and retrieves data about
interactions. Then the protein identifiers are used to find related annotations. For
instance, the Gene Ontology Annotation Database [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] (GOA) can be queried by
using the UniProt identifiers or Gene Ontology terms. Finally, data are merged
together and encoded by using an XML-based syntax, and stored into the
annotated database. Figure 1 depicts the architecture of the system for extracting
annotations from Gene Ontology and for annotating the PPI database.
      </p>
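      <p>The three phases can be sketched as follows. This is a simplified illustration and not the prototype itself: the two extraction modules are replaced by in-memory stand-ins, the XML element names (annotatedInteractions, interaction, participant, annotation) are hypothetical, and the annotation assignments are placeholders rather than curated GOA records.</p>
      <preformat>
```python
import xml.etree.ElementTree as ET


def annotate_interactions(interactions, annotations):
    """Merge PPI data with per-protein annotations into one XML tree."""
    root = ET.Element("annotatedInteractions")  # hypothetical element name
    for protein_a, protein_b in interactions:
        interaction = ET.SubElement(root, "interaction")
        for protein_id in (protein_a, protein_b):
            participant = ET.SubElement(interaction, "participant", id=protein_id)
            # Attach every annotation known for this protein (possibly none).
            for term in sorted(annotations.get(protein_id, ())):
                ET.SubElement(participant, "annotation").text = term
    return root


# Phase (i): stand-in for the Data Extraction Module.
interactions = [("MEC1", "TEL1"), ("MEC1", "RNR1")]
# Phase (ii): stand-in for the Metadata Extraction Module.
# These term assignments are illustrative placeholders only.
annotations = {"MEC1": {"kinase activity"}, "TEL1": {"kinase activity"}}
# Phase (iii): merge and serialise for storage in the annotated database.
document = annotate_interactions(interactions, annotations)
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```
      </preformat>
      <p>Storing the annotations next to each participant keeps every interaction record self-contained, at the cost of repeating the annotations of proteins that take part in several interactions.</p>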
      <p>The main advantage of such a system is the possibility to retrieve interactions,
not only proteins, whose nodes have a given annotation. Let us consider protein
MEC1 of yeast and its interacting partners, together with the annotation kinase
activity. A user who searches existing databases retrieves the
interactions (MEC1, TEL1) and (MEC1, RNR1), and then has to check
the annotations manually to discover which proteins are annotated with kinase
activity. By using the annotated PPI database, the user can specify the
annotation directly and retrieve the desired information.</p>
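      <p>A query of this kind could look like the sketch below. It rests on assumptions: the annotation assignments are illustrative placeholders, not data retrieved from GOA, and since the text leaves open whether one or both interacting proteins must carry the annotation, the hypothetical require_both parameter exposes both readings.</p>
      <preformat>
```python
def interactions_with_annotation(interactions, annotations, term, require_both=True):
    """Select interactions whose endpoint proteins carry the given annotation."""
    combine = all if require_both else any
    return [
        (a, b)
        for a, b in interactions
        if combine(term in annotations.get(p, ()) for p in (a, b))
    ]


interactions = [("MEC1", "TEL1"), ("MEC1", "RNR1")]
# Illustrative placeholder annotations, not retrieved from GOA.
annotations = {
    "MEC1": {"kinase activity"},
    "TEL1": {"kinase activity"},
    "RNR1": set(),
}
both = interactions_with_annotation(interactions, annotations, "kinase activity")
either = interactions_with_annotation(
    interactions, annotations, "kinase activity", require_both=False
)
print(both)    # interactions where both partners carry the term
print(either)  # interactions where at least one partner carries the term
```
      </preformat>
      <p>Under these placeholder annotations, the strict reading keeps only interactions between two annotated proteins, while the relaxed reading also keeps interactions with a single annotated partner.</p>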
      <p>Such a system could be useful not only for the semantic search of data, but
also for the semantic-based analysis of PPI data. The analysis of PPI networks is
usually done by using graph-based algorithms and by associating graph properties
with biological properties of the modeled PPI network. The availability of annotated data
could enable the development of novel algorithms able to exploit such information.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Uetz, P., et al.:
          <article-title>A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae</article-title>
          .
          <source>Nature</source>
          <volume>403</volume>
          (
          <year>2000</year>
          )
          <fpage>623</fpage>
          -
          <lpage>627</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>West</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <source>Introduction to Graph Theory</source>
          .
          <publisher-name>Prentice Hall</publisher-name>
          ,
          <publisher-loc>NY</publisher-loc>
          (August
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Salwinski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , et al.:
          <article-title>The Database of Interacting Proteins: 2004 update</article-title>
          .
          <source>Nucl. Acids Res.</source>
          <volume>32</volume>
          (
          <issue>suppl1</issue>
          ) (
          <year>2004</year>
          )
          <fpage>D449</fpage>
          -
          <lpage>451</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cannataro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guzzi</surname>
            ,
            <given-names>P.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veltri</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Using ontologies for annotating and retrieving protein-protein interactions data</article-title>
          .
          In:
          <source>Computer-Based Medical Systems, 2009. CBMS 2009. 22nd IEEE International Symposium on</source>
          . (Aug.
          <year>2009</year>
          )
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          , et al:
          <article-title>The Gene Ontology (GO) database and informatics resource</article-title>
          .
          <source>Nucleic Acids Res</source>
          <volume>32</volume>
          (
          <issue>Database issue</issue>
          ) (January
          <year>2004</year>
          )
          <fpage>258</fpage>
          -
          <lpage>61</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Camon</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , et al:
          <article-title>The Gene Ontology Annotation (GOA) database: sharing knowledge in UniProt with Gene Ontology</article-title>
          .
          <source>Nucleic Acids Res</source>
          <volume>32</volume>
          (
          <issue>Database issue</issue>
          ) (January
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>