<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linking Biological Data Across Organisms in Graph Databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luana Loubet Borges</string-name>
          <email>luana@lis.ic.unicamp.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andre Santanche</string-name>
          <email>santanche@ic.unicamp.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computing</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Campinas</institution>
          ,
          <addr-line>Campinas</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Representing data as networks has been shown to be a powerful approach for data analysis in biodiversity, e.g., interactions among organisms and relations among genes and phenotypes. In this context, databases and repositories following a graph model (e.g., RDF) have been increasingly used to interconnect information and to support network-driven analysis. Usually, this kind of analysis requires gathering and linking data from several distinct and heterogeneous sources. In this work, we investigate this challenge in the context of biological databases focusing on the characterization of living organisms, especially their phenotypes and diseases. This includes the rich diversity of Model Organism Databases (MODs), repositories specialized in a particular taxon that are widely used in biological and medical studies. We exploit a lightweight integration approach, inspired by the Linked Open Data initiative, mapping several biological databases into a unified graph database, our BioGraph, and linking key elements to offer an interconnected view over the data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>2. We have created a unified database containing data from all these data
sources. To solve the heterogeneity problem, we developed a unified model
to support different approaches to describe phenotypes.
3. We have interlinked data from several sources combining two strategies: (i)
exploiting existing cross references among sources; (ii) importing bridges
between ontologies: Uberon and Uberpheno.
4. With the interlinked graph, we have inferred new edges and nodes, generating
knowledge.</p>
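      <p>The linking and inference steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the BioGraph implementation: the record and term identifiers, the edge labels (xref, bridge, affects, shares_anatomy_with), and the function names build_graph and infer_edges are all hypothetical. Cross references become edges from a phenotype record to an ontology term, bridges become edges from a phenotype term to an anatomical entity, and new edges are inferred by composing the two.</p>
      <preformat>
```python
from itertools import combinations

# Hypothetical phenotype records from three sources; all identifiers are
# invented for illustration and do not come from any real database.
RECORDS = [
    {"id": "modA:p1", "xrefs": ["PHENO:1"]},
    {"id": "modB:p9", "xrefs": ["PHENO:2"]},
    {"id": "modC:p4", "xrefs": []},
]

# Hypothetical bridge from phenotype terms to anatomical entities, standing in
# for the role that ontology bridges play in the text.
BRIDGES = {"PHENO:1": "ANAT:eye", "PHENO:2": "ANAT:eye"}


def build_graph(records, bridges):
    """Return the set of (subject, predicate, object) edges of the linked graph."""
    edges = set()
    # Strategy (i): reuse cross references already present in the sources.
    for rec in records:
        for term in rec["xrefs"]:
            edges.add((rec["id"], "xref", term))
    # Strategy (ii): import bridges between ontologies.
    for term, anatomy in bridges.items():
        edges.add((term, "bridge", anatomy))
    return edges


def infer_edges(edges):
    """Derive new edges by following xref and bridge edges in sequence."""
    inferred = set()
    # Compose record -> term -> anatomy into a direct record -> anatomy edge.
    for rec, pred, term in edges:
        if pred != "xref":
            continue
        for src, pred2, anatomy in edges:
            if pred2 == "bridge" and src == term:
                inferred.add((rec, "affects", anatomy))
    # Group records by the anatomical entity they affect.
    by_anatomy = {}
    for rec, _, anatomy in inferred:
        by_anatomy.setdefault(anatomy, []).append(rec)
    # Connect every pair of records that affect the same entity.
    for recs in by_anatomy.values():
        for a, b in combinations(sorted(recs), 2):
            inferred.add((a, "shares_anatomy_with", b))
    return inferred
```
      </preformat>
      <p>Calling infer_edges(build_graph(RECORDS, BRIDGES)) on this toy input connects the records from the first two sources through the shared anatomical entity, while the third record, which carries no cross reference, stays unlinked. A triple set is used here only for brevity; a graph database would store the same edges natively.</p>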
      <p>The main contributions of this work are the unified model, which supports
several descriptive approaches for phenotypes, and the unified graph database, which
contains descriptions of phenotypes from 63 distinct data sources. Future work includes
importing genes, linking them to their phenotypes and diseases, and implementing
an interface for our system.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>R.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aleksic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butano</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carr</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Contrino</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyne</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyne</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalderimis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rutherford</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , et al.:
          <article-title>InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data</article-title>
          .
          <source>Bioinformatics</source>
          <volume>28</volume>
          (
          <issue>23</issue>
          ) (
          <year>2012</year>
          )
          <fpage>3163</fpage>
          –
          <lpage>3165</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Mungall</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torniai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkoutos</surname>
            ,
            <given-names>G.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haendel</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          , et al.:
          <article-title>Uberon, an integrative multi-species anatomy ontology</article-title>
          .
          <source>Genome Biol</source>
          <volume>13</volume>
          (
          <issue>1</issue>
          ) (
          <year>2012</year>
          ) R5
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>