<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An RDF-Based Portal of Biological Phenotype Data Created in Japan</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Terue Takatsuki</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikako Saito</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sadahiro Kumagai</string-name>
          <email>sadahiro.kumagai.jj@hitachi.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eiki Takayama</string-name>
          <email>e_takayama@brc.riken.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kazuya Ohshima</string-name>
          <email>kazuya22@brc.riken.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nozomu Ohshiro</string-name>
          <email>ohshiro-n@brc.riken.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kai Lenz</string-name>
          <email>kai.lenz@riken.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nobuhiko Tanaka</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Norio Kobayashi</string-name>
          <email>norio.kobayashi@riken.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hiroshi Masuya</string-name>
          <email>hmasuya@brc.riken.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Advanced Center for Computing and Communication, RIKEN</institution>
          ,
          <addr-line>2-1, Hirosawa, Wako</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hitachi Ltd.</institution>
          ,
          <addr-line>Omori Bellport B Bldg., Shinagawa, Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>RIKEN BioResource Center</institution>
          ,
          <addr-line>3-1-1 Kouyadai, Tsukuba</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We developed RDF-based databases of phenotypes and animal strains produced in Japan, together with a portal site named “J-Phenome”. Because a common schema is applied, these databases can be searched across graphs with the same SPARQL query. In operating these databases, RDF showed multiple advantages over conventional systems, such as more comprehensive search, data integration using ontologies and public data, reuse of data, and wider dissemination of phenotype data.</p>
      </abstract>
      <kwd-group>
        <kwd>Biological phenotype</kwd>
        <kwd>data integration</kwd>
        <kwd>RDF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In the life sciences, a “phenotype”, a biological characteristic that an organism
shows as a result of the interaction between genes and environment, is critical
information that helps researchers and the scientific community choose appropriate
experimental materials for their studies.</p>
      <p>
        In this context, phenotype data are now shared by various
databases that use phenotype ontologies. For example, the Mammalian Phenotype Ontology
(MP) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the Human Phenotype Ontology (HP) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are often used as standardized
vocabularies of phenotypes and symptoms. Using equivalence links between MP and HP
terms, candidate disease model animals can be identified through
disease-phenotype relationships across humans, rodents, fish, worms and flies [
        <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
        ]. These data
integrations are performed with Semantic Web technologies. RDF technology therefore
seems to be one of the best solutions for disseminating phenotype-related information.
      </p>
      <p>We introduce a data integration project, “J-Phenome” (http://jphenome.info/), for
wider dissemination of phenotype data produced in Japan, using the RIKEN
MetaDatabase (http://metadb.riken.jp/) as the infrastructure for handling RDF data.
In this paper, we give an overview of the development of the RDF datasets and discuss
the advantages of RDF for sharing phenotype data in Japan and worldwide.</p>
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <sec id="sec-2-1">
        <title>Collection of the phenotype data</title>
        <p>To integrate the phenotype data available in Japanese databases, datasets were
collected from the original databases; they are summarized in Table 1.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Conversion to the RDF data</title>
        <p>
          For the conversion of phenotype and animal strain data, we used an RDF-based data
schema, the Bioresource Schema (BRS: http://metadb.riken.jp/metadb/db/bioresource_schema).
BRS provides standardized properties to describe attributes of instances.
BRS-based data conversion was performed using the data-conversion function of
the RIKEN MetaDatabase [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. As a result, we made eight RDF graphs, one for each database.
As “common vocabularies”, Open Biomedical Ontologies were used for
the annotations (Table 1).
        </p>
        <p>Table 1. Number of triples (in source order): 12,473; 655,347; 26,058; 203,562;
618,938; 513,509.</p>
        <p>Table 1. Ontologies used (grouped as in the source):
NCBITaxon, ZP, UO, PATO, RO;
MP, RS, CL, GO, IAO, MA, NCBITaxon, UO, PATO, PR, RO;
MP, CL, FMA, GO, IAO, MA, NCBITaxon, CMO, UO, PATO, RO;
CLO, ORDO, CL, GO, IAO, NCBITaxon, UO, PATO, RO;
NCBITaxon, MCCV, MEO, CSSO, PDO, UO, RO;
MP, CL, FMA, GO, IAO, MA, NCBITaxon, UO, PATO, PR, RO, UBERON, CHEBI, NBO;
MP, CL, GO, IAO, MA, NCBITaxon, PATO, PR, RO.</p>
        <p>*Abbreviations: CLO: Cell Line Ontology, RS: Rat Strain Ontology, MP: Mammalian Phenotype
Ontology, ORDO: Orphanet Rare Disease Ontology, CHEBI: Chemical Entities of Biological Interest
Ontology, CL: Cell Ontology, FMA: Foundational Model of Anatomy, GO: Gene Ontology, IAO:
Information Artifact Ontology, MA: Mouse Adult Gross Anatomy Ontology, MPATH: Mouse Pathology
Ontology, NBO: Neuro Behavior Ontology, NCBITaxon, PATO: Phenotypic Quality Ontology, PR:
Protein Ontology, RO: Relations Ontology, UBERON: Uber Anatomy Ontology, UO: Units of
Measurement Ontology, ZP: Zebrafish phenotypes, CMO: Clinical Measurement Ontology, MCCV:
Microbial Culture Collection Vocabulary, MEO: Metagenome and Microbes Environmental Ontology,
CSSO: Clinical Signs and Symptoms Ontology, PDO: Pathogenic Disease Ontology</p>
        <p>The RDF data of phenotypes are visualized by the RIKEN MetaDatabase, which provides
Web-based data browsing (Fig. 1), downloading, and a SPARQL endpoint
(http://metadb.riken.jp/sparql) that can process a query across graphs (databases).</p>
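        <p>As a rough illustration of a query across graphs, the following Python sketch builds
one SPARQL query that searches several named graphs at once. Only the endpoint URL is
taken from the text; the graph URIs, the use of rdfs:label as the shared property, and the
function names are assumptions made for illustration and do not reflect the actual BRS
properties or J-Phenome graph names.</p>
        <preformat><![CDATA[
```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given in the text.
ENDPOINT = "http://metadb.riken.jp/sparql"

# Hypothetical graph URIs; the real J-Phenome graph names are not given here.
EXAMPLE_GRAPHS = [
    "http://example.org/graph/mouse_phenotype",
    "http://example.org/graph/rat_phenotype",
]


def build_cross_graph_query(graph_uris, keyword, limit=20):
    """Build one SELECT query that searches every listed named graph.

    rdfs:label stands in for a property shared by all graphs (an assumption).
    """
    if not graph_uris:
        raise ValueError("at least one graph URI is required")
    # Escape backslashes and double quotes so the keyword is a valid literal.
    safe_keyword = keyword.replace("\\", "\\\\").replace('"', '\\"')
    from_named = "\n".join(f"FROM NAMED <{uri}>" for uri in graph_uris)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?graph ?subject ?label\n"
        f"{from_named}\n"
        "WHERE {\n"
        "  GRAPH ?graph {\n"
        "    ?subject rdfs:label ?label .\n"
        f'    FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{safe_keyword}")))\n'
        "  }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query over HTTP GET and return the parsed JSON results."""
    params = urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the query; call run_query(...) to contact the endpoint.
    print(build_cross_graph_query(EXAMPLE_GRAPHS, "tail"))
```
]]></preformat>
        <p>Because the pattern is wrapped in GRAPH ?graph, each result row carries the name of
the graph it came from, so a hit can be traced back to its source database.</p>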
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Discussion</title>
      <p>
        In the preceding section, we outlined the RDF-based data integration in
the J-Phenome project. Using the GUIs of the RIKEN MetaDatabase, biologists who
are inexperienced with RDF data can easily browse and explore the links in the RDF
data (Fig. 1).
      </p>
      <p>
        1) Datasets of J-Phenome will contribute to data integration in the RIKEN
MetaDatabase through the use and provision of common URIs for genes and bio-resources [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        2) Comprehensive data retrieval. Because the data are coordinated by
the common schema, and in particular share common properties, cross-graph
(cross-database) searches using the same or similar queries were realized. In particular,
through the SPARQL endpoint, a single query can search phenotype data
from different species.
      </p>
      <p>
        3) Cooperation with external databases. To share RDF datasets, we imported
interrelationship data from the Monarch Initiative (https://monarchinitiative.org) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
which readily extends J-Phenome to show candidate diseases related to
phenotypes. This extension adds value to Japanese animal strains as
“disease models”. Data transfer in the “opposite direction” is also useful.
      </p>
      <p>We are currently planning to export the Japanese phenotype data to the Monarch
Initiative for wider dissemination. J-Phenome data
will be updated independently of the Monarch Initiative’s data, so frequent
synchronization will be required to provide the latest data to users. Federated queries
could avoid such synchronization, but we do not currently use them because of performance
problems. Improved performance of federated queries in SPARQL-related technology is
expected to expand inter-database cooperation based on RDF.</p>
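      <p>For readers unfamiliar with federated queries, the sketch below shows their general
shape: SPARQL 1.1 provides a SERVICE clause that sends part of a query to another endpoint
at execution time. The endpoint URL, predicates, and function name are placeholders chosen
for illustration; they are not the actual endpoints or properties of J-Phenome or the
Monarch Initiative, and this is not a query used in the project.</p>
      <preformat><![CDATA[
```python
def build_federated_query(local_pattern, remote_endpoint, remote_pattern, limit=20):
    """Combine a local graph pattern with one evaluated at a remote endpoint.

    SERVICE is the SPARQL 1.1 keyword for delegating a pattern to another
    endpoint; both patterns are supplied by the caller.
    """
    if any(ch in remote_endpoint for ch in "<> "):
        raise ValueError("remote_endpoint must be a bare IRI")
    return (
        "SELECT *\n"
        "WHERE {\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Hypothetical endpoint and predicates, used only to show the query shape.
    print(
        build_federated_query(
            local_pattern="?strain <http://example.org/hasPhenotype> ?phenotype .",
            remote_endpoint="http://example.org/remote/sparql",
            remote_pattern="?disease <http://example.org/hasPhenotype> ?phenotype .",
        )
    )
```
]]></preformat>
      <p>In such a query the remote pattern is evaluated over the network each time the query
runs, which is where the performance cost mentioned above arises.</p>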
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>We thank Drs. K. Naruse and H. Kaneko of the National Institute for Basic Biology, T.
Kuramoto of Kyoto Univ., T. Mashimo of Osaka Univ., and T. Takada and K. Kawakami
of the National Institute of Genetics for useful advice and for phenotype annotation
during the conversion of the original phenotype data into RDF. We also thank H. Mori of the
Tokyo Institute of Technology, S. Kawashima of the Database Center for Life Science,
and S. Carbon of Lawrence Berkeley National Laboratory for useful discussions on
interoperability between databases. This work is partially supported by the Database
Integration Coordination Program (DICP) of the National Bioscience Database Center
(NBDC) / Japan Science and Technology Agency (JST).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hoehndorf</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schofield</surname>
            <given-names>PN</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkoutos</surname>
            <given-names>GV</given-names>
          </string-name>
          .:
          <article-title>PhenomeNET: a whole-phenome approach to disease gene discovery</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>18</volume>
          ,
          <issue>e119</issue>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Köhler</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doelken</surname>
            <given-names>SC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruef</surname>
            <given-names>BJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bauer</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Washington</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Westerfield</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkoutos</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schofield</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smedley</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            <given-names>SE</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robinson</surname>
            <given-names>PN</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mungall</surname>
            <given-names>CJ</given-names>
          </string-name>
          .:
          <article-title>Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research</article-title>
          .
          <source>F1000Res</source>
          . (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Smith</surname>
            <given-names>CL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldsmith</surname>
            <given-names>CA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eppig</surname>
            <given-names>JT</given-names>
          </string-name>
          .:
          <article-title>The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information</article-title>
          .
          <source>Genome Biol</source>
          .
          <volume>6</volume>
          ,
          <issue>R7</issue>
          , (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Köhler</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doelken</surname>
            <given-names>SC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mungall</surname>
            <given-names>CJ</given-names>
          </string-name>
          et al.:
          <article-title>The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>42</volume>
          (Database issue):
          <fpage>D966</fpage>
          --
          <lpage>974</lpage>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lenz</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masuya</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobayashi</surname>
            <given-names>N.</given-names>
          </string-name>
          <article-title>RIKEN MetaDatabase: a database publication platform for RIKENs life-science researchers that promotes research collaborations over different research area</article-title>
          .
          <source>The 15th International Semantic Web Conference (ISWC2016)</source>
          ,
          <fpage>poster</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>