<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RDF-Based Integration with SPARQL Building System for Life Science Database Archive</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Atsuko Yamaguchi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katsuhiko Okubo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Norio Kobayashi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kouji Kozaki</string-name>
          <email>kozaki@ei.sanken.osaka-u.ac.jp</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sadahiro Kumagai</string-name>
          <email>sadahiro.kumagai.jjg@hitachi.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kai Lenz</string-name>
          <email>kai.lenzg@riken.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomoe Nobusada</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hongyan Wu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yasunori Yamamoto</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hideki Hatanaka</string-name>
          <email>hidekig@biosciencedbc.jp</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Advanced Center for Computing and Communication (ACCC), RIKEN</institution>
          ,
          <addr-line>2-1 Hirosawa, Wako, Saitama, 351-0198</addr-line>
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Database Center for Life Science (DBCLS), ROIS</institution>
          ,
          <addr-line>178-4-4 Wakashiba, Kashiwa, Chiba, 277-0871</addr-line>
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Information &amp; Telecommunication Systems Company, Hitachi Ltd</institution>
          ,
          <addr-line>6-26-2 Minami Oi, Shinagawa-ku, Tokyo 140-8573</addr-line>
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>National Bioscience Database Center, JST</institution>
          ,
          <addr-line>5-3, Science Plaza 7F, Yonbancho, Chiyoda-ku, Tokyo 102-8666</addr-line>
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>The Institute of Scientific and Industrial Research (ISIR), Osaka University</institution>
          ,
          <addr-line>8-1 Mihogaoka, Ibaraki, Osaka, 567-0047</addr-line>
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Life Science Database Archive (LSDB Archive, https://dbarchive.biosciencedbc.jp/) is a service that collects, preserves and provides databases generated by life-science researchers in Japan. As of September 2015, the LSDB Archive includes 103 databases, all of which are downloadable with appropriate licenses and metadata. Although a simple keyword search tool is available for the databases, a more flexible retrieval system is required to obtain relevant data from heterogeneous databases. We therefore first converted the databases into RDF datasets and uploaded them to a triple store. We then developed a prototype retrieval system using SPARQL Builder. Because SPARQL Builder assists users in writing queries, the prototype enables users without knowledge of RDF to access the datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>SPARQL</kwd>
        <kwd>RDF</kwd>
        <kwd>database archive</kwd>
        <kwd>life-science databases</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the life sciences, many kinds of data are generated as results of
experimental research. Although these data are often organized and provided as databases,
we found that, as of 2006, many databases in Japan were no longer maintained once
the funding for their projects had ended [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Even databases that are still
maintained may not be fully used, because their terms of use are unclear or they
cannot be downloaded. To address these issues, the Life
Science Database Archive (LSDB Archive, https://dbarchive.biosciencedbc.jp), a
service that maintains, stores and provides downloadable databases with appropriate
licenses and unified descriptions, has been operating since 2009. As of September 2015, the
LSDB Archive includes 103 heterogeneous databases generated by life-science
researchers during national projects in Japan.
      </p>
      <p>
        Although a simple keyword search tool is available for the databases, a more
flexible retrieval system is required to obtain relevant data from heterogeneous
databases. We therefore decided to integrate the databases using semantic web
technologies. As a trial, we converted the databases into RDF datasets and uploaded
them to a triple store with a SPARQL endpoint. Although a SPARQL endpoint is a very
flexible retrieval system, constructing a SPARQL query is difficult for users who are
unfamiliar with semantic web technologies. To assist users in writing SPARQL
queries, we employed SPARQL Builder [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
a semiautomatic SPARQL query generation system. Using this system,
we developed a prototype of a search interface for the LSDB Archive that enables
users to extract semantically related data.
      </p>
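      <p>To give a sense of what a query generation tool writes on the user's behalf, the following minimal sketch assembles a SPARQL query from two class IRIs and one property IRI that links them. It is an illustration under stated assumptions, not the procedure used by SPARQL Builder: the function name build_query and all example.org IRIs are hypothetical.</p>
      <preformat><![CDATA[
```python
def build_query(source_class, relation, target_class, limit=10):
    """Build a SPARQL SELECT query that joins instances of two classes.

    All three positional arguments are full IRIs; `limit` caps the
    number of result rows.
    """
    lines = [
        "SELECT ?source ?target",
        "WHERE {",
        # "a" is the SPARQL shorthand for rdf:type.
        f"  ?source a <{source_class}> .",
        f"  ?target a <{target_class}> .",
        # The selected relationship connects the two typed resources.
        f"  ?source <{relation}> ?target .",
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


# Hypothetical IRIs, used only to make the example runnable.
ONT = "http://example.org/ontology/"
query = build_query(
    ONT + "Entry",
    "http://example.org/lsdb/prop/gene",
    ONT + "Gene",
    limit=5,
)
print(query)
```
]]></preformat>
      <p>The resulting string could be sent to any SPARQL endpoint. A user who only picks the two classes and the relationship never has to write the triple patterns by hand.</p>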
    </sec>
    <sec id="sec-2">
      <title>Method and Result</title>
      <p>
To generate initial RDF datasets from the databases in the LSDB Archive, we used
TogoDB [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which accepts tabular data and generates RDF datasets.
When a class is specified for each column in TogoDB, rdf:type statements for the
objects are attached automatically. To type the subjects, we added data that include
an rdfs:domain statement for each property corresponding to a column. Because all the
classes used in the RDF datasets come from external ontologies, we extracted
the necessary parts of those ontologies. We then generated SPARQL Builder
metadata for the RDF datasets together with the extracted ontologies.
      </p>
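      <p>The typing scheme described above can be sketched as follows. This is a simplified illustration and not TogoDB's implementation: the function table_to_triples, the base IRIs and the sample rows are all hypothetical. Each column is given a class, which becomes the rdf:type of the objects in that column, and each column property receives an rdfs:domain so that the subjects can be typed.</p>
      <preformat><![CDATA[
```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
# Assumed IRIs: BASE stands in for the dataset namespace and ONT for an
# external ontology. Neither is taken from the LSDB Archive.
BASE = "http://example.org/lsdb/"
ONT = "http://example.org/ontology/"


def table_to_triples(rows, key_column, column_classes, row_class):
    """Convert tabular rows into (subject, predicate, object) triples.

    rows           -- list of dicts, one per table row
    key_column     -- column whose value identifies the row (the subject)
    column_classes -- maps a column name to the class IRI of its values
    row_class      -- class IRI used as rdfs:domain of each column property
    """
    triples = []
    # One property per column; its rdfs:domain types the subjects.
    for column in column_classes:
        triples.append((BASE + "prop/" + column, RDFS_DOMAIN, row_class))
    for row in rows:
        subject = BASE + "entry/" + quote(str(row[key_column]), safe="")
        for column, cls in column_classes.items():
            obj = BASE + column + "/" + quote(str(row[column]), safe="")
            triples.append((subject, BASE + "prop/" + column, obj))
            # The class chosen for the column becomes rdf:type of the object.
            triples.append((obj, RDF_TYPE, cls))
    # An RDF graph is a set of triples, so drop repeats but keep the order.
    return list(dict.fromkeys(triples))


rows = [
    {"id": "E001", "gene": "BRCA1", "organism": "9606"},
    {"id": "E002", "gene": "TP53", "organism": "9606"},
]
column_classes = {"gene": ONT + "Gene", "organism": ONT + "Organism"}
triples = table_to_triples(rows, "id", column_classes, ONT + "Entry")

for s, p, o in triples:
    print(s, p, o)
```
]]></preformat>
      <p>In this sketch the subjects carry no explicit rdf:type. Their class follows from the rdfs:domain of the properties they use, which mirrors the choice of typing subjects through rdfs:domain rather than through direct type statements.</p>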
      <p>We then developed a prototype of a search interface for the RDFized LSDB
Archive using SPARQL Builder. With this interface, users select two classes and
a relationship between them, and the generated SPARQL query then retrieves the
desired data from the combined heterogeneous RDF datasets in the LSDB Archive.</p>
      <p>Acknowledgments. This work was supported by the National Bioscience
Database Center (NBDC) of the Japan Science and Technology Agency (JST).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Ministry of Education, Culture, Sports, Science, and Technology (MEXT) Integrated Database Project: http://lifesciencedb.mext.go.jp/en/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Yamaguchi</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozaki</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenz</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobayashi</surname>
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>An Intelligent SPARQL Query Builder for Exploration of Various Life-science Databases</article-title>
          ,
          <source>CEUR Workshop Proceedings 1279, The 3rd International Workshop on Intelligent Exploration of Semantic Data (IESD 2014)</source>
          , Riva del Garda, Italy (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. TogoDB: http://togodb.org/</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>