<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SPANG: a command-line client supporting query generation for distributed SPARQL endpoints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hirokazu Chiba</string-name>
          <email>chiba@nibb.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ikuo Uchiyama</string-name>
          <email>uchiyama@nibb.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute for Basic Biology, National Institutes of Natural Sciences</institution>
          ,
          <addr-line>Nishigonaka 38, Myodaiji, Okazaki, Aichi, 444-8585</addr-line>
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>An increasing number of biological databases have been made available in RDF form and are accessible through SPARQL endpoints. These endpoints offer a valuable opportunity to utilize the RDF datasets as integrative databases. However, writing SPARQL queries is often a burden for biologists; thus, an easy-to-use querying tool for SPARQL endpoints is necessary. Here, we developed SPANG, a command-line SPARQL client supporting query generation. SPANG can dynamically generate typical SPARQL queries according to the command-line arguments. SPANG can also use SPARQL templates stored in the local system or on the Web. Further, SPANG allows a user to combine multiple queries, each with a distinct target endpoint, via Unix pipes. These features enable easy access to RDF datasets through SPARQL endpoints, and enhance the integrative analysis of various biological data distributed across the Web. SPANG is freely available from http://purl.org/net/spang.</p>
      </abstract>
      <kwd-group>
        <kwd>SPARQL client</kwd>
        <kwd>distributed SPARQL endpoints</kwd>
        <kwd>SPARQL template library</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>Because of the rapid progress in biotechnologies, various types of
biological data have been accumulating quickly. Semantic Web technology has
attracted attention as a promising approach for integrating such growing
heterogeneous data; thus, an increasing number of biological databases have been
made available in the form of RDF through SPARQL endpoints [<xref ref-type="bibr" rid="ref1">1</xref>–<xref ref-type="bibr" rid="ref4">4</xref>]. Although
these SPARQL endpoints provide an opportunity to utilize the RDF datasets
as integrative databases, using the SPARQL language often requires cumbersome
coding tasks, such as declaring URI prefixes and writing common code
patterns repeatedly. Automating such troublesome tasks will help researchers
use the biological databases through SPARQL endpoints. Here, we developed a
novel SPARQL client, SPANG, to reduce the burden of coding SPARQL and
increase the reusability of written SPARQL code.</p>
      <p>SPANG is a command-line client that can dynamically generate SPARQL
queries according to the command-line arguments. Basically, a single SPANG
command submits a query to an endpoint. SPANG has two modes of
operation: (i) SPARQL shortcut mode, where typical query patterns are generated
according to command-line options; and (ii) SPARQL template mode, where
the specified SPARQL template and parameters in the command line are used
to generate a runtime query. The specified SPARQL template can be either a
local file or a file on the Web. A normal SPARQL query that does not include
parameters can also be executed in the SPARQL template mode. SPANG has
several other mechanisms to simplify the cumbersome tasks in SPARQL
querying: (a) prefix declarations described in configuration files are used for runtime
autocompletion; (b) nicknames for SPARQL endpoints defined in configuration
files can be used in the command line; and (c) SPARQL template libraries in the
local system can be looked up by specifying a template name in the command
line. Although the distributed SPANG package provides predefined configuration
files and a template library for general use, each user can extend the settings with
user-defined files.</p>
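      <p>The template expansion and prefix autocompletion described above can be illustrated with a minimal Python sketch. It is not SPANG's implementation: the <monospace>$name</monospace> placeholder syntax, the function names, and the in-memory prefix table are assumptions made for illustration, whereas SPANG reads its prefixes from configuration files and uses its own template format.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from string import Template

# Hypothetical prefix table standing in for a prefix configuration file.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "up": "http://purl.uniprot.org/core/",
    "taxon": "http://purl.uniprot.org/taxonomy/",
}

def used_prefixes(query):
    """Return the prefixes used in prefixed names, in order of first use."""
    # Blank out IRIs and string literals so 'http:' inside <...> is ignored.
    stripped = re.sub(r'<[^>\s]*>|"[^"\n]*"', " ", query)
    seen = []
    for name in re.findall(r"(?<![\w:])([A-Za-z][\w-]*):", stripped):
        if name not in seen:
            seen.append(name)
    return seen

def autocomplete_prefixes(query, prefixes=PREFIXES):
    """Prepend PREFIX lines for known prefixes that the query uses but does not declare."""
    declared = set(re.findall(r"(?i)\bPREFIX\s+([A-Za-z][\w-]*):", query))
    missing = [p for p in used_prefixes(query)
               if p in prefixes and p not in declared]
    header = "".join(f"PREFIX {p}: <{prefixes[p]}>\n" for p in missing)
    return header + query

def render_template(template_text, **params):
    """Fill $name placeholders, then complete the prefix declarations."""
    return autocomplete_prefixes(Template(template_text).substitute(**params))

if __name__ == "__main__":
    template = ("SELECT ?protein WHERE { ?protein a up:Protein ; "
                "up:organism taxon:$taxid } LIMIT $limit")
    print(render_template(template, taxid=9606, limit=10))
```
]]></preformat>
      <p>In this sketch the user supplies only the template and its parameters; the declarations for <monospace>up</monospace> and <monospace>taxon</monospace> are added at run time, and a query that already declares its prefixes is returned unchanged.</p>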
      <p>
        Whereas a single SPANG command submits a query to a specific
endpoint, SPANG also enables combinatorial execution of multiple queries, each
with a distinct target endpoint, by connecting the respective SPANG processes via
Unix pipes. This functionality is similar to that of the SPARQL 1.1 federated
query using the SERVICE keyword [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The code of a federated query includes
nested subqueries and tends to be complicated. Instead, SPANG realizes
combinatorial execution of multiple queries as distinct Unix processes connected via
pipes that transfer variable bindings. Such a modular structure provides several
advantages: the reduced complexity of each query makes it easier to implement and
debug, and each query can be combined with other queries or with other Unix
commands, which offers a wide range of applications.
      </p>
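      <p>One way to picture the transfer of variable bindings between piped processes is the following Python sketch. It assumes, purely for illustration, that the upstream query emits its results in the SPARQL 1.1 TSV results format (a header row of variable names, with terms already written in SPARQL syntax) and that the downstream query receives them as a VALUES clause; the function names are hypothetical, and the mechanism actually used by SPANG may differ.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def values_clause(tsv_text):
    """Turn tab-separated bindings (header row = variable names) into a VALUES clause."""
    rows = [line.split("\t") for line in tsv_text.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    variables = " ".join("?" + name.lstrip("?") for name in header)
    # An empty cell means the variable is unbound in that row; VALUES spells this UNDEF.
    body = "\n".join(
        "  (" + " ".join(cell or "UNDEF" for cell in row) + ")" for row in data
    )
    return f"VALUES ({variables}) {{\n{body}\n}}"

def inject_bindings(query, tsv_text):
    """Insert the VALUES clause right after the first '{' of the query."""
    head, brace, tail = query.partition("{")
    return f"{head}{brace}\n{values_clause(tsv_text)}\n{tail}"

if __name__ == "__main__":
    # Bindings as they might arrive on standard input from an upstream query.
    upstream = (
        "?group\t?gene\n"
        "<http://example.org/g1>\t<http://example.org/a>\n"
        "<http://example.org/g2>\t\n"
    )
    downstream = "SELECT ?gene ?label WHERE { ?gene rdfs:label ?label }"
    print(inject_bindings(downstream, upstream))
```
]]></preformat>
      <p>Because each stage only reads bindings and writes bindings, a stage can be tested on its own with a small hand-written input, which is the kind of modularity the pipe-based design aims for.</p>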
      <p>
        Thus, SPANG enables easy access to RDF datasets through SPARQL
endpoints, and also facilitates combinatorial use of distributed databases across the
Web. These functionalities will enhance the integrative analysis of various
biological data toward knowledge discovery. As a practical application, we will show a
set of example queries using the UniProt SPARQL endpoint [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and the MBGD
SPARQL endpoint [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to transfer protein annotations from well-characterized
genomes to poorly characterized genomes, using the MBGD ortholog group
information as a hub.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Katayama</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aoki-Kinoshita</surname>
            ,
            <given-names>K.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawashima</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamamoto</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.:
          <article-title>BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains</article-title>
          .
          <source>J. Biomed. Semantics</source>
          <volume>5</volume>
          ,
          <elocation-id>5</elocation-id>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. UniProt Consortium:
          <article-title>Activities at the universal protein resource (UniProt)</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>42</volume>
          ,
          <fpage>D191</fpage>
          –
          <lpage>D198</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jupp</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malone</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bolleman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandizi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davies</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>The EBI RDF platform: linked open data for the life sciences</article-title>
          .
          <source>Bioinformatics</source>
          <volume>30</volume>
          ,
          <fpage>1338</fpage>
          –
          <lpage>1339</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chiba</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nishide</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uchiyama</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data</article-title>
          .
          <source>PLoS ONE</source>
          <volume>10</volume>
          ,
          <elocation-id>e0122802</elocation-id>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Prud'hommeaux</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buil-Aranda</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>SPARQL 1.1 federated query</article-title>
          .
          <source>W3C Recommendation</source>
          (21 March
          <year>2013</year>
          ), http://www.w3.org/TR/sparql11-federated-query/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>